Unexpected restart

Hello,
We are experiencing unexpected restarts of the openvidu container from time to time, even though there should be enough resources on this physical server. It starts with Java complaining about insufficient memory, followed by a java.lang.OutOfMemoryError and a crash.
log
As specified in the log file, it is an Intel E5-2697 v2 with 125 GB of RAM running Ubuntu 20.04.
We have already changed the Java heap size (-Xms/-Xmx) in the OpenVidu Java command.
Is there any tweak or tuning we can apply on the server to prevent this, or at least make it less frequent?

Looking at your log, I’m afraid the JVM is simply reporting that it cannot allocate 16 KB of additional RAM. How many GB of memory does your server have?

Please make sure you configure the memory available to the OpenVidu Server Java process using the JAVA_OPTIONS property in the .env file:
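For example, something along these lines (the exact values are only an illustration, adapt them to the RAM you actually have available):

JAVA_OPTIONS=-Xms2048m -Xmx4096m

After editing the .env file, restart the deployment (typically with ./openvidu restart from the installation directory) so the new options are picked up.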

Currently there is a limitation and the Java process does not use all the memory available on the machine.

Best regards

We ran the OpenVidu Server Java process with the -Xms2048m -Xmx4096m arguments, but we felt no tangible change and the server still crashes while at least half of the RAM is available.
The top output has been logged here.
You can see for yourself in the top log that when the crash took place, somewhere around 09:51 to 09:53, no resource shortage was observed.

I’m pretty sure you’re not setting those parameters correctly. Can you show me the output of:

ps aux | grep openvidu-server

Regards

Also, you don’t need to use those specific values; adapt them to your environment. With 125 GB, try -Xms10240m -Xmx20480m. Play with those parameters until you reach your desired peak performance.
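In the .env file that would look something like this (again, just a starting point to tune, not a definitive value):

JAVA_OPTIONS=-Xms10240m -Xmx20480m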

@cruizba Sorry for the delay.

root@server6:~# ps aux | grep openvidu-server
root 309623 0.0 0.0 8164 2460 pts/0 S+ 05:51 0:00 grep --color=auto openvidu-server
root 4122463 19.6 1.9 31501944 2513976 ? Sl Jan14 123:27 java -Xms2048m -Xmx4096m -jar openvidu-server.jar

The log explicitly says that one of the things to do in order to prevent the JVM from crashing is to decrease the heap size!

Sure, but did you try increasing the heap size the way I recommended? Where is this log you’re talking about?

Yes, we did change the Java options as you said.

root@server14:~# ps aux |grep openvidu-server
root 753927 0.0 0.0 5196 2412 pts/0 S+ 20:42 0:00 grep --color=auto openvidu-server
root 3126609 36.3 5.8 49346368 7771104 ? Sl 13:25 158:41 java -Xms10240m -Xmx20480m -jar openvidu-server.jar

As of the time of this post, the server had crashed and restarted 7 hours ago:

root@server14:~# docker ps |grep openvidu-server
b8786352f3f6 openvidu/openvidu-server:2.20.0 "/usr/local/bin/entr…" 2 weeks ago Up 7 hours openvidu_node_1_openvidu-server_1

And in the openvidu container, a file named hs_err_pid16.log has been created containing all the crash-related lines. This file is located next to openvidu-server.jar in /.

root@server14:/# ls -lsh
total 126M
4.0K drwxr-xr-x 1 root root 4.0K Jan 7 19:08 bin
4.0K drwxr-xr-x 2 root root 4.0K Apr 24 2018 boot
0 drwxr-xr-x 5 root root 340 Jan 17 13:25 dev
8.0K drwxr-xr-x 1 root root 4.0K Jan 5 12:32 etc
4.0K drwxr-xr-x 2 root root 4.0K Apr 24 2018 home
36M -rw-r--r-- 1 root root 36M Jan 7 12:47 hs_err_pid15.log
36M -rw-r--r-- 1 root root 36M Jan 17 13:25 hs_err_pid16.log
4.0K drwxr-xr-x 1 root root 4.0K May 23 2017 lib
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:18 lib64
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 media
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 mnt
45M -rw-rw-r-- 1 root root 45M Sep 22 16:16 openvidu-server.jar
4.0K drwxr-xr-x 1 root root 4.0K Jan 3 12:37 opt
0 dr-xr-xr-x 943 root root 0 Jan 17 13:25 proc
4.0K drwx------ 1 root root 4.0K Jan 5 12:29 root
4.0K drwxr-xr-x 1 root root 4.0K Jan 3 12:37 run
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:18 sbin
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 srv
0 dr-xr-xr-x 13 root root 0 Jan 15 01:12 sys
16K drwxrwxrwt 1 root root 12K Jan 17 13:25 tmp
8.0K drwxr-xr-x 1 root root 4.0K Aug 27 07:16 usr
8.0K drwxr-xr-x 1 root root 4.0K Aug 27 07:18 var
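
In case it helps, we can copy these crash reports out of the container to share them, for example:

docker cp openvidu_node_1_openvidu-server_1:/hs_err_pid16.log .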

There are a lot of blocked AbstractJsonRpcClientWebSocket threads. Did your KMS container restart as well? Do you have peaks of demand, or does it just happen randomly?
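
If the container ships the JDK tools, a thread dump taken while the process is still running (but memory keeps growing) would also help to see what those threads are waiting for. Something like the following should work; 16 was the PID of the Java process in your crash reports, adjust it if it differs:

docker exec openvidu_node_1_openvidu-server_1 jstack 16 > openvidu-threads.txt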

No, KMS did not restart, while OpenVidu did, sporadically and without any meaningful relation to the load on the server.
Actually, there are 3 other servers with the same hardware and specifications which are all suffering from this very same problem.

Hello @ali_ebrahimi

We have never seen such behaviour, even in our demos infrastructure, which actually has a lot of demand. This looks like a memory leak due to locked threads, but it seems related to your application logic. In order to help with your problem, we need to replicate the error and gather more information. I would like to ask you for the following:

  1. Are you using OpenVidu CE or PRO?
  2. Do you see any kind of progression when this happens? I mean, does the process take more and more RAM progressively during the period it works correctly? (See the snippet below for a simple way to track this.)
  3. Would it be possible to know how many sessions you have when the error is produced?
  4. Can you provide a project which uses OpenVidu to replicate that exact leak?
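
For point 2, something as simple as periodically logging docker stats for the openvidu-server container (using the container name from your previous output) should be enough to see whether the memory grows steadily:

while true; do date >> openvidu-mem.log; docker stats --no-stream openvidu_node_1_openvidu-server_1 >> openvidu-mem.log; sleep 60; done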

Best regards,
Carlos.

Hello @cruizba

1- We use the Community Edition.
2- It seems that the server consumes RAM progressively, but it still stays way below the memory ceiling.


3- Between 25 and 30 sessions, each with 5 to 15 publishers.

4- We are ready to provide you with the SSH credentials of the mentioned server(s) suffering from the issue.

PS: the complete resource log the above diagrams are based on can be accessed here. The server restarted between 10:25:25 and 10:25:27. The second restart, at 10:25:54, was intentional, triggered by our monitoring system.

It is clear that a memory leak is present in OpenVidu.

As we have never seen this error before, we suspect your application is using OpenVidu in some way we have never tested.

Can you provide a minimal version of your app so we can reproduce the issue ourselves?

Regards