Unexpected restart

Hello,
We are experiencing unexpected restarts of the openvidu container from time to time, even though there should be enough resources on this physical server. It starts with Java complaining about insufficient memory, followed by a java.lang.OutOfMemoryError and a crash.
log
As specified in the log file, it is an Intel E5-2697 v2 with 125 GB of RAM running Ubuntu 20.04.
We have already changed the Java heap size (-Xms/-Xmx) in the OpenVidu Java command.
Is there any tweak or tuning we can apply on the server to prevent this, or at least make it less frequent?

Looking at your log, I’m afraid the JVM is simply reporting that it cannot allocate 16 KB of additional RAM. How many GB of memory does your server have?

Please make sure you configure the memory available to the OpenVidu Server Java process using the JAVA_OPTIONS property in the .env file:
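For example, something along these lines (the exact values are only an illustration, adapt them to the RAM you actually have available):

JAVA_OPTIONS=-Xms2048m -Xmx4096m

After editing the .env file, restart the deployment (typically with ./openvidu restart from the installation directory) so the new options are picked up.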

Currently there is a limitation and the Java process does not use all the memory available on the machine.

Best regards

We ran the OpenVidu Server Java process with the -Xms2048m -Xmx4096m arguments, but we felt no tangible change and the server still crashes while at least half of the RAM is available.
The top output has been logged here.
You can see for yourself in the top log that when the crash took place, somewhere around 09:51 to 09:53, no resource shortage was observed.

I’m pretty sure you’re not setting those parameters correctly. Can you show me the output of:

ps aux | grep openvidu-server

Regards

Also, you don’t need to use those specific values; adapt them to your environment. With 125 GB, try -Xms10240m -Xmx20480m. Play with those parameters until you reach your desired peak performance.
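In the .env file that would look something like this (again, just a starting point to tune, not a definitive value):

JAVA_OPTIONS=-Xms10240m -Xmx20480m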

@cruizba Sorry for the delay.

root@server6:~# ps aux | grep openvidu-server
root 309623 0.0 0.0 8164 2460 pts/0 S+ 05:51 0:00 grep --color=auto openvidu-server
root 4122463 19.6 1.9 31501944 2513976 ? Sl Jan14 123:27 java -Xms2048m -Xmx4096m -jar openvidu-server.jar

The log explicitly says that one of the things to do in order to prevent the JVM from crashing is to decrease the heap size!

Sure, but did you try increasing the heap size the way I recommended? Where is this log you’re talking about?

Yes, we did change the Java options as you said.

root@server14:~# ps aux |grep openvidu-server
root 753927 0.0 0.0 5196 2412 pts/0 S+ 20:42 0:00 grep --color=auto openvidu-server
root 3126609 36.3 5.8 49346368 7771104 ? Sl 13:25 158:41 java -Xms10240m -Xmx20480m -jar openvidu-server.jar

As of the time of this post, the server had crashed and restarted 7 hours ago:

root@server14:~# docker ps |grep openvidu-server
b8786352f3f6 openvidu/openvidu-server:2.20.0 "/usr/local/bin/entr…" 2 weeks ago Up 7 hours openvidu_node_1_openvidu-server_1

And in the openvidu container, a file named hs_err_pid16.log has been created containing all the crash-related lines. This file is located next to openvidu-server.jar in /.

root@server14:/# ls -lsh
total 126M
4.0K drwxr-xr-x 1 root root 4.0K Jan 7 19:08 bin
4.0K drwxr-xr-x 2 root root 4.0K Apr 24 2018 boot
0 drwxr-xr-x 5 root root 340 Jan 17 13:25 dev
8.0K drwxr-xr-x 1 root root 4.0K Jan 5 12:32 etc
4.0K drwxr-xr-x 2 root root 4.0K Apr 24 2018 home
36M -rw-r--r-- 1 root root 36M Jan 7 12:47 hs_err_pid15.log
36M -rw-r--r-- 1 root root 36M Jan 17 13:25 hs_err_pid16.log
4.0K drwxr-xr-x 1 root root 4.0K May 23 2017 lib
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:18 lib64
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 media
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 mnt
45M -rw-rw-r-- 1 root root 45M Sep 22 16:16 openvidu-server.jar
4.0K drwxr-xr-x 1 root root 4.0K Jan 3 12:37 opt
0 dr-xr-xr-x 943 root root 0 Jan 17 13:25 proc
4.0K drwx------ 1 root root 4.0K Jan 5 12:29 root
4.0K drwxr-xr-x 1 root root 4.0K Jan 3 12:37 run
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:18 sbin
4.0K drwxr-xr-x 2 root root 4.0K Aug 27 07:16 srv
0 dr-xr-xr-x 13 root root 0 Jan 15 01:12 sys
16K drwxrwxrwt 1 root root 12K Jan 17 13:25 tmp
8.0K drwxr-xr-x 1 root root 4.0K Aug 27 07:16 usr
8.0K drwxr-xr-x 1 root root 4.0K Aug 27 07:18 var
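
In case it helps, we can copy these crash reports out of the container to share them, for example:

docker cp openvidu_node_1_openvidu-server_1:/hs_err_pid16.log .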

There are a lot of blocked AbstractJsonRpcClientWebSocket threads. Did your KMS container restart as well? Do you have peaks of demand, or does it just happen randomly?
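
If the container ships the JDK tools, a thread dump taken while the process is still running (but memory keeps growing) would also help to see what those threads are waiting for. Something like the following should work; 16 was the PID of the Java process in your crash reports, adjust it if it differs:

docker exec openvidu_node_1_openvidu-server_1 jstack 16 > openvidu-threads.txt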

No, KMS did not restart, while OpenVidu did, sporadically and without any meaningful relation to the load on the server.
Actually, there are 3 other servers with the same hardware and specifications which are all suffering from this very same problem.

Hello @ali_ebrahimi

We have never seen such behaviour, even in our demos infrastructure, which actually has a lot of demand. This looks like a memory leak due to locked threads, but it seems related to your application logic. In order to help with your problem, we need to replicate the error and gather more information. I would like to ask you for the following:

  1. Are you using OpenVidu CE or PRO?
  2. Do you see any kind of progression when this happens? I mean, does the process take more and more RAM progressively during the period it works correctly? (See the snippet below for a simple way to track this.)
  3. Would it be possible to know how many sessions you have when the error is produced?
  4. Can you provide a project which uses OpenVidu to replicate that exact leak?
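
For point 2, something as simple as periodically logging docker stats for the openvidu-server container (using the container name from your previous output) should be enough to see whether the memory grows steadily:

while true; do date >> openvidu-mem.log; docker stats --no-stream openvidu_node_1_openvidu-server_1 >> openvidu-mem.log; sleep 60; done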

Best regards,
Carlos.

Hello @cruizba

1- We use the Community Edition.
2- It seems that the server consumes RAM progressively, but it still stays way below the memory ceiling.


3- Between 25 and 30 sessions, each with 5 to 15 publishers.

4- We are ready to provide you with the SSH credentials of the mentioned server(s) suffering from the issue.

PS: the complete resource log the above diagrams are based on can be accessed here. The server restarted between 10:25:25 and 10:25:27. The second restart, at 10:25:54, was intentional, triggered by our monitoring system.

It is clear that a memory leak is present in OpenVidu.

As we have never seen this error before, we suspect your application is using OpenVidu in some way we have never tested.

Can you provide a minimal version of your app so we can reproduce the issue ourselves?

Regards