Client connections keep failing from time to time

Hi,
We have deployed OpenVidu Pro with the following versions:

  • Openvidu server pro v2.18.0
  • KMS v6.16.0
  • Openvidu-coturn 4.0.0

Most of the time the video conferences run smoothly, but there have been cases where client connections kept being dropped. We asked the users to refresh the page and rejoin their sessions, but they were dropped again shortly after reconnecting. When this happened, multiple users in different sessions were affected at the same time, regardless of device model or browser version. The situation did not improve until we restarted OpenVidu and the KMS service, after which connections were stable again (so it does not seem to be a problem with the clients' network quality). This happens every few days, and we have to restart the services regularly.

We monitored the server resources, and most of the time CPU and memory utilisation were well below 20%. We checked the logs from both OpenVidu and KMS but could not identify the root cause. Below are some of our observations.

  • From the OpenVidu browser logs, the client received many timeout / disconnection errors, such as:

    • “StreamManager of Stream str_CAM_HFUp_con_OAfKuXjzOZ (Subscriber) did not trigger “streamPlaying” event in 4000 ms”
    • “IceConnectionState of RTCPeerConnection 4ca6c229-2753-4833-b57e-fca2a2048326 (str_CAM_HFUp_con_OAfKuXjzOZ) change to “disconnected”. Possible network disconnection”
  • From the KMS logs, the following “handling timeout failed” warning kept repeating every minute until the KMS service was restarted.
    23:20:44.815 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f022821f660> handling timeout failed
    23:20:44.815 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f01a803cde0> handling timeout failed
    23:20:44.816 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f0240219220> handling timeout failed

  • The OpenVidu log showed a number of possible ghost sessions.
    13:35:23.308 INFO Running non active sessions garbage collector…
    13:35:23.309 WARN Possible ghost session NDB-bb3971681be94f0eac64e480a6a586fc
    13:35:23.309 WARN Possible ghost session NDB-17ea582b81d94810a521375105b09578
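
To quantify how often the KMS warning recurs (and when it first appears relative to the dropped connections), a small script along these lines could count the “handling timeout failed” lines per minute. This is only a minimal sketch: the function name `count_dtls_timeouts_per_minute` is made up for illustration, and it assumes every KMS log line starts with an `HH:MM:SS.mmm` timestamp as in the excerpts above.

```python
import re
from collections import Counter

# Assumed line layout, taken from the KMS excerpt above:
#   HH:MM:SS.mmm warning dtlsconnection ... handling timeout failed
# The first capture group keeps only HH:MM so that lines are bucketed per minute.
TIMEOUT_RE = re.compile(
    r"^\s*(\d{2}:\d{2}):\d{2}\.\d+\s+warning\s+dtlsconnection\b.*handling timeout failed"
)


def count_dtls_timeouts_per_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of DTLS timeout warnings."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the log excerpts in this post.
sample = [
    "23:20:44.815 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f022821f660> handling timeout failed",
    "23:20:44.815 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f01a803cde0> handling timeout failed",
    "23:20:44.816 warning dtlsconnection gstdtlsconnection.c:312 handle_timeout() <GstDtlsConnection(at)0x7f0240219220> handling timeout failed",
    "13:35:23.309 WARN Possible ghost session NDB-bb3971681be94f0eac64e480a6a586fc",
]

if __name__ == "__main__":
    for minute, n in sorted(count_dtls_timeouts_per_minute(sample).items()):
        print(minute, n)
```

In practice the list would be replaced by the lines of the actual KMS log file. A sudden jump in the per-minute count might help pinpoint when the problem starts.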

What could be causing these connection disruptions? And how can we work around the issue without restarting the services?

Thanks in advance.