We are using OpenVidu 2.15 CE on the server and OpenVidu Browser 2.15 on the client. We primarily have two challenges:
- Initial connection to the session sometimes fails, and users occasionally have to make multiple connection attempts.
- Network disconnection in the middle of a video call.
In both cases we want 1) better logging of connection issues so we can proactively review the problems our customers are seeing, and 2) robust error handling, with a reconnecting feature similar to other video conferencing applications.
For #1)
We have logging in place to trap the connectionDestroyed event; in some cases we see the event getting logged (in the client browser) and in other cases we do not.
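For reference, our handler is roughly the following sketch against the openvidu-browser API (the reportToBackend helper is our own, hypothetical logging endpoint):

```typescript
import { OpenVidu, ConnectionEvent } from 'openvidu-browser';

// Hypothetical helper that forwards client-side lifecycle logs to our backend.
function reportToBackend(entry: { type: string; connectionId: string; reason?: string }): void {
  navigator.sendBeacon('/client-logs', JSON.stringify(entry));
}

const OV = new OpenVidu();
const session = OV.initSession();

// Log every connection teardown together with the reason reported by the server.
session.on('connectionDestroyed', (event) => {
  const e = event as ConnectionEvent;
  console.warn('connectionDestroyed', e.connection.connectionId, 'reason:', e.reason);
  reportToBackend({ type: 'connectionDestroyed', connectionId: e.connection.connectionId, reason: e.reason });
});
```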
I can imagine that if the user's signaling connection is going away, the connectionDestroyed event may not even reach that user. Is that a correct assumption?
Should we use webhooks as a more reliable way to record the lifecycle of connections?
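If we go that route, the receiving side would be something like this sketch (Express-based; the endpoint path and the payload field names are assumptions to be verified against the OpenVidu webhook docs and our deployed version):

```typescript
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical receiver; OPENVIDU_WEBHOOK=true and OPENVIDU_WEBHOOK_ENDPOINT would point here.
// Field names (event, sessionId, reason, timestamp) follow the webhook documentation,
// but should be checked against the version actually deployed.
app.post('/openvidu/webhook', (req, res) => {
  const { event, sessionId, reason, timestamp } = req.body;
  console.log(`[openvidu] ${event} session=${sessionId ?? '-'} reason=${reason ?? '-'} at=${timestamp}`);
  res.sendStatus(200);
});

app.listen(5000, () => console.log('Webhook receiver listening on :5000'));
```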
Sometimes we see errors like the ones below:
For #2, what is the suggested approach here?
Trap the connectionDestroyed event, plus have our own mechanism for detecting whether the user's network connection went down? As I mentioned above, relying on connectionDestroyed may not always be reliable.
Do we simply reconnect with a new token and republish the streams?
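To make the question concrete, this is the kind of fallback we have in mind (a sketch only; getTokenFromBackend and the 'publisher-container' element id are our own hypothetical names):

```typescript
import { OpenVidu, Publisher, Session } from 'openvidu-browser';

// Hypothetical backend call that returns a fresh token for an existing session.
async function getTokenFromBackend(sessionId: string): Promise<string> {
  const res = await fetch(`/api/sessions/${sessionId}/token`, { method: 'POST' });
  return (await res.json()).token;
}

// Sketch of a manual "reconnect with a new token and republish" fallback.
async function rejoin(OV: OpenVidu, sessionId: string): Promise<{ session: Session; publisher: Publisher }> {
  const session = OV.initSession();
  const token = await getTokenFromBackend(sessionId);
  await session.connect(token);
  const publisher = await OV.initPublisherAsync('publisher-container', {
    publishAudio: true,
    publishVideo: true
  });
  await session.publish(publisher);
  return { session, publisher };
}
```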
Any suggestions here will be helpful; without a robust reconnection policy in place, the user experience suffers and causes friction in an otherwise well-working application.
Hello @ngaheer,
First of all, it is better to stay up to date, as we fix bugs in every version.
Let me suggest some tips to deal with your issues.
In the upcoming 2.18 version we are going to include a new event to detect errors generated during the media negotiation. We will update the documentation with details on how to use it. Stay tuned.
In any case, frequent errors are not reasonable. Please review your OpenVidu deployment and let us know the exact issues the user is having.
The best way to detect connection problems in the middle of the call is managing reconnection events:
Reconnection events
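For illustration, a minimal sketch of what handling those events can look like on the client (the banner helper and the rejoin callback are placeholders for whatever the application does; the rejoin could be the token-based sketch from the question above):

```typescript
import { Session, SessionDisconnectedEvent } from 'openvidu-browser';

// Illustrative placeholder for whatever the application shows to the user.
const showBanner = (msg: string) => console.log(msg);

function wireReconnectionEvents(session: Session, rejoin: () => Promise<void>): void {
  // openvidu-browser fires 'reconnecting' when the connection to OpenVidu Server is lost
  // and 'reconnected' when it is transparently restored.
  session.on('reconnecting', () => showBanner('Connection lost, trying to reconnect...'));
  session.on('reconnected', () => showBanner('Connection restored'));

  // If the automatic reconnection fails, the session is disconnected with reason 'networkDisconnect'.
  session.on('sessionDisconnected', (event) => {
    const e = event as SessionDisconnectedEvent;
    if (e.reason === 'networkDisconnect') {
      showBanner('Could not reconnect automatically, rejoining with a new token...');
      void rejoin();
    }
  });
}
```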
Regards
Thanks for your response. While we wait for 2.18, are there any timeouts between the following events?
OpenVidu.initSession()
session.connect()
session.publish()
From what I am seeing, if there is a delay between connect and publish, the connection is closed (see the logs below, and the sketch of our call order after them). Note that I am initializing two separate connections, one for video and one for screen share.
Received request: {"method":"participantLeft","params":{"connectionId":"con_UJar18n0cI","reason":"networkDisconnect"}}
instrument.ts:129 Received request: {"method":"participantEvicted","params":{"connectionId":"con_UJar18n0cI","reason":"networkDisconnect"}}
instrument.ts:129 Received request: {"method":"participantLeft","params":{"connectionId":"con_SEd4RtoSqc","reason":"networkDisconnect"}}
instrument.ts:129 Received request: {"method":"participantEvicted","params":{"connectionId":"con_SEd4RtoSqc","reason":"networkDisconnect"}}
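For context, the call order in question is roughly the following sketch (getTokenFromBackend and 'publisher-container' are the same hypothetical names as in my earlier sketch):

```typescript
import { OpenVidu } from 'openvidu-browser';

// Same hypothetical token helper as in the earlier sketch.
const getTokenFromBackend = async (sessionId: string): Promise<string> =>
  (await (await fetch(`/api/sessions/${sessionId}/token`, { method: 'POST' })).json()).token;

async function joinAndPublish(sessionId: string): Promise<void> {
  const OV = new OpenVidu();
  const session = OV.initSession();

  await session.connect(await getTokenFromBackend(sessionId));

  // In our app there can be a noticeable gap here (user interaction, device prompts, a second
  // connection being set up for screen share) before publish() is called; this is the delay
  // we suspect is related to the networkDisconnect evictions shown in the logs above.
  const publisher = await OV.initPublisherAsync('publisher-container', {
    publishAudio: true,
    publishVideo: true
  });
  await session.publish(publisher);
}
```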
Hello @micael.gallego,
Thanks for your response. I am trying to bring some stability and predictability into my app, and I have been debugging the (intermittent) connection issues. Most of the time users connect fine, but our customers want more predictability and self-healing in the app. I am able to get a token for a session, but when we connect (joinRoom), the call fails.
I notice the following statement in the OpenVidu server logs:
io.openvidu.server.kurento.core.KurentoSessionManager - Timeout waiting for join-leave Session lock to be available for participant con_Pgg****** of session dauysmie72xev9p*********** in joinRoom
Looking at the code, it seems that this code waits on the lock for 15 seconds. I was wondering what operation holding this lock could take more than 15 seconds, and then I saw the following stack trace:
Could the stack trace below be the reason joinRoom could not acquire the join-leave lock?
I could imagine the code below waiting to send participantLeft notifications for more than 15 seconds. Shouldn't such a notification attempt be on a separate queue so that it does not hold up other activities, i.e. evict the participant on the main thread but send the notification lazily, without impacting the main operation? Is the 2.17 behavior different? Looking at the code (KurentoSessionManager.leaveRoom) it does not seem so, but you will know better than me.
Will really appreciate some help here.
at org.kurento.jsonrpc.internal.ws.WebSocketServerSession.sendRequestWebSocket(WebSocketServerSession.java:123)
at org.kurento.jsonrpc.internal.ws.WebSocketServerSession.access$000(WebSocketServerSession.java:49)
at org.kurento.jsonrpc.internal.ws.WebSocketServerSession$1.internalSendRequest(WebSocketServerSession.java:74)
at org.kurento.jsonrpc.internal.JsonRpcRequestSenderHelper.sendRequest(JsonRpcRequestSenderHelper.java:75)
at org.kurento.jsonrpc.internal.JsonRpcRequestSenderHelper.sendNotification(JsonRpcRequestSenderHelper.java:156)
at org.kurento.jsonrpc.internal.server.ServerSession.sendNotification(ServerSession.java:121)
at io.openvidu.server.rpc.RpcNotificationService.sendNotification(RpcNotificationService.java:105)
at io.openvidu.server.core.SessionEventsHandler.onParticipantLeft(SessionEventsHandler.java:181)
at io.openvidu.server.kurento.core.KurentoSessionManager.leaveRoom(KurentoSessionManager.java:248)
at io.openvidu.server.kurento.core.KurentoSessionManager.evictParticipant(KurentoSessionManager.java:661)
at io.openvidu.server.rpc.RpcHandler.leaveRoomAfterConnClosed(RpcHandler.java:636)
at io.openvidu.server.rpc.RpcHandler.afterConnectionClosed(RpcHandler.java:689)
at org.kurento.jsonrpc.internal.JsonRpcHandlerManager.afterConnectionClosed(JsonRpcHandlerManager.java:65)
at org.kurento.jsonrpc.internal.server.ProtocolManager.closeSession(ProtocolManager.java:446)
at org.kurento.jsonrpc.internal.server.ProtocolManager$4.run(ProtocolManager.java:421)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: The WebSocket session [610d] has been closed and no method (apart from close()) may be called on a closed session
at org.apache.tomcat.websocket.WsSession.checkState(WsSession.java:836)
at org.apache.tomcat.websocket.WsSession.getBasicRemote(WsSession.java:433)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:215)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:106)
at org.kurento.jsonrpc.internal.ws.WebSocketServerSession.sendRequestWebSocket(WebSocketServerSession.java:119)
… 22 common frames omitted
Hi, if you are getting this lock message on 2.15.0, it is very possible that this has been fixed in 2.16.0 or 2.17.0.