Massive Conference - crash while switching publisher

Hello,

We are trying to build an app similar to the “Massive Conference” example: there is always one publisher and all other users are subscribers, but their roles can change. During the change we usually get this error:

“mediaServerDisconnect”: OpenVidu Media Node has crashed or lost its connection. A new Media Node instance is active and no media streams are available in the Media Node

Could you give us some advice or pseudocode on how to solve this problem for a large-scale conference like your Massive Conference example?

Well, the “massive” conference scenario in the image is a single Publisher and 29 subscribers. So, the term “massive” is quite relative. For a Real-Time, Low-Latency WebRTC service, this can be considered a massive video room, such as a classroom or a web meetup. For a 10-second-latency streaming server (like YouTube Live, Twitch or similar), a 1-to-29 room is very small.

So, what topology does your session have? If it is 30-to-30, that is an incredible amount of streams at the same time (900 video streams being sent and received by the server, and 30 streams being coded and decoded by each browser, which is a lot). Either way, this quantity of users and streams will require a fair amount of CPU power. On what kind of machine are you running your Media Node? (I assume you are using OpenVidu Pro since you have posted an OpenVidu Inspector image, which belongs to the PRO tier).
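The stream counts above can be reproduced with a small back-of-the-envelope sketch. It assumes every participant subscribes to every publisher except itself; the function and field names are illustrative only and are not part of any OpenVidu API.

```typescript
// Rough stream-count arithmetic for one room on a single media server.
// Assumption: every participant subscribes to every publisher except itself.
interface StreamLoad {
  serverInbound: number;        // streams the server receives from publishers
  serverOutbound: number;       // streams the server sends out to subscribers
  serverTotal: number;          // inbound + outbound
  perPublisherBrowser: number;  // 1 encoded + (publishers - 1) decoded
  perSubscriberBrowser: number; // one decoded stream per publisher
}

function streamLoad(publishers: number, participants: number): StreamLoad {
  if (publishers < 0 || participants < publishers) {
    throw new RangeError("publishers must be between 0 and participants");
  }
  const serverInbound = publishers;
  // Each published stream is forwarded to every other participant.
  const serverOutbound = publishers * Math.max(participants - 1, 0);
  return {
    serverInbound,
    serverOutbound,
    serverTotal: serverInbound + serverOutbound,
    perPublisherBrowser: publishers,
    perSubscriberBrowser: publishers,
  };
}

const fullRoom = streamLoad(30, 30);  // 30-to-30: 30 in + 870 out = 900
const oneToMany = streamLoad(1, 30);  // 1 Publisher + 29 subscribers: 1 in + 29 out = 30
console.log(fullRoom.serverTotal, oneToMany.serverTotal);
```

Under these assumptions the server-side load grows roughly with the square of the room size when everyone publishes, and only linearly when a single participant publishes.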

Regards.

At the moment we have an OpenVidu Pro deployment on AWS, with a c5.xlarge running KMS and an m5.large running the OpenVidu Pro server. I’m completely aware that 30-to-30 is hard to achieve, but what we are trying to get working is 1 stream to 30 subscribers, with the possibility of passing the publisher role between participants. So there will always be one stream going to 30 subscribers, but the publisher will change. We have developed a sort of admin panel where a participant (subscriber) can ask to speak (raise a hand); once granted by the admin, that participant becomes the publisher and all the others subscribe. So it’s always 1 to 30. The problem is that while changing the role we get the error described above.

Okay, seeing what you want to achieve and the machines you are using, this should be no problem.
If it is actually the change of the publishing user that triggers the Media Node disconnection, then it is possible you have hit a bug we’ve known about for a few days. Kurento may be having trouble destroying a high number of WebRtcEndpoints at the same time. And this is probably your case: once the Publisher changes, many things happen under the hood. The Publisher endpoint must be destroyed in Kurento, but also the other 30 subscriber endpoints. And then the same topology must be re-created for the new Publisher. All of this in an extremely short period of time.

We are currently working on this issue, to prevent Kurento from crashing when a large number of endpoints are destroyed at the same time.

Is there any workaround we can use until we have this Kurento fix?

Thx for the info, at least we know that our app is working correctly and the problem is on your side :slight_smile: I know it would be naive to ask when you will solve the problem, but do you have any advice on how to work around it? We already tried granting the second publisher before unpublishing the first one, but the result seemed the same: KMS crashes. We are desperately looking for a solution, since at the end of this month we are going to host around 35-40 conferences of 30-35 users. For our tests we are using around 50 Android tablets. Up to 20 devices the AWS c5.xlarge KMS does fine, but during a role change there is a high peak in CPU usage. Could we somehow schedule this process to make it less CPU-hungry?

We are working on providing a workaround as soon as possible while working on a more robust solution.

Please stay tuned

Hello
any news on the subject?

We will publish a new KMS beta version with improvements in this regard in the coming days.

Please stay tuned to the announcements section of the forum.

Hi @pabloFuente,

Could you please give us the issue link or ID so we can keep tracking this bug?

@Levi_Nobrega we’re using this issue to track progress:

You can now test an experimental branch of Kurento where the problem is fixed. For that, grab the latest experimental tag:
docker pull kurento/kurento-media-server-exp:workerpool-rewrite

and use that same image name in your .env file (in the KMS_IMAGE property).

Sorry, maybe I’m wrong, but we are running something like what you want: one publisher and 60+ subscribers, where at any moment a subscriber can become a publisher and start sending video. In our scenario all of them have the PUBLISHER role but only subscribe to the video. When the owner of the room (the owner or the admin) decides, one of them can start sending video (I’m using signals for this), and the owner can stop that “subscriber” video whenever he wants. So I never change the role of the user, it’s always PUBLISHER, and I have no problems.

Right now I’m running multiple c5.4xlarge instances. Each one can handle more than 20 rooms, with the owner (sometimes two owners) and between 10 and 67 fake subscribers (I mean they are publishers, but 95% of the time they only subscribe to the session).

Sorry if my English isn’t good enough, but I think you should use only the PUBLISHER role and manage from your app when to start and when to stop sending video.
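The approach described in this post (everyone keeps the PUBLISHER role and the app decides who is actually sending video) could be modeled roughly as follows. This is only a sketch: FloorControl and MediaControl are hypothetical names, not OpenVidu APIs, and in a real client the two MediaControl methods would presumably wrap the SDK's publish and unpublish calls, with signals carrying the raise-hand and grant messages.

```typescript
type ConnectionId = string;

// Hypothetical abstraction over whatever starts and stops sending media.
interface MediaControl {
  startSending(id: ConnectionId): void;
  stopSending(id: ConnectionId): void;
}

// Hypothetical floor-control model: roles never change, only the single
// participant who is currently sending video does.
class FloorControl {
  private speaker: ConnectionId | null = null;
  private readonly raisedHands = new Set<ConnectionId>();
  private readonly media: MediaControl;

  constructor(media: MediaControl) {
    this.media = media;
  }

  // A participant asks to speak.
  raiseHand(id: ConnectionId): void {
    if (id !== this.speaker) this.raisedHands.add(id);
  }

  // The owner grants the floor; only someone with a raised hand can get it.
  grant(id: ConnectionId): boolean {
    if (!this.raisedHands.delete(id)) return false;
    if (this.speaker !== null) this.media.stopSending(this.speaker);
    this.speaker = id;
    this.media.startSending(id);
    return true;
  }

  // The owner stops the current speaker's video.
  revoke(): void {
    if (this.speaker === null) return;
    this.media.stopSending(this.speaker);
    this.speaker = null;
  }

  currentSpeaker(): ConnectionId | null {
    return this.speaker;
  }
}

// Usage with a recorder standing in for the real media calls.
const log: string[] = [];
const recorder: MediaControl = {
  startSending: (id) => { log.push(`start:${id}`); },
  stopSending: (id) => { log.push(`stop:${id}`); },
};
const floor = new FloorControl(recorder);
floor.raiseHand("alice");
floor.raiseHand("bob");
floor.grant("alice");
floor.grant("bob"); // alice stops sending, then bob starts
console.log(log.join(", "));
```

Keeping the decision in one place like this means a switch is just one stop and one start of sending, with no change to anyone's role.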

Yes @msalomon, this is the same scenario. What versions of OpenVidu and KMS are you using?

Thanks,

I’m using OpenVidu 2.12. I don’t know which KMS version, but I haven’t changed anything on the server.

Hi
thx for the great news :smile:
just to be sure:

# Kurento Media Server image
# --------------------------
# Docker hub kurento media server: https://hub.docker.com/r/kurento/kurento-media-server-dev
# Uncomment the next line and define this variable with KMS image that you want use
KMS_IMAGE=kurento/kurento-media-server-exp:6.13
or
KMS_IMAGE=kurento/kurento-media-server-exp:workerpool-rewrite

?

Yep, and then restart OpenVidu on that Media Node.

You can check whether the new image is being used by running docker ps

Hi

We still have a problem. The high CPU usage is gone after the fix in KMS, but after 3-4 publisher switches all clients still get disconnected. There are no error logs in KMS, only in the web browser console, where we get something like this:

jsonrpcclient.js:173 ERROR:java.lang.IllegalStateException:[KurentoClient] JsonRpcClient is disconnected from WebSocket server at ‘ws://172.31.6.92:8888/kurento’ in Request: method:receiveVideoFrom params:{“sdpOffer”:“v=0\r\no=- 3134578297270382616 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=group:BUNDLE 0 1\r\na=msid-semantic: WMS\r\nm=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126\r\nc=IN IP4 0.0.0.0\r\na=rtcp:9 IN IP4 0.0.0.0\r\na=ice-ufrag:1viM\r\na=ice-pwd:ftHeWpDB4BHK/iUYrHWIPQlK\r\na=ice-options:trickle\r\na=fingerprint:sha-256 23:70:0A:9D:33:9F:D1:30:2E:E5:A8:9D:9E:D8:A3:13:8B:D8:B6:39:53:D1:6D:24:21:09:E0:E8:A0:56:9E:20\r\na=setup:actpass\r\na=mid:0\r\na=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level\r\na=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time\r\na=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01\r\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\r\na=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id\r\na=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id\r\na=recvonly\r\na=rtcp-mux\r\na=rtpmap:111 opus/48000/2\r\na=rtcp-fb:111 transport-cc\r\na=fmtp:111 minptime=10;useinbandfec=1\r\na=rtpmap:103 ISAC/16000\r\na=rtpmap:104 ISAC/32000\r\na=rtpmap:9 G722/8000\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=rtpmap:106 CN/32000\r\na=rtpmap:105 CN/16000\r\na=rtpmap:13 CN/8000\r\na=rtpmap:110 telephone-event/48000\r\na=rtpmap:112 telephone-event/32000\r\na=rtpmap:113 telephone-event/16000\r\na=rtpmap:126 telephone-event/8000\r\nm=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 122 127 121 125 107 108 109 124 120 123 119 114 115 116\r\nc=IN IP4 0.0.0.0\r\na=rtcp:9 IN IP4 0.0.0.0\r\na=ice-ufrag:1viM\r\na=ice-pwd:ftHeWpDB4BHK/iUYrHWIPQlK\r\na=ice-options:trickle\r\na=fingerprint:sha-256 23:70:0A:9D:33:9F:D1:30:2E:E5:A8:9D:9E:D8:A3:13:8B:D8:B6:39:53:D1:6D:24:21:09:E0:E8:A0:56:9E:20\r\na=setup:actpass\r\na=mid:1\r\na=extmap:14 
urn:ietf:params:rtp-hdrext:toffset\r\na=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time\r\na=extmap:13 urn:3gpp:video-orientation\r\na=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01\r\na=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay\r\na=extmap:11 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type\r\na=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing\r\na=extmap:8 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07\r\na=extmap:9 http://www.webrtc.org/experiments/rtp-hdrext/color-space\r\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\r\na=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id\r\na=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id\r\na=recvonly\r\na=rtcp-mux\r\na=rtcp-rsize\r\na=rtpmap:96 VP8/90000\r\na=rtcp-fb:96 goog-remb\r\na=rtcp-fb:96 transport-cc\r\na=rtcp-fb:96 ccm fir\r\na=rtcp-fb:96 nack\r\na=rtcp-fb:96 nack pli\r\na=rtpmap:97 rtx/90000\r\na=fmtp:97 apt=96\r\na=rtpmap:98 VP9/90000\r\na=rtcp-fb:98 goog-remb\r\na=rtcp-fb:98 transport-cc\r\na=rtcp-fb:98 ccm fir\r\na=rtcp-fb:98 nack\r\na=rtcp-fb:98 nack pli\r\na=fmtp:98 profile-id=0\r\na=rtpmap:99 rtx/90000\r\na=fmtp:99 apt=98\r\na=rtpmap:100 VP9/90000\r\na=rtcp-fb:100 goog-remb\r\na=rtcp-fb:100 transport-cc\r\na=rtcp-fb:100 ccm fir\r\na=rtcp-fb:100 nack\r\na=rtcp-fb:100 nack pli\r\na=fmtp:100 profile-id=2\r\na=rtpmap:101 rtx/90000\r\na=fmtp:101 apt=100\r\na=rtpmap:102 H264/90000\r\na=rtcp-fb:102 goog-remb\r\na=rtcp-fb:102 transport-cc\r\na=rtcp-fb:102 ccm fir\r\na=rtcp-fb:102 nack\r\na=rtcp-fb:102 nack pli\r\na=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f\r\na=rtpmap:122 rtx/90000\r\na=fmtp:122 apt=102\r\na=rtpmap:127 H264/90000\r\na=rtcp-fb:127 goog-remb\r\na=rtcp-fb:127 transport-cc\r\na=rtcp-fb:127 ccm fir\r\na=rtcp-fb:127 nack\r\na=rtcp-fb:127 nack pli\r\na=fmtp:127 
level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f\r\na=rtpmap:121 rtx/90000\r\na=fmtp:121 apt=127\r\na=rtpmap:125 H264/90000\r\na=rtcp-fb:125 goog-remb\r\na=rtcp-fb:125 transport-cc\r\na=rtcp-fb:125 ccm fir\r\na=rtcp-fb:125 nack\r\na=rtcp-fb:125 nack pli\r\na=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f\r\na=rtpmap:107 rtx/90000\r\na=fmtp:107 apt=125\r\na=rtpmap:108 H264/90000\r\na=rtcp-fb:108 goog-remb\r\na=rtcp-fb:108 transport-cc\r\na=rtcp-fb:108 ccm fir\r\na=rtcp-fb:108 nack\r\na=rtcp-fb:108 nack pli\r\na=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f\r\na=rtpmap:109 rtx/90000\r\na=fmtp:109 apt=108\r\na=rtpmap:124 H264/90000\r\na=rtcp-fb:124 goog-remb\r\na=rtcp-fb:124 transport-cc\r\na=rtcp-fb:124 ccm fir\r\na=rtcp-fb:124 nack\r\na=rtcp-fb:124 nack pli\r\na=fmtp:124 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=4d0032\r\na=rtpmap:120 rtx/90000\r\na=fmtp:120 apt=124\r\na=rtpmap:123 H264/90000\r\na=rtcp-fb:123 goog-remb\r\na=rtcp-fb:123 transport-cc\r\na=rtcp-fb:123 ccm fir\r\na=rtcp-fb:123 nack\r\na=rtcp-fb:123 nack pli\r\na=fmtp:123 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=640032\r\na=rtpmap:119 rtx/90000\r\na=fmtp:119 apt=123\r\na=rtpmap:114 red/90000\r\na=rtpmap:115 rtx/90000\r\na=fmtp:115 apt=114\r\na=rtpmap:116 ulpfec/90000\r\n”,“sender”:“str_CAM_Q6yZ_con_RwZsyws4D3”} request:undefined
(anonymous) @ jsonrpcclient.js:172
dispatchCallback @ index.js:612
processResponse @ index.js:743
RpcBuilder.decode @ index.js:803
transportMessage @ index.js:223
jsonrpcclient.js:177 ERROR DATA:“java.lang.IllegalStateException: [KurentoClient] JsonRpcClient is disconnected from WebSocket server at ‘ws://172.31.6.92:8888/kurento’\n\tat org.kurento.jsonrpc.client.JsonRpcClientNettyWebSocket.sendTextMessage(JsonRpcClientNettyWebSocket.java:171)\n\tat org.kurento.jsonrpc.client.AbstractJsonRpcClientWebSocket.internalSendRequestWebSocket(AbstractJsonRpcClientWebSocket.java:369)\n\tat org.kurento.jsonrpc.client.AbstractJsonRpcClientWebSocket$1.internalSendRequest(AbstractJsonRpcClientWebSocket.java:141)\n\tat org.kurento.jsonrpc.internal.JsonRpcRequestSenderHelper.sendRequest(JsonRpcRequestSenderHelper.java:75)\n\tat org.kurento.jsonrpc.internal.JsonRpcRequestSenderHelper.sendRequest(JsonRpcRequestSenderHelper.java:69)\n\tat org.kurento.jsonrpc.client.JsonRpcClient.sendRequest(JsonRpcClient.java:112)\n\tat org.kurento.client.internal.transport.jsonrpc.RomClientJsonRpcClient.sendRequest(RomClientJsonRpcClient.java:228)\n\tat org.kurento.client.internal.transport.jsonrpc.RomClientJsonRpcClient.subscribe(RomClientJsonRpcClient.java:130)\n\tat org.kurento.client.internal.transport.jsonrpc.RomClientJsonRpcClient.subscribe(RomClientJsonRpcClient.java:122)\n\tat org.kurento.client.internal.client.RomManager.subscribe(RomManager.java:190)\n\tat org.kurento.client.internal.client.RemoteObject.addEventListener(RemoteObject.java:252)\n\tat org.kurento.client.internal.client.RemoteObjectInvocationHandler.subscribeEventListener(RemoteObjectInvocationHandler.java:219)\n\tat org.kurento.client.internal.client.RemoteObjectInvocationHandler.internalInvoke(RemoteObjectInvocationHandler.java:133)\n\tat org.kurento.client.internal.client.DefaultInvocationHandler.invoke(DefaultInvocationHandler.java:39)\n\tat com.sun.proxy.$Proxy97.addErrorListener(Unknown Source)\n\tat io.openvidu.server.kurento.core.KurentoParticipantEndpointConfig.addEndpointListeners(KurentoParticipantEndpointConfig.java:203)\n\tat 
io.openvidu.server.pro.kurento.core.KurentoParticipantEndpointConfigPro.addEndpointListeners(KurentoParticipantEndpointConfigPro.java:55)\n\tat io.openvidu.server.kurento.core.KurentoParticipant.receiveMediaFrom(KurentoParticipant.java:262)\n\tat io.openvidu.server.kurento.core.KurentoSessionManager.subscribe(KurentoSessionManager.java:500)\n\tat io.openvidu.server.rpc.RpcHandler.receiveVideoFrom(RpcHandler.java:355)\n\tat io.openvidu.server.rpc.RpcHandler.handleRequest(RpcHandler.java:130)\n\tat org.kurento.jsonrpc.internal.JsonRpcHandlerManager.handleRequest(JsonRpcHandlerManager.java:142)\n\tat org.kurento.jsonrpc.internal.server.ProtocolManager$3.run(ProtocolManager.java:218)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n”
(anonymous) @ jsonrpcclient.js:176
dispatchCallback @ index.js:612
processResponse @ index.js:743
RpcBuilder.decode @ index.js:803
transportMessage @ index.js:223
OpenViduLogger.ts:31 Event ‘streamDestroyed’ triggered by ‘Session’ StreamEvent {hasBeenPrevented: false, cancelable: true, target: Session, type: “streamDestroyed”, stream: Stream, …}
OpenViduLogger.ts:31 Calling default behavior upon ‘streamDestroyed’ event dispatched by ‘Session’
OpenViduLogger.ts:31 Inbound WebRTCPeer from ‘Stream’ with id [str_CAM_Q6yZ_con_RwZsyws4D3] is now closed
OpenViduLogger.ts:31 Remote MediaStream from ‘Stream’ with id [str_CAM_Q6yZ_con_RwZsyws4D3] is now disposed
OpenViduLogger.ts:31 Remote ‘Connection’ with ‘connectionId’ [con_IPnfghpcPU] is now configured for receiving Streams with options: {id: “str_CAM_SIz5_con_IPnfghpcPU”, createdAt: 1590569973186, connection: Connection, hasAudio: true, hasVideo: true, …}
OpenViduLogger.ts:31 Event ‘streamCreated’ triggered by ‘Session’ StreamEvent {hasBeenPrevented: false, cancelable: false, target: Session, type: “streamCreated”, stream: Stream, …}
OpenViduLogger.ts:31 Subscribing to con_IPnfghpcPU
OpenViduLogger.ts:31 Event ‘videoElementCreated’ triggered by ‘Subscriber’ VideoElementEvent {hasBeenPrevented: false, cancelable: false, target: Subscriber, type: “videoElementCreated”, element: video#remote-video-str_CAM_SIz5_con_IPnfghpcPU}
OpenViduLogger.ts:19 IceConnectionState of RTCPeerConnection c2cdecf0-d200-40d9-a67a-d6759d770299 (str_CAM_SIz5_con_IPnfghpcPU) change to “checking”
OpenViduLogger.ts:31 ‘Subscriber’ (str_CAM_SIz5_con_IPnfghpcPU) successfully subscribed
OpenViduLogger.ts:31 Subscribed correctly to con_IPnfghpcPU
OpenViduLogger.ts:19 IceConnectionState of RTCPeerConnection c2cdecf0-d200-40d9-a67a-d6759d770299 (str_CAM_SIz5_con_IPnfghpcPU) change to “connected”
OpenViduLogger.ts:31 Remote ‘Stream’ with id [str_CAM_SIz5_con_IPnfghpcPU] video is now playing
OpenViduLogger.ts:31 Event ‘streamDestroyed’ triggered by ‘Session’ StreamEvent {hasBeenPrevented: false, cancelable: true, target: Session, type: “streamDestroyed”, stream: Stream, …}
OpenViduLogger.ts:31 Calling default behavior upon ‘streamDestroyed’ event dispatched by ‘Session’
OpenViduLogger.ts:31 Inbound WebRTCPeer from ‘Stream’ with id [str_CAM_SIz5_con_IPnfghpcPU] is now closed
OpenViduLogger.ts:31 Remote MediaStream from ‘Stream’ with id [str_CAM_SIz5_con_IPnfghpcPU] is now disposed

Thx, downgrading to 2.12 helped us a lot: lower CPU usage and smooth publisher changes :slight_smile:

@sirfragles Do you have connection issues with your users? For example, a couple of users who can’t see or hear a stream for no apparent reason?