Hi,
We put together an easy-to-use tool to test openvidu-server performance, similar to what is mentioned here. It simply uses openvidu-call and runs multiple Chrome instances with a specific configuration.
The target platform is a dedicated virtual machine with 17 cores and 40 GB of RAM, running Ubuntu Server 16.04 with OpenVidu Server 2.20.0.
Both the tester and the OpenVidu server are in the same data center, sharing high-speed bandwidth.
I cannot figure out why we were unable to create even one session with 7 participants in which every participant receives all 6 other streams.
The OpenVidu server log always contains errors reporting an exception while processing a request, followed by a timeout.
For demonstration, here is the server log for a session with only 4 participants. Only one of them had all 3 other videos!
We are perfectly happy to provide credentials for accessing both the tester and the server if anyone is willing to inspect our setup.
The following are screenshots from each participant's web app.
It is the first time I have seen RPC requests from OpenVidu to Kurento time out. Does your data center use any kind of virtualization that could degrade performance?
If there is some kind of virtualization, maybe requests to KMS_URIS=ws://localhost:8888/kurento
are not real “localhost” requests and instead pass through some kind of network virtualization that makes them time out. Or maybe disk or CPU resources are being bottlenecked by other things in your stack. This really looks like a degraded environment.
I say that because there are plenty of ways to virtualize machines and, sometimes, disks are on different machines and requests to “localhost” are not real localhost requests, so the operation passes through the network.
I think the problem is related to your data center and OpenVidu is just showing you a symptom of a bottleneck, but I could be wrong. I think so because I’ve never seen such an exception before.
Try using KMS_URIS=["ws://<PRIVATE_IP>:8888/kurento"]
instead of localhost; maybe it works better. Or try running this same test in another environment, or check whether your data center is bottlenecking somewhere.
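One rough way to compare the two addresses is to time how long a plain TCP connection to the Kurento port takes on each. The sketch below is only an illustration: the helper names are made up for this example, and it measures connection setup only, not the Kurento JSON-RPC round trip itself.

```python
import socket
import time


def connect_time_ms(host, port, timeout=5.0):
    """Return the time in milliseconds needed to open a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed again as soon as it is established
    return (time.perf_counter() - start) * 1000.0


def compare_hosts(hosts, port=8888, attempts=5):
    """Return {host: [latency_ms or None, ...]}; None marks a failed attempt."""
    results = {}
    for host in hosts:
        samples = []
        for _ in range(attempts):
            try:
                samples.append(connect_time_ms(host, port))
            except OSError:
                # Covers refused connections and timeouts alike.
                samples.append(None)
        results[host] = samples
    return results
```

Calling `compare_hosts(["localhost", "<PRIVATE_IP>"])` on the OpenVidu machine (with the real private IP filled in) should return values of a few milliseconds at most for both; consistently slow or failed attempts on either address would point at the network layer rather than at Kurento.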
Some more questions:
- Did you deploy OpenVidu following the official docs?
- Can you also share Kurento logs? If you’ve followed official instructions they should be at
/opt/openvidu/kurento-logs
@j1elo do you know of any situation which could make Kurento time out on some RPC requests? There are timeouts even when setting the STUN address, like:
Timeout of 10000 milliseconds waiting from response to request {"id":151,"method":"invoke","params":{"object":"7bce1b48-5a69-43f7-992e-ac356cbd0d9e_kurento.MediaPipeline/c44a25f9-8712-49e5-bd88-9555a0ddb04b_kurento.WebRtcEndpoint","operation":"setStunServerAddress","operationParams":{"stunServerAddress":"xxx.xxx.xxx.xxx"},"sessionId":"27531bb6-b573-41a7-9290-a1ffab69611b"},"jsonrpc":"2.0"}
I mean, could I be wrong in my suspicions?
/cc @micael.gallego
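To see which Kurento operations are timing out, and how often, lines like the setStunServerAddress timeout above can be tallied from the OpenVidu log. This is a minimal sketch that assumes every timeout line has the same shape as the one quoted; the function names are invented for this example.

```python
import json
import re
from collections import Counter

# Matches the message format quoted above; other formats are ignored.
TIMEOUT_RE = re.compile(
    r"Timeout of (?P<ms>\d+) milliseconds waiting from response "
    r"to request (?P<request>\{.*\})"
)


def parse_timeout(line):
    """Return details of a timed-out request, or None if the line differs."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    try:
        request = json.loads(match.group("request"))
    except json.JSONDecodeError:
        return None
    params = request.get("params", {})
    return {
        "timeout_ms": int(match.group("ms")),
        "request_id": request.get("id"),
        "method": request.get("method"),
        "operation": params.get("operation"),
    }


def count_timeouts(lines):
    """Count timeouts per operation (falling back to the RPC method)."""
    counts = Counter()
    for line in lines:
        info = parse_timeout(line)
        if info is not None:
            counts[info["operation"] or info["method"]] += 1
    return counts
```

Passing an open log file to `count_timeouts` gives a quick picture of whether the timeouts cluster on one operation or are spread across all of them, which would fit a generally overloaded or degraded environment.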
Thank you for your consideration,
To answer you in order:
Yes, we do use virtualization: an ESXi server on bare metal, Ubuntu Server on a VM with the mentioned resources, and of course dockerized OpenVidu. But it is a private, powerful server with a dedicated disk on the same machine (at least that is what they say).
Yes, we deployed OpenVidu following the official docs.
We retried the same test with PRIVATE_IP in KMS_URIS, to no avail. Here are the OpenVidu log and the Kurento log.
We are also going to try the same test on a non-virtualized environment as you suggested and share the result as soon as possible.
Can you please share the CPU usage of OpenVidu server? Maybe you are overloading the server.
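If no monitoring is in place, overall CPU usage on a Linux host can be sampled from the aggregate `cpu` line of `/proc/stat` while the test runs. The sketch below is just one way to do it; the function names are illustrative, and it counts iowait as idle time, which is a choice rather than the only valid definition.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values[:8])  # guest time is already included in user/nice
    return idle, total


def busy_percent(before, after):
    """Percentage of non-idle CPU time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def sample_cpu(interval=1.0, path="/proc/stat"):
    """Measure overall CPU usage over `interval` seconds (Linux only)."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return busy_percent(before, after)
```

Logging `sample_cpu()` once per second during the test makes short spikes visible that a longer average would hide.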
When a participant starts connecting to a session, more CPU is needed; once the participant is connected, less CPU is needed. So if the load test tries to connect all the participants to all the sessions at the same time, you may get CPU spikes that affect other requests.
Also, I recommend trying OpenVidu Enterprise with a single master node and the mediasoup media server. mediasoup uses less CPU than Kurento, so using it will very likely reduce the timeout issues.
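One way to rule this out is to stagger the joins in the load test instead of starting every browser at once. The sketch below only illustrates the idea: `join` is a hypothetical callback standing in for whatever your tool does to launch one Chrome instance, and the delay value is something you would need to tune.

```python
import time


def join_schedule(sessions, participants_per_session, delay_s):
    """Return (start_offset_seconds, session_index, participant_index) tuples,
    spacing consecutive joins `delay_s` seconds apart."""
    schedule = []
    index = 0
    for session in range(sessions):
        for participant in range(participants_per_session):
            schedule.append((index * delay_s, session, participant))
            index += 1
    return schedule


def run_staggered(schedule, join, sleep=time.sleep, clock=time.monotonic):
    """Call join(session, participant) at each scheduled offset."""
    start = clock()
    for offset, session, participant in schedule:
        remaining = offset - (clock() - start)
        if remaining > 0:
            sleep(remaining)
        join(session, participant)
```

For example, `join_schedule(1, 7, 2.0)` spreads the 7 participants of one session over 12 seconds. If the timeouts disappear with staggered joins, connection-time CPU spikes are a likely cause.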
Regards
Best regards