I had a video session (a two-person session) get dropped while using OpenVidu Call on a 2.14.0 instance. The instance itself (an AWS server) became unreachable. It came back after a server reboot, but it made me think about the proper way to monitor the health of each OpenVidu component and of OpenVidu as a whole.
How does everyone health-check OpenVidu (whether CE or Pro)?
Also, are there any recommended thresholds for metrics like CPU utilization?
You should monitor CPU usage on all your systems.
Because OpenVidu works in real time, it is important not to saturate the CPU: keep it below 100%. For example, 90% is a safe threshold.
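To make the threshold concrete, here is a minimal sketch of a CPU check, assuming a Linux host where aggregate CPU counters can be read from /proc/stat. The function names and the 90% constant are illustrative only and are not part of OpenVidu. It takes two snapshots of the counters and computes the busy fraction of the elapsed time.

```python
# Minimal sketch (assumes Linux /proc/stat): compute overall CPU utilization
# from two snapshots and flag it against a threshold. Names are hypothetical.
import time

CPU_ALERT_THRESHOLD = 90.0  # percent, the "safe" ceiling mentioned above


def parse_cpu_line(stat_text: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Fields: user nice system idle iowait irq softirq steal.
            # guest/guest_nice are already included in user/nice, so they are skipped.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(prev_text: str, curr_text: str) -> float:
    """Busy percentage between two /proc/stat snapshots."""
    prev_idle, prev_total = parse_cpu_line(prev_text)
    curr_idle, curr_total = parse_cpu_line(curr_text)
    delta_total = curr_total - prev_total
    if delta_total <= 0:
        return 0.0
    delta_idle = curr_idle - prev_idle
    return 100.0 * (delta_total - delta_idle) / delta_total


def is_cpu_too_high(percent: float, threshold: float = CPU_ALERT_THRESHOLD) -> bool:
    return percent >= threshold


def sample_proc_stat(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and return CPU usage in %."""
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.read()
    return cpu_percent(first, second)
```

Running sample_proc_stat() periodically (for example from a cron job) and raising a notification whenever is_cpu_too_high() returns True gives a crude alert; a regular monitoring agent does the same job more robustly.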
We don’t have a proper “health-check” endpoint in the REST API, but you can use the /config endpoint to test whether OpenVidu is working as expected.
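A minimal sketch of such a probe, assuming the OpenVidu 2.x REST API is reachable at a base URL you supply and uses HTTP Basic auth with user OPENVIDUAPP and your OPENVIDU_SECRET as the password. The /config path matches the suggestion above for 2.14; newer releases may prefix REST paths differently, so check the REST API docs for your version. The function names are hypothetical.

```python
# Minimal sketch of an external health probe for an OpenVidu 2.x server.
# Assumptions: Basic auth as OPENVIDUAPP:<OPENVIDU_SECRET>, GET <base_url>/config.
import base64
import http.client
import urllib.request


def build_config_request(base_url: str, secret: str) -> urllib.request.Request:
    """Build an authenticated GET request for the /config endpoint."""
    token = base64.b64encode(f"OPENVIDUAPP:{secret}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/config",
        headers={"Authorization": f"Basic {token}"},
    )


def openvidu_is_healthy(base_url: str, secret: str, timeout: float = 5.0) -> bool:
    """True if GET /config answers 200 within `timeout` seconds, else False."""
    request = build_config_request(base_url, secret)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (e.g. 401, 5xx), refused connections and socket
        # timeouts are OSError subclasses; malformed replies raise HTTPException.
        return False
```

Because the probe runs outside the server, it also catches the case described in the question, where the whole instance becomes unreachable: the request times out or is refused and the function returns False.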
In OpenVidu PRO 2.15 (to be released in about a week), the CPU usage of all cluster nodes will be sent to Elasticsearch, so you will be able to use the Elasticsearch alerting system to be notified when CPU usage is too high.