Hi, I’m updating my OpenVidu installation, and I noticed that a fresh v2.27.0 instance on AWS has a CPU load of 51% when idle. Is that normal? I still have the old v2.14.0 deployed, and its CPU usage is only 0.5% when idle.
No, it is not normal. A regular OpenVidu deployment with no active sessions should not take 50% of a server’s CPU. How many cores does your server have available? Have you tried listing the processes running on your server, ordered by CPU usage (with a tool like htop or similar)?
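For a non-interactive alternative to htop, a minimal sketch along these lines can print the busiest processes. It assumes the procps-ng version of ps that ships with Ubuntu, and top_cpu is a hypothetical helper name used only for illustration. Note that the %CPU column from ps is CPU time divided by the process lifetime, not an instantaneous reading, so a loop that spawns many short-lived processes may show up more clearly in htop.

```shell
# List the processes using the most CPU, highest first.
# Assumes procps-ng `ps` (the default on Ubuntu); `top_cpu` is a hypothetical helper name.
top_cpu() {
  # First argument: number of processes to show (default 10).
  # One extra line is kept for the column header.
  ps -eo pid,pcpu,comm --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}
```

Calling top_cpu 5 prints the header plus the five processes with the highest %CPU value.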
Hello @ameotoko, this sometimes happens on first start, because the EC2 instance goes through an update of the Ubuntu distribution, which unfortunately can’t be avoided (as far as I know). After a few minutes it should go back down to 0.5%. Is that your case?
@pabloFuente the instance type is c5.xlarge, which is the default value when you deploy the stack on AWS. I will be able to run htop later today, and will share here if I see anything.
@cruizba no, that’s not the case, I already started and stopped the instance many times, and let it run for different periods of time, from minutes to hours.
Are you using OpenVidu PRO or Enterprise HA?
@cruizba I’m using CE for now, while I’m still developing my frontend. I’m considering moving to PRO later.
Hi @ameotoko, please have a look with htop or btop after the machine has been powered up for a good 10 to 15 minutes and is completely idle (no sessions going on), in order to verify whether the high CPU usage comes from a process that we can attribute to OpenVidu.
If the cause of the CPU usage is still unclear after doing this, I’d ask you to install atop to save a performance log file and share it with us:
$ sudo apt-get update && sudo apt-get install --yes atop
$ sudo atop -w "atop_$(date '+%Y%m%d')" 5
Then leave it running idle for 5 or 10 minutes. This can be run directly on the host, no need to do it within any of the Docker containers.
@j1elo done, what’s the best way to share the file with you?
Meanwhile, here are some quick results from htop (screenshot not reproduced here).
Thanks for monitoring it! I think we won’t need the performance log after all, because judging from your screenshot it seems the issue is clear.
The part taking 30% CPU is not OpenVidu itself (you can see the java processes sitting comfortably at 0% usage), but a management script which is part of the CloudFormation deployment. To be precise, it looks to me like the only possible problematic point is a loop in /usr/local/bin/restartCE.sh, which might be stuck running.
I’ve already notified the devs who are in charge of it, and they should probably have a fix soon. Meanwhile, if you are able to troubleshoot the docker-compose scenario, you might be able to find out why the Kurento (kms) service is never showing up as healthy in your system. That might provide helpful information to make the script more robust against hiccups like this.
“why the Kurento (kms) service is never showing up as healthy in your system.”
Well, actually, it is:
/opt/openvidu$ sudo docker-compose ps | grep kms | grep healthy
openvidu-kms-1 kurento/kurento-media-server:7.0.1 "/entrypoint.sh" kms 27 hours ago Up 8 minutes (healthy)
Here’s the fix:
# Restart all services
pushd /opt/openvidu
docker-compose up -d kms
until docker-compose ps | grep kms | grep healthy; do
echo "Waiting kms..."
+ sleep 5
done
docker-compose up -d
popd
I guess docker-compose ps | grep kms | grep healthy needs a little more time to execute and yield a result; the loop iterates faster than that.
I tested the original script manually, and without sleep it just keeps echoing “Waiting kms…” indefinitely. With sleep 5 it took 6 iterations before KMS appeared healthy, i.e. 30 seconds, so maybe you could just sleep 30 there, idk.
Well, this is more like it (screenshot not reproduced here).
That’s great! Thanks for confirming that the fix worked for you. We’re going to add a pause in that loop, and will probably also rewrite the loop to add more robust checks and a finite number of retries, just to be extra sure of catching the problem if services don’t become active for whatever reason.
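As an illustration of what a loop with a pause and a finite number of retries could look like, here is a minimal sketch. It is not the actual fix that went into the CloudFormation script: wait_for_healthy, kms_is_healthy, and their parameters are hypothetical names, and the health check simply reuses the grep pipeline shown earlier in this thread.

```shell
# Sketch of a bounded health wait. wait_for_healthy and kms_is_healthy are
# hypothetical names, not part of the OpenVidu deployment scripts.

# Usage: wait_for_healthy <max_attempts> <delay_seconds> <command...>
# Runs <command...> until it succeeds or max_attempts is reached.
# Returns 0 on success, 1 if every attempt failed.
wait_for_healthy() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    echo "Waiting (attempt ${attempt}/${max_attempts})..."
    # Pause between attempts, but not after the last failed one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Health check mirroring the grep pipeline from the thread.
kms_is_healthy() {
  docker-compose ps | grep kms | grep -q healthy
}
```

With these definitions, the until loop could be replaced by something like wait_for_healthy 24 5 kms_is_healthy, followed by an error message and a non-zero exit if it returns 1. The values 24 and 5 (about two minutes in total) are arbitrary choices for this sketch; the thread only reports that KMS became healthy after roughly 30 seconds on one instance.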