High CPU usage on idle

ameotoko · June 7, 2023, 6:27pm

Hi, I’m updating my OpenVidu installation, and I noticed that a fresh v2.27.0 instance on AWS has a CPU load of 51% in idle state. Is that normal? I still have the old v2.14.0 deployed, and CPU usage on it is only 0.5% when idle.

pabloFuente · June 8, 2023, 9:27am

No, it is not normal. A regular OpenVidu deployment with no active sessions should not take 50% of a server’s CPU. How many cores does your server have? Have you tried listing the processes running on your server, ordered by CPU usage (with a tool like htop or similar)?

cruizba · June 8, 2023, 11:04am

Hello @ameotoko, this sometimes happens at first start, because the EC2 instance goes through an update process in the Ubuntu distribution which unfortunately can’t be avoided (as far as I know). But after some minutes, it should go down to 0.5% again. Is that your case?

ameotoko · June 8, 2023, 11:39am

@pabloFuente instance type is c5.xlarge, which is the default value when you deploy the stack on AWS. I will be able to run htop later today, will share here if I see anything.

@cruizba no, that’s not the case, I already started and stopped the instance many times, and let it run for different periods of time, from minutes to hours.

cruizba · June 8, 2023, 11:41am

Are you using OpenVidu PRO or Enterprise HA?

ameotoko · June 8, 2023, 11:46am

@cruizba I’m using CE for now, while I’m still developing my frontend. I’m considering moving to PRO later.

j1elo · June 8, 2023, 4:21pm

Hi @ameotoko, please have a look with htop or btop after the machine has been powered up for a good 10 to 15 minutes and is completely idle (no sessions going on), in order to verify whether the high CPU usage comes from a process that we can attribute to OpenVidu running in there.

If after doing this, the cause of CPU usage is still unclear, I’d ask you to install atop to save and share with us a performance log file:

$ sudo apt-get update && sudo apt-get install --yes atop

$ sudo atop -w "atop_$(date '+%Y%m%d')" 5

Then leave it running idle for 5 or 10 minutes. This can be run directly on the host, no need to do it within any of the Docker containers.

ameotoko · June 8, 2023, 5:30pm

@j1elo done, what’s the best way to share the file with you?

Meanwhile, here are some quick results from htop (screenshot):

j1elo · June 8, 2023, 6:19pm

Thanks for monitoring it! I think we won’t need the performance log after all, because judging from your screenshot it seems the issue is clear.

The part taking 30% CPU is not OpenVidu itself (you can see the java processes sitting comfortably at 0% usage), but a management script which is part of the CloudFormation deployment itself. To be precise, it looks to me that the only possible problematic point could be this loop which might be stuck running:

/usr/local/bin/restartCE.sh

github.com/OpenVidu/openvidu/blob/v2.27.0/openvidu-server/deployments/ce/aws/CF-OpenVidu.yaml.template#L278

                      # Get new amazon URL
                      OldPublicHostname=$(cat /usr/share/openvidu/old-host-name)
                      PublicHostname=$(curl http://169.254.169.254/latest/meta-data/public-hostname)
                      sed -i "s/$OldPublicHostname/$PublicHostname/" $WORKINGDIR/.env
                      echo $PublicHostname > /usr/share/openvidu/old-host-name
          
                      # Restart all services
                      pushd /opt/openvidu
                      docker-compose up -d kms
                      until docker-compose ps | grep kms | grep healthy; do
                          echo "Waiting kms..."
                      done
                      docker-compose up -d
                      popd
                    mode: "000755"
                    owner: "root"
                    group: "root"
          Properties:
            ImageId: !GetAtt CloudformationLambdaInvoke.ImageId
            InstanceType: !Ref InstanceType

I’ve already notified the devs who are in charge of it, and they should probably have a fix soon. Meanwhile, if you are able to troubleshoot the docker-compose scenario, you might be able to find out why the Kurento (kms) service is never showing up as healthy in your system. That might provide helpful information to make the script more robust against hiccups like this.

ameotoko · June 8, 2023, 9:12pm

“why the Kurento (kms) service is never showing up as healthy in your system.”

Well actually, it is:

/opt/openvidu$ sudo docker-compose ps | grep kms | grep healthy
openvidu-kms-1    kurento/kurento-media-server:7.0.1    "/entrypoint.sh"    kms    27 hours ago    Up 8 minutes (healthy)

ameotoko · June 11, 2023, 11:41am

Here’s the fix:

 # Restart all services
 pushd /opt/openvidu
 docker-compose up -d kms
 until docker-compose ps | grep kms | grep healthy; do
   echo "Waiting kms..."
+  sleep 5
 done
 docker-compose up -d
 popd

I guess docker-compose ps | grep kms | grep healthy needs a little more time to execute and yield a result; the loop iterates faster than that.

I tested the original script manually, and without sleep it just repeats echoing “Waiting kms…” indefinitely. With sleep 5 it took 6 iterations before KMS appeared healthy, i.e. 30 seconds, so maybe you could just sleep 30 there, idk.

Well, this is more like it (screenshot):

j1elo · June 12, 2023, 5:38pm

That’s great! Thanks for confirming the fix worked for you. We’re going to add a pause in that loop, and will probably also rewrite the loop to add more robust checks and a finite number of retries, just to be extra sure of catching the problem if services don’t become active for whatever reason.
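
For illustration, a bounded version of that wait could look roughly like the sketch below. This is only an assumption of how such a loop might be written, not the actual fix: wait_for and kms_is_healthy are hypothetical names, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for is a hypothetical helper, not part of the OpenVidu deployment.
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, pausing DELAY_SECONDS between tries,
# and gives up with exit status 1 after MAX_ATTEMPTS failed tries.
wait_for() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@" > /dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Gave up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Waiting ($attempt/$max_attempts)..."
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# The same check used in restartCE.sh, wrapped in a function
# so that it can be passed to wait_for as a single command.
kms_is_healthy() {
    docker-compose ps | grep kms | grep -q healthy
}
```

In the script, the until loop would then become something like: wait_for 12 5 kms_is_healthy || exit 1. The sleep keeps the loop from spinning the CPU, and the attempt limit makes the script fail visibly instead of hanging forever.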

cruizba · June 15, 2023, 3:25pm

@ameotoko Fixed the race condition for all AWS CloudFormation templates from 2.25.0 to 2.27.0, and for future versions, with this commit.

The wait is not needed: OpenVidu starts once Kurento is reachable, so the loop was unnecessary.
