KMS nodecrash issue 2.22 pro on AWS

Hello
I am using OpenVidu Pro (OVP) version 2.22 on AWS with the following configuration:

OPENVIDU_PRO_CLUSTER_MODE=auto
OPENVIDU_PRO_CLUSTER_ENVIRONMENT=aws
OPENVIDU_PRO_CLUSTER_MEDIA_NODES=3

After the OpenVidu server failed to communicate with one of the KMS servers, it tried several times to restart that KMS server, and a nodecrash then occurred on it.
However, although I declared "OPENVIDU_PRO_CLUSTER_MEDIA_NODES=3" as shown in the settings above, only two media servers were alive in the end.

It is understandable that OpenVidu tries several times to restart a KMS that has a problem, but if that KMS server is finally shut down, is OpenVidu expected to automatically create a new KMS instance to replace it?

My team and I look forward to your quick response and wish you every success.

Thanks

Hello @Jimahn.Park ,

What is the problem exactly? If I understood correctly, your stack was configured to have 3 media nodes, but one failed, and then OpenVidu created another one and destroyed the unhealthy one.

Is that correct?
Regards

Hello
What you describe is the behavior I would expect. However, several attempts to automatically create a new KMS server were unsuccessful, so only 2 KMS nodes were left in a healthy state.

Which region are you deploying in?

Were you able to create that missing instance? What error logs do you see on the OpenVidu PRO server side?

Thanks for your quick response.
My region is ap-northeast-2 (Seoul).

The server-side (OVP) logs are as follows:

openvidu-openvidu-server-1 | [ERROR] 2022-09-19 03:09:04,883 [Timer-2] io.openvidu.server.pro.infrastructure.metrics.MediaNodesCpuLoadCollector - Exception collecting CPU load of Media Node media_i-0c2d51e61afe822a8: Connect to 172.29.0.25:3000 [/172.29.0.25] failed: connect timed out

openvidu-openvidu-server-1 | [ERROR] 2022-09-19 03:10:09,011 [Timer-5] io.openvidu.server.kurento.kms.KmsManager - According to Timer KMS with uri ws://172.29.0.25:8888/kurento and KurentoClient [org.kurento.client.KurentoClient@4a238725] is not reconnected yet. Exception org.kurento.jsonrpc.JsonRpcException
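For anyone triaging similar logs, here is a minimal Python sketch that collects the host:port endpoints mentioned in `[ERROR]` lines. The function name `unreachable_endpoints` is hypothetical, and the pattern is based only on the two log lines quoted above ("Connect to <ip>:<port>" and "ws://<ip>:<port>"), so other OpenVidu log formats may not match.

```python
import re
from typing import Set

# Assumption: endpoints appear either as "Connect to <ip>:<port>" or
# "ws://<ip>:<port>", as in the two log lines quoted in this thread.
_ENDPOINT = re.compile(r"(?:Connect to |ws://)(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def unreachable_endpoints(log_text: str) -> Set[str]:
    """Return the set of host:port strings found in [ERROR] log lines."""
    endpoints: Set[str] = set()
    for line in log_text.splitlines():
        if "[ERROR]" not in line:
            continue  # only error lines are of interest here
        for host, port in _ENDPOINT.findall(line):
            endpoints.add(f"{host}:{port}")
    return endpoints
```

Applied to the two lines above, this groups both errors under the same Media Node IP (ports 3000 and 8888), which suggests a single unreachable node.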

I will test on my side.

One more question. Did you upgrade the infrastructure manually? Or did you deploy a new CloudFormation stack for version 2.22.0?

OPENVIDU_PRO_CLUSTER_MEDIA_NODES is the initial number of media nodes. If some of the nodes fail, the cluster will not be brought back up to that number.

To get the behavior you want, you need to set these environment variables:

OPENVIDU_PRO_CLUSTER_MEDIA_NODES=0
OPENVIDU_PRO_CLUSTER_AUTOSCALING=true
OPENVIDU_PRO_CLUSTER_AUTOSCALING_MAX_NODES=3
OPENVIDU_PRO_CLUSTER_AUTOSCALING_MIN_NODES=3
OPENVIDU_PRO_CLUSTER_AUTOSCALING_MAX_LOAD=100
OPENVIDU_PRO_CLUSTER_AUTOSCALING_MIN_LOAD=0

This way, with the minimum and maximum both set to 3, you will have a fixed number of media nodes, and the autoscaling system will check the number of media nodes periodically.
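As a quick sanity check before redeploying, the settings above can be verified with a small Python sketch. It parses `.env`-style text and reports why the values would not pin the cluster to a fixed size. The helper names `parse_env` and `fixed_size_problems` are hypothetical, and the check only encodes the rule described in this thread (autoscaling enabled, MIN_NODES equal to MAX_NODES); it is not part of OpenVidu.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def fixed_size_problems(env: Dict[str, str]) -> List[str]:
    """Return reasons why the settings do not pin the cluster size."""
    problems: List[str] = []
    prefix = "OPENVIDU_PRO_CLUSTER_AUTOSCALING"
    if env.get(prefix, "").lower() != "true":
        problems.append(f"{prefix} is not true")
    min_nodes = env.get(f"{prefix}_MIN_NODES")
    max_nodes = env.get(f"{prefix}_MAX_NODES")
    if min_nodes is None or max_nodes is None:
        problems.append("MIN_NODES or MAX_NODES is missing")
    elif min_nodes != max_nodes:
        problems.append(f"MIN_NODES ({min_nodes}) != MAX_NODES ({max_nodes})")
    return problems
```

An empty list from `fixed_size_problems(parse_env(text))` means the file matches the fixed-size pattern described above; it says nothing about whether AWS can actually launch the instances.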

@pabloFuente Please correct me if I am wrong.

You are right, @cruizba .
The configuration parameter OPENVIDU_PRO_CLUSTER_MEDIA_NODES only indicates how many Media Nodes you want your cluster to have on startup. This is taken directly from the OpenVidu docs:


So this property ensures that after launching your OpenVidu cluster in AWS, the number of Media Nodes will be exactly that. But it does not guarantee that new Media Nodes will be automatically added after a Media Node crash. You must add them manually (see Scalability - OpenVidu Docs).

In order to let OpenVidu launch new Media Nodes in the case of a crash, you will have to enable the autoscaling feature just as @cruizba said.

Yes, it works!
After setting OPENVIDU_PRO_CLUSTER_AUTOSCALING and the related variables, it works properly.

Thank you
