Egress fails behind NAT – “Start signal not received” (OpenVidu 3.4.1)

Hello everyone,

I am experiencing an issue with OpenVidu Egress when the deployment is behind NAT, and I would appreciate guidance on whether this is a supported setup or if I am missing a required configuration.


Environment

  • OS: Oracle Linux 9

  • OpenVidu version: 3.4.1 (deployed using the official installation script)

  • TURN/STUN: Caddy with

    --experimental-turn-tls-with-main-domain
    
    
  • Topology:

    • Server internal IP: 10.10.99.9

    • NAT / WAN IP (test setup): 192.168.32.253

  • Production has a real public IP and works fine there (no NAT)


Network configuration

The following ports are forwarded via DNAT to the OpenVidu server:

  • TCP: 80, 443

  • TCP/UDP: 7881

  • UDP: high port range as recommended in the OpenVidu documentation
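For reference, the forwarded port set can be sanity-checked from outside the NAT with a small script like the following. This is only a sketch: the WAN IP, TCP port list, and the 50000-60000 UDP range are assumptions based on this test setup; substitute the values from your own deployment and documentation.

```python
import socket

# Assumed values from this test setup; adjust to your deployment.
WAN_IP = "192.168.32.253"
TCP_PORTS = [80, 443, 7881]

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect; True if the handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def ports_to_probe(tcp_ports, udp_range=(50000, 60000)):
    """Return the TCP ports plus sample ports from the UDP range.

    The 50000-60000 range is a placeholder; use the high UDP range
    configured in your own OpenVidu installation.
    """
    lo, hi = udp_range
    samples = [lo, (lo + hi) // 2, hi]
    return {"tcp": list(tcp_ports), "udp_samples": samples}

if __name__ == "__main__":
    # Probe with e.g. tcp_port_open(WAN_IP, port) from outside the NAT.
    print(ports_to_probe(TCP_PORTS))
```

Note that a TCP connect only proves DNAT is in place for that port; UDP forwarding has to be verified separately (e.g. with a packet capture on the server side).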


What works

  • Clients can successfully:

    • Connect to OpenVidu

    • Create rooms

    • Join conferences

    • Publish/subscribe media

  • This works both:

    • From internal networks (e.g. 10.10.98.2)

    • From a workstation simulating an “external” zone

  • Interestingly, conferencing still works even when high UDP ports are blocked


The problem: Egress

The issue appears only with Egress.

Observations

  • Egress launches Chromium inside the container and attempts to join the room

  • When the OpenVidu deployment is behind NAT, Egress never actually starts recording

  • Even with all ports fully opened, the behavior is the same

  • Network capture shows:

    • Traffic to API over 443

    • Some traffic to Pion / LiveKit over 7881

    • Then the flow breaks


Egress logs (excerpt)

waiting for start signal
...
egress_aborted
error: "Start signal not received"
code: 412
details: "End reason: Source closed"

Chromium starts, the GStreamer pipeline is built, but the start signal is never received, and the pipeline is eventually torn down.


OpenVidu / LiveKit side logs

On the OpenVidu side, I only see peer connection failures related to DTLS:

peer connection state changed: closed
Failed to start SCTP: DTLS not established
failed to open SrtpSession: the DTLS transport has not started yet

This strongly suggests that the Egress participant never establishes a proper WebRTC connection.


Important comparison

With the exact same configuration, when I deploy OpenVidu on a VM with a directly bound public IP (no NAT):

  • Egress works perfectly

  • Recording starts immediately

  • No DTLS / start-signal issues

This makes me believe the problem is specifically related to NAT traversal for Egress, not a general Egress bug.


Question

Is OpenVidu Egress officially supported behind NAT?

If yes:

  • Are there additional requirements for Egress (extra ports, advertised IPs, ICE/TURN settings)?

  • Does Egress require a publicly routable IP for DTLS/SRTP to complete?

If no:

  • Is a public IP on the Egress/OpenVidu node a hard requirement?

Any clarification or pointers would be greatly appreciated.
Thank you in advance for your help.

And here are some raw logs:

openvidu | 2025-12-29T22:18:41.073Z DEBUG livekit analytics/analytics.go:256 events:{id:"AE_sQrRoF2iicjP" type:WEBHOOK timestamp:{seconds:1767046721 nanos:73262371} node_id:"ND_xZMCuLdsNxCB" webhook:{event_id:"EV_u2sDvBijDwCk" event:"egress_ended" egress_id:"EG_afnnZBPofPAd" created_at:{seconds:1767046721} queued_at:{seconds:1767046721 nanos:58632507} queue_duration_ns:191445 sent_at:{seconds:1767046721 nanos:58824133} send_duration_ns:14396705 url:"http://127.0.0.1:6080/livekit/webhook" service_status:"EGRESS_ABORTED" service_error_code:412 service_error:"Start signal not received"}}

openvidu | 2025-12-29T22:14:23.365Z INFO livekit.transport.pion.pc v4@v4.1.1/peerconnection.go:507 peer connection state changed: closed {"room": "Roomxxx-voee6epajiez7p9", "roomID": "RM_kaW4odrDoTwQ", "participant": "EG_scjWh5hgiAuZ", "pID": "PA_XVLpoJRCFTip", "remote": false, "transport": "PUBLISHER"}
openvidu | 2025-12-29T22:14:23.365Z WARN livekit.transport.pion.pc v4@v4.1.1/peerconnection.go:2662 Failed to start manager: connecting canceled by caller {"room": "Roomxxx-voee6epajiez7p9", "roomID": "RM_kaW4odrDoTwQ", "participant": "EG_scjWh5hgiAuZ", "pID": "PA_XVLpoJRCFTip", "remote": false, "transport": "SUBSCRIBER"}
openvidu | 2025-12-29T22:14:23.365Z WARN livekit.transport.pion.pc v4@v4.1.1/peerconnection.go:1572 Failed to start SCTP: DTLS not established {"room": "Roomxxx-voee6epajiez7p9", "roomID": "RM_kaW4odrDoTwQ", "participant": "EG_scjWh5hgiAuZ", "pID": "PA_XVLpoJRCFTip", "remote": false, "transport": "SUBSCRIBER"}
openvidu | 2025-12-29T22:14:23.365Z WARN livekit.transport.pion.pc v4@v4.1.1/peerconnection.go:1859 undeclaredMediaProcessor failed to open SrtpSession: the DTLS transport has not started yet {"room": "Roomxxx-voee6epajiez7p9", "roomID": "RM_kaW4odrDoTwQ", "participant": "EG_scjWh5hgiAuZ", "pID": "PA_XVLpoJRCFTip", "remote": false, "transport": "SUBSCRIBER"}
openvidu | 2025-12-29T22:14:23.365Z WARN livekit.transport.pion.pc v4@v4.1.1/peerconnection.go:1931 undeclaredMediaProcessor failed to open SrtcpSession: the DTLS transport has not started yet {"room": "Roomxxx-voee6epajiez7p9", "roomID": "RM_kaW4odrDoTwQ", "participant": "EG_scjWh5hgiAuZ", "pID": "PA_XVLpoJRCFTip", "remote": false, "transport": "SUBSCRIBER"}

openvidu | 2025-12-29T21:57:49.782Z INFO livekit.webhook webhook/resource_url_notifier.go:295 sent webhook {"event": "egress_ended", "id": "EV_7opZTWgvvvdH", "webhookTime": 1767045469, "url": "http://127.0.0.1:6080/livekit/webhook", "egressID": "EG_qCmDdaT3xY6v", "status": "EGRESS_ABORTED", "error": "Start signal not received", "queueDuration": "69.263µs", "sendDuration": "12.313141ms"}
openvidu | 2025-12-29T21:57:54.378Z INFO livekit.api service/twirp.go:128 API Egress.ListEgress {"service": "Egress", "method": "ListEgress", "room": "Room-msisq5ityceq30l", "duration": "258.89µs", "status": "200"}

egress | 2025-12-29T21:57:34.406Z DEBUG egress source/web.go:218 launching chrome {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v", "url": "http://localhost:7980/?layout=grid&token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NjcxMzE4NTQsImlzcyI6IkFQSTNkY25KaTIzaWdCTCIsImtpbmQiOiJlZ3Jlc3MiLCJuYmYiOjE3NjcwNDU0NTQsInN1YiI6IkVHX3FDbURkYVQzeFk2diIsInZpZGVvIjp7ImNhblB1Ymxpc2giOmZhbHNlLCJjYW5QdWJsaXNoRGF0YSI6ZmFsc2UsImNhblN1YnNjcmliZSI6dHJ1ZSwiaGlkZGVuIjp0cnVlLCJyZWNvcmRlciI6dHJ1ZSwicm9vbSI6IlJvb20tbXNpc3E1aXR5Y2VxMzBsIiwicm9vbUpvaW4iOnRydWV9fQ.aNXWOE3vFtAicgKBwgLxRGtyU3J0O6QVc9h17Hhkzxw&url=ws%3A%2F%2F127.0.0.1%3A7880", "sandbox": false, "insecure": false}
egress | 2025-12-29T21:57:34.666Z DEBUG egress source/web.go:341 chrome initialized {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:34.719Z DEBUG egress gstreamer/bin.go:70 adding src audio to pipeline {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:34.721Z DEBUG egress gstreamer/bin.go:70 adding src video to pipeline {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:34.722Z DEBUG egress gstreamer/bin.go:76 adding sink file to pipeline {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:34.728Z DEBUG egress pipeline/controller.go:181 waiting for start signal {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.743Z INFO egress source/web.go:320 chrome: END_RECORDING {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.743Z DEBUG egress pipeline/controller.go:353 stopping pipeline {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v", "reason": "Source closed"}
egress | 2025-12-29T21:57:49.743Z DEBUG egress gstreamer/state.go:75 pipeline state building -> stopping {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.743Z DEBUG egress source/web.go:121 closing chrome {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.743Z DEBUG egress gstreamer/state.go:75 pipeline state stopping -> finished {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.763Z DEBUG egress source/web.go:130 closing X display {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.765Z DEBUG egress source/web.go:136 unloading pulse module {"nodeID": "NE_W8GPLiWCgwvg", "handlerID": "EGH_NrpBf84UYi5u", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v"}
egress | 2025-12-29T21:57:49.769Z INFO egress info/io.go:230 egress_aborted {"nodeID": "NE_W8GPLiWCgwvg", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v", "requestType": "room_composite", "outputType": "file", "error": "Start signal not received", "code": 412, "details": "End reason: Source closed"}
egress | 2025-12-29T21:57:49.770Z DEBUG egress server/server_rpc.go:172 egress metrics {"nodeID": "NE_W8GPLiWCgwvg", "clusterID": "", "egressID": "EG_qCmDdaT3xY6v", "avgCPU": 0.08539135044934755, "maxCPU": 0.6934673366849723, "maxMemory": 1205219328}

The same happens with version 3.5.0: egress behind NAT does not work.

Hello,

Just a simple question: does your NAT private network have access to the Internet? It is possible that the Chrome process inside the egress container requires some resources from a public domain on the Internet, and if it cannot access them, that could disrupt its initialization.

Also, what deployment type exactly have you tested?

Best regards.

Hello,

My private network does not have direct access to the Internet. For testing purposes, I analyzed the network traffic and created allow rules for all addresses required by Chrome. However, the issue persists.

I am using an OpenVidu CE on-premise single-node setup.

We will try to launch a similar setup with the described networking and test the egress container. If we are able to consistently replicate the issue, it will be easier to fix.

It appears that I have identified the root cause of the issue.

For clarity, let us assume the following network topology:

  • 192.168.1.2 – LAN workstation

  • 10.8.99.134 – OpenVidu server located in the DMZ

  • 192.168.31.253 – Simulated public IP address

  • 192.168.31.5 – External workstation

The OpenVidu server operates in the DMZ and is configured with external_ip = 192.168.31.253. The ports specified in the official documentation (TCP/UDP 443, TCP 7881, and the required high UDP port range) are forwarded from 192.168.31.253 to 10.8.99.134.

The installation is configured to use TURN over TLS, with the domain OpenVidu.home.local. Both the OpenVidu API and the Meet application are configured to use this domain.

From the LAN and DMZ networks, there is no access to the “public” interface (192.168.31.253), as it resides in a highly restricted network segment. DNS resolution within the LAN maps OpenVidu.home.local to 10.8.99.134.

From the LAN:

  • All required ports (as per documentation) are allowed toward the DMZ address.

  • Connectivity tests to the required ports succeed.

  • The LiveKit API and Meet application are reachable.

  • However, joining a room fails.

From the external workstation:

  • Connectivity tests to the required ports succeed.

  • The LiveKit API and Meet application function correctly.

  • Joining a room works.

  • However, recording egress does not work.

This strongly suggests that the issue lies in the ICE server configuration. The browser receives the following ICE configuration:

{
  iceServers: [
    "turn:192.168.31.253:443?transport=udp",
    "turns:openvidu.home.local:443?transport=tcp"
  ],
  iceTransportPolicy: "all",
  bundlePolicy: "balanced",
  rtcpMuxPolicy: "require",
  iceCandidatePoolSize: 0,
  sdpSemantics: "unified-plan",
  extmapAllowMixed: true
}

From the LAN, the connection fails because the ICE configuration advertises the public IP address (192.168.31.253), which is not reachable from that network. In my opinion, the client should fall back to the turns: entry using the domain-based address; however, this does not occur.
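The reachability problem can be illustrated with a small sketch using Python's `ipaddress` module. The subnets and the "reachable networks" model below are assumptions drawn from the topology described above, not from the actual routing tables:

```python
import ipaddress

# Networks the LAN workstation (192.168.1.2) can actually reach,
# per the topology described above (an assumption for illustration).
LAN_REACHABLE = [
    ipaddress.ip_network("192.168.1.0/24"),  # its own LAN
    ipaddress.ip_network("10.8.99.0/24"),    # the DMZ, allowed by firewall
]

def can_reach(client_networks, candidate_ip: str) -> bool:
    """True if the ICE candidate address falls in a routable network."""
    addr = ipaddress.ip_address(candidate_ip)
    return any(addr in net for net in client_networks)

# The advertised TURN address (192.168.31.253) lies in the restricted
# "public" segment, so the LAN client cannot reach it...
assert can_reach(LAN_REACHABLE, "192.168.31.253") is False
# ...while the DMZ address behind the domain is reachable.
assert can_reach(LAN_REACHABLE, "10.8.99.134") is True
```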

I also attempted to add stun_servers in livekit.yaml, but this did not result in a fallback either. This may be due to missing or invalid credentials, as I used placeholder values.

The reason egress recording does not work appears to be similar: the egress service behaves like a LAN client. It does not have access to the public IP address and therefore cannot establish a connection, even though turns: is configured using a domain name that resolves to a reachable DMZ address.

In summary, the issue seems to stem from the fact that this setup requires simultaneous support for both private (DMZ) and public (“Internet-facing”) addresses, controlled via DNS, in a highly restricted environment. From my perspective, the current solution does not appear to be fully compatible with such a deployment model.

Do you agree with this assessment, or is there a recommended approach to make this configuration work in a setup with these constraints?

Hello @n224

I think your configuration approach is possible, but as you suspected, clients in the LAN and DMZ aren’t finding valid candidates to connect.

Based on my understanding of your setup, LAN and DMZ clients can only access OpenVidu through 10.8.99.134. For this to work, you need ICE candidates whose remote IP is 10.8.99.134, while the announced IP for turns (relay) candidates remains 192.168.31.253.

“In my opinion, the client should fall back to the turns: entry using the domain-based address; however, this does not occur.”

It does fall back; I think the problem is the announced IP of the LiveKit server. The candidate is probably trying to relay using the public IP instead of the private one.

Here is what I think could be the solution: setting node_ip = 10.8.99.134. This should work because the TURN server acts as a relay, forwarding media to that private IP. Clients on the public network will connect through openvidu.home.local, while LAN/DMZ clients can connect directly through the host candidate emitted for 10.8.99.134.

Configuration steps:

  1. SSH into your OpenVidu deployment
  2. Navigate to /opt/openvidu/config/livekit.yaml
  3. Add the following to the rtc config section:
rtc:
    ...
    use_external_ip: false
    node_ip: 10.8.99.134

Why this works (despite appearing contradictory):

  • Public clients: Can connect because TURN relays traffic to the announced IP (10.8.99.134)
  • LAN/DMZ clients: Can access directly via 10.8.99.134 without needing TURN

Trade-off: Public-facing clients will use TCP instead of UDP for media transmission.


Connection Flow:

Public Users:
External Client (x.x.x.x)
→ Domain: openvidu.home.local (192.168.31.253)
→ TURN Relay: relays from 192.168.31.253 to 10.8.99.134 LiveKit
→ LiveKit Server (10.8.99.134)

LAN/DMZ Users and Egress
Internal Client (y.y.y.y)
→ Direct connection to LiveKit (10.8.99.134) via host candidates


Let me know if this resolves your issue.

I will verify this. Thank you for your response. Setting the node_ip to the internal address did not initially seem logical to me. I will check it and get back to you.

It appears that, with the provided configuration, a connection cannot be established to the TURN/TURNS servers:

{
  iceServers: [
    "turn:10.8.99.134:443?transport=udp",
    "turns:openvidu.home.local:443?transport=tcp"
  ],
  iceTransportPolicy: "all",
  bundlePolicy: "balanced",
  rtcpMuxPolicy: "require",
  iceCandidatePoolSize: 0
}

On the WebRTC side, I am seeing the following:

  • URL: turns:openvidu.home.local:443?transport=tcp

  • Host candidate: 192.168.1.x:0

  • Error: Failed to establish connection

  • Error code: 701

TLS certificates are configured correctly, and Caddy appears to be operating as expected. The system attempts to fall back to the turns configuration, but the connection still fails.
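One way I started narrowing this down is to verify the TLS listener separately from the TURN protocol itself. A rough sketch (the hostname and port are the ones from this setup; `check_tls` only confirms that the TCP+TLS handshake succeeds, not that a TURN allocation works):

```python
import socket
import ssl
from urllib.parse import parse_qs

def parse_ice_url(url: str):
    """Split an ICE server URL like 'turns:host:443?transport=tcp'."""
    scheme, rest = url.split(":", 1)
    hostport, _, query = rest.partition("?")
    host, _, port = hostport.rpartition(":")
    transport = parse_qs(query).get("transport", ["udp"])[0]
    return {"scheme": scheme, "host": host, "port": int(port), "transport": transport}

def check_tls(host: str, port: int, timeout: float = 5.0) -> str:
    """Perform only the TCP + TLS handshake; return the negotiated TLS version."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()

if __name__ == "__main__":
    ice = parse_ice_url("turns:openvidu.home.local:443?transport=tcp")
    print(ice)
    # From a client network, run: print(check_tls(ice["host"], ice["port"]))
```

If the handshake succeeds but TURN still fails, the problem is above TLS (credentials, realm, or the relayed candidate's announced IP) rather than the certificate or Caddy.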

Could you advise how I should properly test and validate the TURN/TURNS connectivity in this setup?

There are two scenarios:

  1. Users connecting from the public internet
  2. Users connecting from the LAN

My questions are:

  • Are both of them failing?
  • What is the correct IP to reach the server in each scenario?

I would like to see which candidates are being generated. Please do the following and share the requested information for both scenarios:

  1. Create a Room in Firefox
  2. Go to about:webrtc
  3. Share the candidates negotiated for each scenario

The important part of these candidates is that the Local Candidate and the Remote Candidate are a good match for establishing a connection.
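The matching check can be sketched as follows: parse each candidate line from about:webrtc and inspect the pair's transport and addresses. The candidate strings below are made-up examples in the standard SDP candidate syntax, not taken from this deployment:

```python
def parse_candidate(line: str) -> dict:
    """Parse an SDP ICE candidate line into its main fields.

    Format: candidate:<foundation> <component> <proto> <priority> <ip> <port> typ <type> ...
    """
    parts = line.split()
    return {
        "proto": parts[2].lower(),
        "ip": parts[4],
        "port": int(parts[5]),
        "type": parts[7],
    }

def compatible(local: dict, remote: dict) -> bool:
    """A candidate pair is only worth checking if transports match."""
    return local["proto"] == remote["proto"]

# Hypothetical examples for illustration:
local = parse_candidate("candidate:1 1 udp 2122260223 192.168.1.2 54321 typ host")
remote = parse_candidate("candidate:2 1 udp 1686052607 192.168.31.253 443 typ relay")
assert compatible(local, remote)
# The remote relay candidate advertises 192.168.31.253, unreachable
# from the LAN, which is why such a pair would never succeed there.
```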

With the current configuration (where node_ip is set to the internal interface), the Pion server advertises a local interface address, which makes it inaccessible to external users. As a result, this setup cannot function correctly for external connectivity.

I am now going to switch the configuration to use the actual external address in order to achieve partial functionality.

With node_ip set to the internal interface, are LAN users able to connect?

@n224

In 3.6.0 we’ve improved the OpenVidu setup in NAT environments, just in case you want to try it out.