Fixed issues with Mediasoup port allocation. #7979

barankyle · 2023-05-12T01:24:11Z

Summary

Testing of instanceservers on microk8s on a 24-thread CPU exposed scaling problems with how ports are allocated to mediasoup. In order to make (data)producers consumable on every core, each (data)producer needs to be piped to every other core. This uses up two local ports per other core, as each router must create a pipeTransport to the other router, which also needs a pipeTransport back. This scales quadratically, not linearly: n cores need n × (n − 1) ports for piping. If mediasoup is given 200 ports to use, it cannot support a 16-core/thread processor, as piping alone would require 16 × 15 = 240 ports, and that does not even count the ports needed for incoming transports from clients.
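The arithmetic above can be checked with a small sketch. The function name `pipePortsNeeded` is hypothetical and not part of this PR; it only encodes the counting rule described in the summary (one local pipeTransport port on each router for every other router).

```typescript
// Ports consumed by inter-router piping on a single host.
// Assumption: every router holds one pipeTransport (one local port)
// toward each of the other routers, as described in the summary.
function pipePortsNeeded(cores: number): number {
  return cores * (cores - 1);
}

// Print the pipe-port requirement for a few core counts.
for (const cores of [4, 8, 16, 24]) {
  console.log(`${cores} cores -> ${pipePortsNeeded(cores)} pipe ports`);
}
```

For 16 cores this gives the 240 ports quoted above, which already exceeds a 200-port allocation.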

The solution involved a newer feature of mediasoup, WebRtcServers. Prior to this, each external transport used up a port of its own, so n clients consumed 2n ports (one for each client's recvTransport and one for its sendTransport). A WebRtcServer can handle a near-unlimited number of transports on a single inbound port. Now, when an instanceserver starts, it makes a WebRtcServer on each worker and saves a reference on that worker. When a (data)transport is created, instead of passing listenIps to createWebRtcTransport, the WebRtcServer of the router's worker is passed.
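A minimal sketch of that flow follows. To keep it runnable without mediasoup installed, it uses trimmed structural stand-ins (`WorkerLike`, `RouterLike`, `WebRtcServerLike`) in place of the real mediasoup types. The helper names, the `appData.webRtcServer` key, and listening on both UDP and TCP are assumptions for illustration, not the code in this PR.

```typescript
// Trimmed, hypothetical stand-ins for the mediasoup objects involved.
interface ListenInfo { protocol: 'udp' | 'tcp'; ip: string; port: number }
interface WebRtcServerLike { id: string }
interface WorkerLike {
  appData: Record<string, unknown>;
  createWebRtcServer(options: { listenInfos: ListenInfo[] }): Promise<WebRtcServerLike>;
}
interface RouterLike {
  createWebRtcTransport(options: { webRtcServer: WebRtcServerLike }): Promise<{ id: string }>;
}

// At startup: one WebRtcServer per worker, each on its own port counted up
// from basePort, with the reference saved on the worker for later use.
async function attachWebRtcServers(workers: WorkerLike[], basePort: number, ip: string): Promise<void> {
  for (let i = 0; i < workers.length; i++) {
    const port = basePort + i;
    const server = await workers[i].createWebRtcServer({
      listenInfos: [
        { protocol: 'udp', ip, port },
        { protocol: 'tcp', ip, port }
      ]
    });
    workers[i].appData.webRtcServer = server;
  }
}

// Per client transport: reuse the worker's WebRtcServer rather than
// binding a fresh port through listenIps.
function createClientTransport(router: RouterLike, worker: WorkerLike): Promise<{ id: string }> {
  const webRtcServer = worker.appData.webRtcServer as WebRtcServerLike;
  return router.createWebRtcTransport({ webRtcServer });
}
```

With this shape, the number of publicly reachable ports depends on the number of workers, not on the number of connected clients.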

The solution also involves greatly expanding the number of ports that mediasoup uses, while trusting that only the first 100-200 are publicly exposed. Prior to this, mediasoup used only the 200 (by default) ports specified in the instanceserver fleet specification in the etherealengine Helm chart, under the theory that this many were needed to adequately handle 50-100 connecting clients. With WebRtcServers, only one port per core needs to be exposed publicly, and since the WebRtcServers are the first things to start and be assigned ports, they will get ports in the range 40000-40199, starting with 40000.

Mediasoup is now given a 10,000-port block in total to work with. The first n ports will be used by the WebRtcServers, and the rest will be free to be allocated to pipeTransports as requested. This supports up to a 100-core/thread CPU, which seems sufficiently future-proof. If a higher-thread-count CPU needs to be supported, all that's needed is to set the environment variable NUM_RTC_PORTS, which will override the default of 10000.
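The 100-core figure can be derived from the layout described above: n WebRtcServer ports plus n × (n − 1) pipeTransport ports is n² ports in total, so a block of 10,000 fits n = 100. The sketch below works that out. Both function names are hypothetical, and how NUM_RTC_PORTS is actually parsed in config.ts is not shown in this description, so the parsing here is an assumption.

```typescript
// Largest core count n whose ports fit in the block:
// n WebRtcServer ports + n * (n - 1) pipe ports = n * n <= numPorts.
function maxCoresForBlock(numPorts: number): number {
  let n = Math.floor(Math.sqrt(numPorts));
  // Correct for any floating-point rounding in Math.sqrt.
  while (n * n > numPorts) n--;
  while ((n + 1) * (n + 1) <= numPorts) n++;
  return n;
}

// Hypothetical parsing of NUM_RTC_PORTS; the default of 10000 is from the description.
function rtcPortCount(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env.NUM_RTC_PORTS ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 10000;
}

console.log(maxCoresForBlock(rtcPortCount({})));
console.log(maxCoresForBlock(rtcPortCount({ NUM_RTC_PORTS: '40000' })));
```

Under this reading, supporting twice as many cores needs roughly four times as many ports.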

Added an environment variable DEV_CHANNEL='true' on instanceservers' start-channel and dev-channel scripts. This allows those processes to run on a different port range so that, in dev mode, the channel server does not conflict with the ports used by the world server (by default channel will start at port 30000 instead of 40000).
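A sketch of the range selection, assuming the start port is chosen purely from DEV_CHANNEL and the block size is the 10000-port default. The names `rtcPortRange` and `PortRange` are hypothetical, and the real logic in config.ts may differ.

```typescript
interface PortRange { start: number; end: number }

// Channel servers in dev mode start at 30000; everything else starts at 40000.
// Assumption: the range is contiguous and numPorts wide.
function rtcPortRange(env: Record<string, string | undefined>, numPorts = 10000): PortRange {
  const start = env.DEV_CHANNEL === 'true' ? 30000 : 40000;
  return { start, end: start + numPorts - 1 };
}

const world = rtcPortRange({});
const channel = rtcPortRange({ DEV_CHANNEL: 'true' });
// With the default block size the two ranges sit back to back without overlapping.
console.log(world, channel, channel.end < world.start);
```

Note that a NUM_RTC_PORTS value above 10000 would make the channel range run into the world range under these assumptions.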

Also made some more explicit closings of (data)producers, rather than just relying on the closing transport to close them.

References

closes #insert number here

Checklist

If this PR is still a WIP, convert to a draft
When this PR is ready, mark it as "Ready for review"
ensure all checks pass
Changes have been manually QA'd
Changes reviewed by at least 2 approved reviewers

QA Steps

List any additional steps required to QA the changes of this PR, as well as any supplemental images or videos.

packages/server-core/src/config.ts

hanzlamateen · 2023-05-12T04:27:11Z

@barankyle I tried deploying this branch to microk8s and I am still unable to connect to instance server. Do I need to update rtc port range in config?

hanzlamateen

Working fine on microk8s and minikube

cla-bot bot added the verified-contributor label May 12, 2023

barankyle requested a review from hanzlamateen May 12, 2023 01:24

HexaField reviewed May 12, 2023

packages/server-core/src/config.ts

hanzlamateen and others added 6 commits May 17, 2023 14:39

Merge branch 'dev' into mediasoup-port-refactor

ad0dfa7

Merge branch 'dev' into mediasoup-port-refactor

adc8cd6

Merge branch 'dev' into mediasoup-port-refactor

ac2a1ce

Merge branch 'dev' of https://github.com/EtherealEngine/etherealengine …

fd3f20c

…into mediasoup-port-refactor

Merge branch 'dev' into mediasoup-port-refactor

3813d56

Merge branch 'dev' into mediasoup-port-refactor

62985ff

hanzlamateen enabled auto-merge May 30, 2023 13:07

hanzlamateen approved these changes May 30, 2023

hanzlamateen added this pull request to the merge queue May 30, 2023

Merged via the queue into dev with commit 4953293 May 30, 2023

hanzlamateen deleted the mediasoup-port-refactor branch May 30, 2023 13:07
