This repository has been archived by the owner on Aug 21, 2024. It is now read-only.

Fixed issues with Mediasoup port allocation. #7979

Merged: 7 commits, May 30, 2023

@barankyle barankyle commented May 12, 2023

Summary

Testing of instanceservers on microk8s on a 24-thread CPU exposed scaling problems with port allocation to mediasoup. To make (data)producers consumable on every core, each (data)producer must be piped to every other core. Each such pipe uses two local ports, one per end, since each router must create a pipeTransport to the other router, which needs a pipeTransport of its own. The port count therefore grows quadratically, as n(n-1) for n cores. If mediasoup is given 200 ports, it cannot support a 16-core/thread processor: piping alone would require 16 × 15 = 240 ports, before counting the ports needed for incoming transports from clients.
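The port arithmetic above can be checked with a short sketch. It assumes one router per core and one pipeTransport port per ordered pair of routers; the function name `pipePortsNeeded` is illustrative and not part of the PR.

```typescript
// Count the local ports consumed by piping between per-core routers.
// Assumption: one router per core, and each router holds one pipeTransport
// (bound to its own local port) to each of the other routers.
function pipePortsNeeded(cores: number): number {
  if (!Number.isInteger(cores) || cores < 1) {
    throw new RangeError('cores must be a positive integer')
  }
  return cores * (cores - 1)
}

console.log(pipePortsNeeded(16)) // 240, more than a 200-port allocation
console.log(pipePortsNeeded(24)) // 552 for a 24-thread CPU
```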

The solution uses a newer mediasoup feature, WebRtcServers. Previously, each external transport used a port of its own, so each client took two ports (one for its recvTransport and one for its sendTransport), or 2n ports for n clients. A WebRtcServer can handle a practically unlimited number of transports on a single inbound port. Now, when an instanceserver starts, it creates a WebRtcServer on each worker and saves a reference to it on that worker. When a (data)transport is created, the WebRtcServer of the router's worker is passed to createWebRtcTransport in place of listenIps.
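A rough sketch of that flow follows. To stay self-contained it uses minimal structural stand-ins (`WorkerLike`, `RouterLike`, `WebRtcServerLike`) rather than the real mediasoup types; they model only `createWebRtcServer` with `listenInfos` and `createWebRtcTransport` with a `webRtcServer` option. The `serverByWorker` map and both function names are hypothetical, since the PR only says a reference is saved on the worker.

```typescript
// Stand-ins for the mediasoup objects involved; only the members used are modeled.
interface WebRtcServerLike { id: string }
interface ListenInfo { protocol: 'udp' | 'tcp'; ip: string; port: number }
interface WorkerLike {
  createWebRtcServer(options: { listenInfos: ListenInfo[] }): Promise<WebRtcServerLike>
}
interface RouterLike {
  createWebRtcTransport(options: { webRtcServer: WebRtcServerLike }): Promise<{ id: string }>
}

// Hypothetical registry mapping each worker to its WebRtcServer.
const serverByWorker = new Map<WorkerLike, WebRtcServerLike>()

// At startup: one WebRtcServer per worker, on consecutive ports from basePort.
async function createServers(workers: WorkerLike[], basePort: number, ip = '0.0.0.0'): Promise<void> {
  let port = basePort
  for (const worker of workers) {
    const server = await worker.createWebRtcServer({
      listenInfos: [
        { protocol: 'udp', ip, port },
        { protocol: 'tcp', ip, port }
      ]
    })
    serverByWorker.set(worker, server)
    port += 1
  }
}

// Per client transport: reuse the worker's server instead of passing listenIps.
async function createClientTransport(router: RouterLike, worker: WorkerLike): Promise<{ id: string }> {
  const webRtcServer = serverByWorker.get(worker)
  if (!webRtcServer) throw new Error('no WebRtcServer registered for this worker')
  return router.createWebRtcTransport({ webRtcServer })
}
```

With real mediasoup objects, `workers` would be the workers created at startup and `router` a router created on the given worker.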

The solution also greatly expands the number of ports that mediasoup uses, while relying on only the first 100-200 being publicly exposed. Previously, mediasoup used only the ports specified in the instanceserver fleet specification in the etherealengine Helm chart (200 by default), on the theory that this many were needed to handle 50-100 connecting clients. With WebRtcServers, only one port per core needs to be exposed publicly. Because the WebRtcServers are the first things to start and be assigned ports, they receive ports at the bottom of the 40000-40199 range, starting with 40000.

Mediasoup is now given a block of 10,000 ports in total. The first n ports are used by the WebRtcServers, and the rest are free to be allocated to pipeTransports as requested. This supports up to a 100-core/thread CPU (100 WebRtcServer ports plus 100 × 99 = 9,900 pipeTransport ports fill the block exactly), which seems sufficiently future-proof. If a CPU with a higher thread count needs to be supported, set the environment variable NUM_RTC_PORTS, which overrides the default of 10000.
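As a sanity check on the 100-core figure, here is a small sketch of the budget. It assumes the only consumers of the block are one WebRtcServer port per core plus one pipeTransport port per ordered pair of cores; `portsNeeded` and `maxCores` are illustrative names, not functions from the codebase.

```typescript
// Total ports for a given core count under the assumptions stated above.
function portsNeeded(cores: number): number {
  const webRtcServerPorts = cores        // one listening port per worker
  const pipePorts = cores * (cores - 1)  // one per ordered pair of routers
  return webRtcServerPorts + pipePorts   // simplifies to cores * cores
}

// Largest core count whose requirement fits in a block of the given size.
function maxCores(portBlock: number): number {
  return Math.floor(Math.sqrt(portBlock))
}

console.log(portsNeeded(100)) // 10000
console.log(maxCores(10000))  // 100
console.log(maxCores(200))    // 14
```

Under the same assumptions, a 128-thread CPU would need NUM_RTC_PORTS set to at least 16384.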

Added an environment variable DEV_CHANNEL='true' to the instanceserver's start-channel and dev-channel scripts. This lets those processes run on a different port range so that, in dev mode, the channel server does not conflict with the ports used by the world server (by default, the channel server starts at port 30000 instead of 40000).
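One way the two environment variables could combine into a port range is sketched below. The base ports and default block size come from the description above; the function `rtcPortRange`, the `PortRange` type, and the exact parsing rules are assumptions and may differ from the actual implementation.

```typescript
interface PortRange { min: number; max: number }

// Defaults taken from the figures in the PR description.
const WORLD_BASE_PORT = 40000
const CHANNEL_BASE_PORT = 30000
const DEFAULT_NUM_RTC_PORTS = 10000

// env is passed in explicitly so the sketch does not depend on Node's globals.
function rtcPortRange(env: Record<string, string | undefined>): PortRange {
  const parsed = parseInt(env['NUM_RTC_PORTS'] || '', 10)
  // Fall back to the default when the variable is unset or not a positive number.
  const count = parsed > 0 ? parsed : DEFAULT_NUM_RTC_PORTS
  const min = env['DEV_CHANNEL'] === 'true' ? CHANNEL_BASE_PORT : WORLD_BASE_PORT
  return { min, max: min + count - 1 }
}

console.log(rtcPortRange({}))                      // { min: 40000, max: 49999 }
console.log(rtcPortRange({ DEV_CHANNEL: 'true' })) // { min: 30000, max: 39999 }
```

In a Node process this would be called as `rtcPortRange(process.env)`. With the defaults, the channel range ends at 39999, just below the start of the world range.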

Also made the closing of (data)producers more explicit, rather than relying solely on the closing transport to close them.
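The idea can be illustrated with a minimal sketch. `Closable` is a stand-in modeling only the `closed` flag and `close()` method that mediasoup producers, data producers, and transports expose; how the instanceserver actually tracks the producers on a transport is not shown in this PR, so the `producers` argument is hypothetical.

```typescript
// Stand-in for anything with mediasoup-style closed/close() members.
interface Closable { closed: boolean; close(): void }

// Close every (data)producer explicitly, then close the transport itself.
function closeTransportAndProducers(transport: Closable, producers: Closable[]): void {
  for (const producer of producers) {
    if (!producer.closed) producer.close()
  }
  if (!transport.closed) transport.close()
}
```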

References

closes #insert number here

Checklist

  • If this PR is still a WIP, convert to a draft
  • When this PR is ready, mark it as "Ready for review"
  • ensure all checks pass
  • Changes have been manually QA'd
  • Changes reviewed by at least 2 approved reviewers

QA Steps

List any additional steps required to QA the changes of this PR, as well as any supplemental images or videos.

hanzlamateen commented May 12, 2023

@barankyle I tried deploying this branch to microk8s and I am still unable to connect to the instance server. Do I need to update the RTC port range in the config?

image

@hanzlamateen hanzlamateen enabled auto-merge May 30, 2023 13:07

@hanzlamateen hanzlamateen left a comment

Working fine on microk8s and minikube

@hanzlamateen hanzlamateen added this pull request to the merge queue May 30, 2023
Merged via the queue into dev with commit 4953293 May 30, 2023
@hanzlamateen hanzlamateen deleted the mediasoup-port-refactor branch May 30, 2023 13:07