[RFC] eBPF offload consideration #360
Does this need to be in-tree?
Good point! These files are generated by bpf2go during go generate. The objects can be reused, and generating them requires kernel headers, a C compiler, etc. These tools might not be available to most users, and I am also not sure whether a single go get github.com/pion/turn/v3 would trigger go generate. Bundling the generated objects is a common solution for this; e.g., the cilium/ebpf examples at https://github.com/cilium/ebpf/tree/main/examples contain the eBPF objects too.
I think this is very useful for attackers. An attacker can open a PR that updates the .o file, built with malicious changes, alongside harmless code changes.
This is pretty magical! Great work :) I am in support of adding this. I think people will find this useful.
I added you to the repo, @levaitamas. Unfortunately, these days I don't have much bandwidth to get involved, but I would love to support you wherever I can. If you want me to add other developers so you can work together, I'm happy to do that.
Thanks @Sean-Der! I appreciate being added to the repo, since that will definitely ease supporting the eBPF offload once it gets integrated. I would recommend adding @rg0now. He has a great understanding of the pion ecosystem and has already made impactful contributions (e.g., multi-threaded UDP support).
Done! That was a major oversight that @rg0now wasn’t in already :(
CC @stv0g
Great work! I am also in support of getting this in 👍🏻
I was going to make a PR that adds a user-configurable callback at these locations to allow configuring external network accelerators, but I see you already did it. Thanks! https://github.com/l7mp/turn/blob/4b776f2d67b2256552f8298f450b4b0640b17183/internal/allocation/allocation.go#L128
Hi,
@rg0now and I have been investigating how to boost pion/turn performance with eBPF. As a first step, we implemented an eBPF/XDP offload for UDP channel bindings. This way, pion/turn can offload ChannelData processing to the kernel. Below we present our implementation details and early results, and call for a discussion on adding eBPF offload to pion/turn.
Implementation details
How does it work?
The XDP offload handles ChannelData messages only. The userspace TURN server remains responsible for all other functionality, from handling requests to creating channel bindings. The offload is activated after a successful channel binding, in the method Allocation.AddChannelBind: the userspace TURN server sends the peer and client info (5-tuples and channel ID) to the XDP program via eBPF maps. From that point on, the XDP program can detect ChannelData coming from either the peer or the client. When a channel binding is removed, the corresponding entries are deleted from the eBPF maps, so that channel is no longer offloaded.
Changes to pion/turn
New: We introduce a new internal offload package, which manages offload mechanisms. Currently, there are two implementations: XDPOffload, which uses XDP, and NullOffload for testing purposes.
Changed: The kernel offload complicates lifecycle management, since the eBPF/XDP offload outlives TURN server objects. This calls for new public methods in package turn to manage the offload engine's lifetime: InitOffload starts the offload engine (e.g., loads the XDP program and creates the eBPF maps), and ShutdownOffload removes the offload engine. Note that these methods should be called by the application, as shown in the server_test.go benchmark. Once everything is set up, channel binding offload management happens in Allocation.AddChannelBind and Allocation.DeleteChannelBind, with no change in their usage.
eBPF/XDP details
The XDP part consists of a program that describes the packet processing logic to be executed when the network interface receives a packet. The XDP program uses eBPF maps to communicate with the userspace TURN server.
Maps: The XDP offload uses the following maps to keep track of connections, store statistics, and aid traffic redirects between interfaces:
turn_server_downstream_map
turn_server_upstream_map
turn_server_stats_map
turn_server_interface_ip_addresses_map
XDP Program: The XDP program receives all packets as they arrive at the network interface. It filters for IPv4/UDP packets (caveat: VLAN and other tunneling options are not supported) and checks whether a packet belongs to any channel binding (i.e., checks the 5-tuple and channel ID). If there is a match, the program does the ChannelData handling: it updates the 5-tuple, adds or removes the ChannelData header, keeps track of statistics, and finally redirects the packet to the corresponding network interface. All other packets are passed to the network stack for further processing (e.g., channel refresh messages and other STUN/TURN traffic go to the userspace TURN server).
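The ChannelData part of this processing can be sketched in Go. This is a userspace illustration of the logic, not the XDP program itself (which runs in the kernel and also rewrites the IP/UDP headers); the function names are hypothetical. The header layout (2-byte channel number, 2-byte length, then the application data) and the channel number range 0x4000 to 0x4FFF follow RFC 8656. Over UDP, padding is optional, so the sketch ignores any trailing padding on input and adds none on output.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// channelDataHeaderLen is the ChannelData header size: channel number + length.
const channelDataHeaderLen = 4

// isChannelNumber reports whether n is a valid channel number (RFC 8656:
// 0x4000-0x4FFF). STUN messages start with bits 0b00, so they never match.
func isChannelNumber(n uint16) bool { return n >= 0x4000 && n <= 0x4FFF }

// stripChannelData handles the client->peer direction: it validates the
// header and returns the channel number and the application data to relay.
// Trailing padding beyond the Length field is ignored.
func stripChannelData(msg []byte) (uint16, []byte, error) {
	if len(msg) < channelDataHeaderLen {
		return 0, nil, errors.New("too short for a ChannelData header")
	}
	ch := binary.BigEndian.Uint16(msg[0:2])
	if !isChannelNumber(ch) {
		return 0, nil, errors.New("not ChannelData: pass to userspace")
	}
	n := int(binary.BigEndian.Uint16(msg[2:4]))
	if channelDataHeaderLen+n > len(msg) {
		return 0, nil, errors.New("length field exceeds packet size")
	}
	return ch, msg[channelDataHeaderLen : channelDataHeaderLen+n], nil
}

// addChannelData handles the peer->client direction: it prepends a
// ChannelData header for channel ch. No padding is added (optional over UDP).
func addChannelData(ch uint16, data []byte) ([]byte, error) {
	if !isChannelNumber(ch) {
		return nil, errors.New("invalid channel number")
	}
	if len(data) > 0xFFFF {
		return nil, errors.New("data too long for the Length field")
	}
	out := make([]byte, channelDataHeaderLen+len(data))
	binary.BigEndian.PutUint16(out[0:2], ch)
	binary.BigEndian.PutUint16(out[2:4], uint16(len(data)))
	copy(out[channelDataHeaderLen:], data)
	return out, nil
}

func main() {
	framed, err := addChannelData(0x4001, []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", framed) // 40 01 00 05 68 65 6c 6c 6f

	// Strip again, with 3 bytes of padding appended as a client may send.
	ch, data, err := stripChannelData(append(framed, 0, 0, 0))
	fmt.Println(ch == 0x4001, string(data), err) // true hello <nil>
}
```

In the offloaded path, the returned channel number would additionally be checked against the binding found for the packet's 5-tuple before redirecting; anything failing these checks is passed to the network stack, as described above.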
Results
CPU profiling
Preliminary results are promising. CPU profiling with the benchmark (#298) shows that server.ReadLoop(), which took 47.9 s before, runs for 0.96 s with the XDP offload.
Flame graph w/o the offload:
Flame graph w/ XDP offload:
Microbenchmark with simple-server
Measurements with iperf, turncat (our in-house TURN proxy), and the simple-server example show an outstanding (150x!) delay reduction and a significant (6x) bandwidth boost.
Measurement setup
Delay results
Bandwidth results
Discussion
bpftool to dump the map content)
bpf_redirect(), which handles packet redirects in eBPF/XDP, only supports redirects to NIC egress queues. This prevents supporting scenarios where clients exchange traffic in a server-local 'loop'.
lo interface).