> [!IMPORTANT]
> The `bpf_powertcp` congestion control is fully functional but requires TCP-INT
> to be deployed on your network switches. A description of how to deploy TCP-INT
> is unfortunately out of the scope of this repository. You can find a hint in
> the TCP-INT repository: Switch Code.
> [!NOTE]
> The `bpf_rttpowertcp` congestion control is fully functional when the network
> interface(s) support hardware timestamping. You can check the support with
> (as root/with `sudo`)
>
> ```
> ethtool -T INTERFACE | grep hardware-receive
> ```
>
> which should output `hardware-receive`.
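If several interfaces are candidates, a small loop can check them all; this is only a sketch, and the interface names are placeholders for your system:

```
# Check hardware receive timestamping support on candidate interfaces
# (run as root/with sudo; eth0 and eth1 are placeholder names).
for IFACE in eth0 eth1; do
    echo "== $IFACE =="
    ethtool -T "$IFACE" | grep hardware-receive || echo "   no hardware-receive support"
done
```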
Following are step-by-step instructions on how to use and experiment with the PowerTCP eBPF implementation. All commands listed here are assumed to be executed in the root folder of this repository.
When loaded into the kernel, the congestion control algorithms are called `bpf_powertcp` and `bpf_rttpowertcp`.
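Once registered (the steps below take care of that), the algorithms should also show up in the kernel's list of available congestion controls; a quick way to check:

```
# List the congestion control algorithms currently known to the kernel;
# bpf_powertcp and bpf_rttpowertcp appear here only after registration.
sysctl net.ipv4.tcp_available_congestion_control
```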
- TCP-INT deployed on network switches
- Linux kernel 5.10 or above (ideally 6.0 or above)
- `bpftool` version 5.15 or above
- `clang` version 3.7 or above
- `g++` version 10 or above
- libbpf version 0.5 or above
- `llvm-strip`
- `make`
The required versions are available starting with Debian 11 (Bullseye) and Ubuntu 22.04 (Jammy Jellyfish).
The installation of the required software is shown in the following steps.
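Once the packages are installed (see the preparation steps below), the installed versions can be compared against the list above; a rough sketch, since exact output formats differ between distributions:

```
bpftool version
clang --version
g++ --version
dpkg -s libbpf-dev | grep -i '^version'   # Debian/Ubuntu package version of libbpf
llvm-strip --version
make --version
```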
Details on the kernel requirements
The target kernel must be compiled with `CONFIG_DEBUG_INFO_BTF=y`. It usually is; check with

```
grep -w CONFIG_DEBUG_INFO_BTF /boot/config-$(uname -r)
```
For optimal performance, the target kernel can be patched for `sk_pacing_rate` to be writable from eBPF code. This patch is included in kernel versions 6.0 and later, so no further action is required there. If the target kernel is manually patched, enable the usage of the pacing rate in the eBPF programs by appending `HAVE_WRITABLE_SK_PACING=1` to the below invocation of `make`.
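On a manually patched pre-6.0 kernel this would look roughly like the following, assuming the `make -C bpf/` invocation from the build step below:

```
# Only needed if the kernel was patched by hand for a writable sk_pacing_rate;
# on kernels 6.0 and later the plain invocation is sufficient.
make -C bpf/ HAVE_WRITABLE_SK_PACING=1
```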
After checking out this repository, also check out TCP-INT, which is managed as a Git submodule in the `bpf/tcp-int/` subdirectory:

```
git submodule update --init
```
The preparation steps need to be executed on both client and server.
- Install required packages (as root/with `sudo`):

  ```
  apt install 'bpftool|linux-tools-common$' clang g++ gcc libbpf-dev llvm make
  ```

  Ideally, tune the network interface IFACE for low latency etc. (as root/with `sudo`):

  ```
  apt install ethtool procps tuned
  ./tools/tune-eth IFACE
  ```
- Build the PowerTCP BPF program and TCP-INT:

  ```
  make -C bpf/
  ```

  If you are using a modified TCP-INT P4 application that replaces the `swlat` telemetry field with a timestamp, append `USE_SWLAT_AS_TIMESTAMP=1` to the above invocation of `make`.

  Disable stripping of the object files (for more human-readable `objdump` output) by appending `LLVM_STRIP=/bin/true` to the above invocation of `make`. A combined example follows this list.

- For `bpf_rttpowertcp`, enable hardware timestamping on the relevant network interface(s) IFACE(s) (as root/with `sudo`):

  ```
  ./bpf/powertcp enable-hwts IFACE(s)
  ```
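The optional build flags mentioned above are ordinary `make` variables and should be combinable in a single invocation; a sketch (pass only the flags whose conditions apply to your setup):

```
# Example: modified TCP-INT P4 app (timestamp in swlat) plus unstripped object
# files; add HAVE_WRITABLE_SK_PACING=1 on a manually patched pre-6.0 kernel.
make -C bpf/ USE_SWLAT_AS_TIMESTAMP=1 LLVM_STRIP=/bin/true
```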
Close any screen sessions previously opened this way.

Start `iperf` and `iperf3` server instances, ready to use PowerTCP, in a `screen` session (as root/with `sudo`):

```
./tools/setup-bpf iperf-servers
```
Beware: You are the root user inside the screen session!
Algorithm parameters (see On the client) do not need to be set on the server; they are irrelevant here.
> [!NOTE]
> Applications that want to use `bpf_powertcp` or `bpf_rttpowertcp` must be
> executed in the tcp-int cgroup. The `setup-bpf` script takes care of this.
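To double-check that a shell inside the screen session (and therefore any application started from it) really runs in the tcp-int cgroup, something along these lines can be used; treat it as a sketch, since the exact cgroup path depends on the TCP-INT setup:

```
# Run inside the screen session: the shell's cgroup should mention tcp-int.
grep tcp-int /proc/$$/cgroup
```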
On the client, you can use PowerTCP in an interactive session or automatically record traces of the algorithm execution.
Close any screen sessions previously opened this way.
The `setup-bpf` script opens a screen session readily prepared to use PowerTCP. Applications executed in this screen session are in the tcp-int cgroup, as required.
You can pass algorithm parameters to `setup-bpf`. You should pass at least `hop_bw` and `host_bw`, e.g. (as root/with `sudo`):

```
./tools/setup-bpf iperf-client tracing host_bw=25000 hop_bw=25000 base_rtt=50
```
For a list of the available parameters, see `./bpf/powertcp -h`.

Beware: You are the root user inside the screen session!
Inside the screen session, you can, e.g.,

- run `iperf3` (or `iperf`, the options differ); to confirm the algorithm in use, see the sketch after this list:

  ```
  iperf3 -N -C bpf_powertcp -c SERVER_IP
  iperf3 -N -C bpf_rttpowertcp -c SERVER_IP
  ```

- or watch PowerTCP's trace output (for CSV output append the option `-C`, or see Record traces):

  ```
  ./bpf/powertcp trace
  ```

- or watch TCP-INT's trace output:

  ```
  ./bpf/tcp-int/code/src/tools/tcp_int trace
  ```

- or quickly set up PowerTCP with different parameters:

  ```
  ./bpf/powertcp register -f tracing host_bw=100000 hop_bw=100000 base_rtt=50 gamma=0.7
  ```
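To verify from another shell that a running connection actually uses one of the PowerTCP algorithms, the socket statistics are handy; `ss -ti` prints, among other details, the congestion control of each TCP connection (a generic sketch, not specific to this repository):

```
# Look for bpf_powertcp or bpf_rttpowertcp in the info line of the iperf connection.
ss -ti
```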
To record a trace, close any screen sessions previously opened for interactive usage.
Record traces (as CSV files) of running `iperf`/`iperf3` with multiple combinations of algorithm parameters (as root/with `sudo`):

```
./tools/bpf_tracer iperf3 -N -c SERVER_IP -C bpf_powertcp -- host_bw=25000 hop_bw=20000 base_rtt=50 beta="2 10" gamma="0.5 0.9"
```
`bpf_tracer` takes an `iperf`/`iperf3` command line followed by PowerTCP algorithm parameters, separated by a `--`:

```
./tools/bpf_tracer IPERF(3)_CMDLINE -- POWERTCP_PARAMS
```
`IPERF(3)_CMDLINE` must contain a full `iperf`/`iperf3` client command line; it must specify the congestion control algorithm to use.

`POWERTCP_PARAMS` can contain any of the parameters listed by `./bpf/powertcp -h`. Multiple values can be given for each parameter as a quoted string.
The above example call produces 4 CSV files:

```
bpf_powertcp-gamma=0.5 base_rtt=50 hop_bw=20000 beta=10 host_bw=25000.csv
bpf_powertcp-gamma=0.5 base_rtt=50 hop_bw=20000 beta=2 host_bw=25000.csv
bpf_powertcp-gamma=0.9 base_rtt=50 hop_bw=20000 beta=10 host_bw=25000.csv
bpf_powertcp-gamma=0.9 base_rtt=50 hop_bw=20000 beta=2 host_bw=25000.csv
```
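Since the generated file names contain spaces, quoting matters when post-processing them; a small sketch that lists the produced traces and their line counts:

```
# List the trace files from the example above and count their lines;
# the quotes are needed because the names contain spaces.
for f in bpf_powertcp-*.csv; do
    printf '%s: %s lines\n' "$f" "$(wc -l < "$f")"
done
```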