Problem description
When opening an SCTP socket on Linux, Erlang/OTP sets its own default SO_SNDBUF/SO_RCVBUF values if the respective socket options (sndbuf and recbuf) are not given to gen_sctp:open/N explicitly:
$ sudo strace -e "fd=19" -f -p $(pidof sctp_perf.escript)
...
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_SNDBUF, [65536], 4) = 0 <----- SNDBUF
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_RCVBUF, [1024], 4) = 0 <----- RCVBUF
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SCTP, SCTP_EVENTS, "\1\1\1\1\1\1\1\0\0\0\0\0\0\0", 14) = 0
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] bind(19, {sa_family=AF_INET, sin_port=htons(4242), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
The default SNDBUF and RCVBUF values are defined here:
https://github.com/erlang/otp/blame/master/lib/kernel/src/inet_int.hrl#L451
For some reason, the default RCVBUF size (1024) is much smaller than the default SNDBUF size (65536), and both are well below modern Linux defaults. Such a small RCVBUF size becomes problematic when the remote peer is sending large packets, which can easily happen thanks to Nagle's algorithm. In such cases, data can be dropped or delayed unnecessarily.
To Reproduce
I developed a small SCTP performance testing utility:
https://gitea.osmocom.org/vyanitskiy/erlang-sctp-perf
Just like iperf, this tool includes both client and server modes. The server accepts incoming connections and simply echoes back any received data. The client connects to the server, sends N DATA chunks in a single burst, and waits to receive them all back. See the README.md for more details and usage examples.
Performance figures
Below are performance figures when running the test with different parameters. In all four test cases, the client was sending 32 DATA chunks (-n 32), each 64 bytes long (-l 64).
Default SNDBUF/RCVBUF sizes and Nagle enabled
$ ./sctp_perf.escript server
=INFO REPORT==== 11-Apr-2025::00:50:53.295107 ===
SCTP server sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::00:50:53.307781 ===
SCTP server listening on 127.0.0.1:4242
$ time ./sctp_perf.escript client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::01:04:45.393203 ===
SCTP client sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::01:04:45.394976 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::01:04:45.395052 ===
CONN(aid=40988, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::01:04:45.395081 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::01:04:45.395394 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::01:04:49.673529 ===
Got all DATA chunks, we're done
real 0m5.012s
user 0m0.668s
sys 0m0.200s
As can be seen from the timestamps, looping 32 DATA chunks through the echo server took ~4.3 seconds.
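The socket options logged above (recbuf 2304, sndbuf 131072) are larger than the 1024 and 65536 that strace shows being passed to setsockopt(). On Linux this is expected: per socket(7), the kernel doubles the requested SO_RCVBUF/SO_SNDBUF value to leave room for bookkeeping overhead, and it also applies a lower bound, which is presumably where 2304 comes from. A minimal Python sketch that requests the same sizes and reads back what the kernel reports; the helper names are made up for illustration, and the fallback to TCP when SCTP is unavailable is an assumption that the generic socket-layer behaviour is the same:

```python
import socket

def open_stream_socket():
    """Prefer a one-to-one SCTP socket; fall back to TCP if SCTP is unavailable."""
    proto = getattr(socket, "IPPROTO_SCTP", None)
    if proto is not None:
        try:
            return socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
        except OSError:
            pass  # kernel without SCTP support, or module cannot be loaded
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def effective_bufs(rcvbuf, sndbuf):
    """Request the given sizes and return (SO_RCVBUF, SO_SNDBUF) as reported back."""
    with open_stream_socket() as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

# Sizes Erlang/OTP requests by default, as seen in the strace output
print("requested (1024, 65536)  ->", effective_bufs(1024, 65536))
# Sizes requested when the receive buffer is raised to match the send buffer
print("requested (65536, 65536) ->", effective_bufs(65536, 65536))
```

The exact numbers depend on the kernel version and on the net.core.rmem_max/wmem_max limits, so they may differ from the values in the logs above.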
Default SNDBUF/RCVBUF sizes and Nagle disabled (-N)
$ ./sctp_perf.escript -N server
=INFO REPORT==== 11-Apr-2025::02:15:02.210524 ===
SCTP server sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:15:02.212208 ===
SCTP server listening on 127.0.0.1:4242
$ time ./sctp_perf.escript -N client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:17:32.564221 ===
SCTP client sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:17:32.566128 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:17:32.566916 ===
CONN(aid=41044, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:17:32.566956 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:17:32.567571 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:17:36.882904 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:17:36.882998 ===
sctp_cli_loop() took 4316.099 ms
real 0m5.067s
user 0m0.698s
sys 0m0.189s
Disabling Nagle does not improve performance because (AFAIU) the SCTP stack may still batch multiple DATA chunks into a single packet.
Equal SNDBUF/RCVBUF sizes (65536) and Nagle disabled (-N)
$ ./sctp_perf.escript -NR 65536 server
=INFO REPORT==== 11-Apr-2025::02:18:43.184410 ===
SCTP server sock options: [{recbuf,131072},
{sndbuf,131072},
{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:18:43.186433 ===
SCTP server listening on 127.0.0.1:4242
$ time ./sctp_perf.escript -NR 65536 client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:19:18.871551 ===
SCTP client sock options: [{recbuf,131072},
{sndbuf,131072},
{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:19:18.873420 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:19:18.874380 ===
CONN(aid=41046, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:19:18.874444 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:19:18.875087 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:19:18.875329 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:19:18.875383 ===
sctp_cli_loop() took 1.011 ms
real 0m0.756s
user 0m0.672s
sys 0m0.220s
The results look much better. For the sake of curiosity, below is a test with Nagle enabled.
Equal SNDBUF/RCVBUF sizes (65536) and Nagle enabled
$ ./sctp_perf.escript -R 65536 server
=INFO REPORT==== 11-Apr-2025::02:27:38.847841 ===
SCTP server sock options: [{recbuf,131072},
{sndbuf,131072},
{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::02:27:38.849536 ===
SCTP server listening on 127.0.0.1:4242
$ time ./sctp_perf.escript -R 65536 client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:27:58.744257 ===
SCTP client sock options: [{recbuf,131072},
{sndbuf,131072},
{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::02:27:58.746185 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:27:58.747066 ===
CONN(aid=41052, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:27:58.747115 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:27:58.747511 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:27:58.949836 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:27:58.949967 ===
sctp_cli_loop() took 202.924 ms
real 0m0.965s
user 0m0.704s
sys 0m0.198s
This is slower than with Nagle disabled (~203 ms vs. ~1 ms), but still significantly quicker than with the default RCVBUF size (1024).
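To put the four runs side by side, the round-trip time can be recomputed from the client log timestamps. The sketch below takes the "Sending 32 DATA chunk(s)" and "Got all DATA chunks" timestamps copied from the logs above; the run labels are mine, and the results differ slightly from the sctp_cli_loop() figures because the log lines are not emitted at exactly the same points:

```python
from datetime import datetime

# (send started, all chunks received), copied from the client logs above
RUNS = {
    "RCVBUF=1024,  Nagle on":  ("01:04:45.395081", "01:04:49.673529"),
    "RCVBUF=1024,  Nagle off": ("02:17:32.566956", "02:17:36.882904"),
    "RCVBUF=65536, Nagle off": ("02:19:18.874444", "02:19:18.875329"),
    "RCVBUF=65536, Nagle on":  ("02:27:58.747115", "02:27:58.949836"),
}

def elapsed_ms(start, end):
    """Milliseconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() * 1000.0

for name, (start, end) in RUNS.items():
    print(f"{name:<24} {elapsed_ms(start, end):10.3f} ms")
```

With the default RCVBUF both runs take about 4.3 seconds regardless of the Nagle setting, while raising RCVBUF to 65536 brings the round trip down to roughly 0.9 ms (Nagle off) or 203 ms (Nagle on).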
Expected behavior
Ideally, I would expect Erlang/OTP to respect the OS default values for SNDBUF/RCVBUF sizes. If for whatever reason this is not possible (maybe there is a reason to set those explicitly?), would it be possible to increase Erlang's defaults in #sctp_opts.opts?
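For reference, the OS defaults mentioned here can be read from a freshly created socket that has had no buffer options set. This is a rough sketch rather than a statement about what Erlang/OTP should do: os_default_bufs is a hypothetical helper, the 1024/65536 constants are the values seen in the strace output, and the code falls back to TCP when an SCTP socket cannot be created, in which case the numbers may not match SCTP's defaults:

```python
import socket

# Erlang/OTP defaults quoted in this report (values seen in strace)
OTP_RCVBUF, OTP_SNDBUF = 1024, 65536

def os_default_bufs():
    """Return (SO_RCVBUF, SO_SNDBUF) of a new socket with no options applied."""
    proto = getattr(socket, "IPPROTO_SCTP", 0)  # 0 selects the default (TCP)
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    except OSError:
        # SCTP not available on this host: use TCP as a stand-in (assumption)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    with sock:
        return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

rcv, snd = os_default_bufs()
print(f"OS default RCVBUF: {rcv} (OTP requests {OTP_RCVBUF})")
print(f"OS default SNDBUF: {snd} (OTP requests {OTP_SNDBUF})")
```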
Affected versions
- Erlang/OTP 27 [erts-15.2.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]
- Linux 6.12.23-1-lts