gen_sctp: default receive buffer size (1024) is too low #9722

@fixeria

Problem description

When opening an SCTP socket on Linux, Erlang/OTP sets its own default SO_SNDBUF/SO_RCVBUF values unless the respective socket options (sndbuf and recbuf) are passed to gen_sctp:open/N explicitly:

$ sudo strace -e "fd=19" -f -p $(pidof sctp_perf.escript)
...
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_SNDBUF, [65536], 4) = 0 <----- SNDBUF
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_RCVBUF, [1024], 4) = 0 <----- RCVBUF
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] getsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], [4]) = 0
[pid 3233623] getsockopt(19, SOL_IP, IP_TOS, [0], [4]) = 0
[pid 3233623] setsockopt(19, SOL_SCTP, SCTP_EVENTS, "\1\1\1\1\1\1\1\0\0\0\0\0\0\0", 14) = 0
[pid 3233623] setsockopt(19, SOL_IP, IP_TOS, [0], 4) = 0
[pid 3233623] setsockopt(19, SOL_SOCKET, SO_PRIORITY, [0], 4) = 0
[pid 3233623] bind(19, {sa_family=AF_INET, sin_port=htons(4242), sin_addr=inet_addr("127.0.0.1")}, 16) = 0

The default SNDBUF and RCVBUF values are defined here:

https://github.com/erlang/otp/blame/master/lib/kernel/src/inet_int.hrl#L451

For some reason, the default RCVBUF size (1024) is much smaller than the default SNDBUF size (65536), and both are well below modern Linux defaults. Such a small RCVBUF becomes a problem when the remote peer sends large packets, which can easily happen because Nagle's algorithm coalesces small writes. In such cases, data can be dropped or delayed unnecessarily.
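Until the defaults change, a caller can sidestep them by passing the buffer sizes explicitly. The following is a minimal sketch, not code from the test utility: the module name `sctp_bufs` and both function names are hypothetical, and the loopback address simply mirrors the strace output above. It relies only on `gen_sctp:open/2` and `inet:getopts/2`.

```erlang
-module(sctp_bufs).
-export([buffer_opts/1, open_with_buffers/2]).

%% Build explicit buffer options (size in bytes) so that the built-in
%% defaults from inet_int.hrl are not applied.
buffer_opts(Size) when is_integer(Size), Size > 0 ->
    [{recbuf, Size}, {sndbuf, Size}].

%% Open an SCTP socket on Port with explicit buffer sizes and read back
%% the values the kernel actually applied.
open_with_buffers(Port, Size) ->
    Opts = [{ip, {127,0,0,1}} | buffer_opts(Size)],
    case gen_sctp:open(Port, Opts) of
        {ok, Sock} ->
            {ok, Effective} = inet:getopts(Sock, [recbuf, sndbuf]),
            {ok, Sock, Effective};
        {error, _} = Error ->
            Error
    end.
```

Calling `sctp_bufs:open_with_buffers(4242, 65536)` would open the socket with equal send and receive buffers; the values read back may differ from the requested ones, because the kernel is free to adjust them.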

To Reproduce

I developed a small SCTP performance testing utility:

https://gitea.osmocom.org/vyanitskiy/erlang-sctp-perf

Just like iperf, this tool includes both client and server modes. The server accepts incoming connections and simply echoes back any received data. The client connects to the server, sends N DATA chunks in a single burst, and waits to receive them all back. See the README.md for more details and usage examples.

Performance figures

Below are performance figures for the test run with different parameters. In all four test cases, the client sent 32 DATA chunks (-n 32) of 64 bytes each (-l 64).

Default SNDBUF/RCVBUF sizes, Nagle enabled

$ ./sctp_perf.escript server
=INFO REPORT==== 11-Apr-2025::00:50:53.295107 ===
SCTP server sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::00:50:53.307781 ===
SCTP server listening on 127.0.0.1:4242

$ time ./sctp_perf.escript client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::01:04:45.393203 ===
SCTP client sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::01:04:45.394976 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::01:04:45.395052 ===
CONN(aid=40988, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::01:04:45.395081 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::01:04:45.395394 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::01:04:49.673529 ===
Got all DATA chunks, we're done

real    0m5.012s
user    0m0.668s
sys     0m0.200s

As can be seen, looping 32 DATA chunks through the echo server took about 4.3 seconds (from 01:04:45.395 to 01:04:49.674).

Default SNDBUF/RCVBUF sizes, Nagle disabled (-N)
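The socket options reported in the log (recbuf 2304, sndbuf 131072) differ from the values seen in strace (1024 and 65536). On Linux, socket(7) documents that the kernel doubles the value passed to setsockopt() for SO_SNDBUF/SO_RCVBUF and returns the doubled value from getsockopt(). That accounts for 131072; 2304 is larger than 2 * 1024, which suggests a kernel-side minimum. The sketch below models this arithmetic only. The module name `sockbuf_calc` is hypothetical, and the floor of 2304 is an assumption taken from the output above, not from documentation; upper limits such as net.core.rmem_max are not modeled.

```erlang
-module(sockbuf_calc).
-export([effective/2]).

%% Requested: the value passed to setsockopt().
%% Floor: the assumed kernel minimum (2304 is what the logs above show).
%% Returns the size getsockopt() would report under these assumptions.
effective(Requested, Floor) ->
    max(2 * Requested, Floor).
```

With these assumptions, `sockbuf_calc:effective(65536, 2304)` gives 131072 and `sockbuf_calc:effective(1024, 2304)` gives 2304, matching the reported options.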

$ ./sctp_perf.escript -N server
=INFO REPORT==== 11-Apr-2025::02:15:02.210524 ===
SCTP server sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:15:02.212208 ===
SCTP server listening on 127.0.0.1:4242

$ time ./sctp_perf.escript -N client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:17:32.564221 ===
SCTP client sock options: [{recbuf,2304},{sndbuf,131072},{sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:17:32.566128 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:17:32.566916 ===
CONN(aid=41044, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:17:32.566956 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:17:32.567571 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:17:36.882904 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:17:36.882998 ===
sctp_cli_loop() took 4316.099 ms

real    0m5.067s
user    0m0.698s
sys     0m0.189s

Disabling Nagle does not improve performance here (the loop still takes about 4.3 seconds) because, AFAIU, the SCTP stack may still batch multiple DATA chunks into a single packet.

Equal SNDBUF/RCVBUF sizes (65536), Nagle disabled (-N)

$ ./sctp_perf.escript -NR 65536 server
=INFO REPORT==== 11-Apr-2025::02:18:43.184410 ===
SCTP server sock options: [{recbuf,131072},
                           {sndbuf,131072},
                           {sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:18:43.186433 ===
SCTP server listening on 127.0.0.1:4242

$ time ./sctp_perf.escript -NR 65536 client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:19:18.871551 ===
SCTP client sock options: [{recbuf,131072},
                           {sndbuf,131072},
                           {sctp_nodelay,true}]
=NOTICE REPORT==== 11-Apr-2025::02:19:18.873420 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:19:18.874380 ===
CONN(aid=41046, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:19:18.874444 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:19:18.875087 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:19:18.875329 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:19:18.875383 ===
sctp_cli_loop() took 1.011 ms

real    0m0.756s
user    0m0.672s
sys     0m0.220s

The results look much better: about 1 ms instead of about 4.3 seconds. Out of curiosity, below is the same test with Nagle enabled.

Equal SNDBUF/RCVBUF sizes (65536), Nagle enabled

$ ./sctp_perf.escript -R 65536 server
=INFO REPORT==== 11-Apr-2025::02:27:38.847841 ===
SCTP server sock options: [{recbuf,131072},
                           {sndbuf,131072},
                           {sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::02:27:38.849536 ===
SCTP server listening on 127.0.0.1:4242

$ time ./sctp_perf.escript -R 65536 client -n 32 -l 64
=INFO REPORT==== 11-Apr-2025::02:27:58.744257 ===
SCTP client sock options: [{recbuf,131072},
                           {sndbuf,131072},
                           {sctp_nodelay,false}]
=NOTICE REPORT==== 11-Apr-2025::02:27:58.746185 ===
SCTP client connecting to 127.0.0.1:4242
=INFO REPORT==== 11-Apr-2025::02:27:58.747066 ===
CONN(aid=41052, {127,0,0,1}:4242) state: comm_up
=INFO REPORT==== 11-Apr-2025::02:27:58.747115 ===
Sending 32 DATA chunk(s) 64 bytes each...
=INFO REPORT==== 11-Apr-2025::02:27:58.747511 ===
Done sending DATA chunk(s)
=INFO REPORT==== 11-Apr-2025::02:27:58.949836 ===
Got all DATA chunks, we're done
=NOTICE REPORT==== 11-Apr-2025::02:27:58.949967 ===
sctp_cli_loop() took 202.924 ms

real    0m0.965s
user    0m0.704s
sys     0m0.198s

This is slower (about 203 ms versus about 1 ms), but still significantly quicker than with the default RCVBUF size (1024).

Expected behavior

Ideally, I would expect Erlang/OTP to respect the OS default values for the SNDBUF/RCVBUF sizes. If this is not possible for some reason (maybe there is a reason to set them explicitly?), would it be possible to increase Erlang's defaults in #sctp_opts.opts?

Affected versions


Labels

bug, team:PS
