
Prototype using QUIC instead of tcp and proprietary handshakes #1385

Closed · evan-forbes opened this issue Jun 10, 2024 · 4 comments

Labels: p2p, refactor, WS: Big Blonks 🔭 Improving consensus critical gossiping protocols

@evan-forbes

While not technically purpose-built for p2p applications, QUIC streams could replace our existing mechanisms for multiplexing messages and offer significant benefits. Besides removing multiple round trips from the handshake, QUIC also avoids head-of-line (HOL) blocking and lets us rely on more widely used (and potentially more battle-tested and debugged) software.

AC

Create a prototype that uses QUIC instead of TCP and SecretConnection. Compare its high-level performance (consensus throughput and peer drops) using our network tests and report the results here.
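
As a rough illustration of the stream-per-channel idea described above (not a design proposal), the sketch below opens one QUIC stream per reactor channel using the go-quic (quic-go) library. The address, ALPN tag, and channel names are made up, and the API signatures follow recent quic-go releases, which have changed over time:

// Hypothetical sketch: one QUIC stream per reactor channel, replacing the
// MConnection channel multiplexing and the SecretConnection handshake.
// Signatures follow recent quic-go releases and may differ in older ones.
package main

import (
    "context"
    "crypto/tls"
    "log"

    quic "github.com/quic-go/quic-go"
)

func main() {
    ctx := context.Background()

    // QUIC runs TLS 1.3 inside its own handshake, so there is no separate
    // secret-conn round trip. A real node would authenticate the peer's key
    // here instead of skipping verification.
    tlsConf := &tls.Config{
        InsecureSkipVerify: true,            // placeholder for real peer auth
        NextProtos:         []string{"cmt"}, // hypothetical ALPN tag
    }

    conn, err := quic.DialAddr(ctx, "peer.example.com:26656", tlsConf, nil)
    if err != nil {
        log.Fatal(err)
    }

    // One stream per channel: a slow channel can no longer head-of-line
    // block the others the way a single multiplexed TCP connection can.
    channels := []string{"consensus", "mempool", "evidence"} // illustrative names
    streams := make(map[string]quic.Stream, len(channels))
    for _, ch := range channels {
        s, err := conn.OpenStreamSync(ctx)
        if err != nil {
            log.Fatal(err)
        }
        streams[ch] = s
    }

    // Each stream is an independent, ordered, flow-controlled byte pipe.
    if _, err := streams["consensus"].Write([]byte("vote bytes would go here")); err != nil {
        log.Fatal(err)
    }
}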

@evan-forbes evan-forbes added the WS: Big Blonks 🔭 Improving consensus critical gossiping protocols label Jun 10, 2024
@evan-forbes

The only reason we would want to prototype this without libp2p is to help debug the current issues in comet, where we are unable to utilize all of the bandwidth once multiple nodes and latency are involved.

To be clear, the goal of this issue is not to write a proprietary p2p stack based on QUIC.

rach-id commented Dec 3, 2024

Investigation setup

We added support for QUIC streams in tendermint in #1466, using the go-quic library.

Then, to benchmark the performance, we implemented a mock reactor that floods the network with raw data and traces the data sent/received. This reactor was enabled both on the QUIC refactor branch: https://github.com/celestiaorg/celestia-core/tree/quic-bench-reactor and on the native tm stack branch: https://github.com/celestiaorg/celestia-core/tree/updating-the-mock-reactor-updated.

Then, for cleaner results, we disabled all the reactors except PEX so that the only data sent across the network is the mock reactor's data.

After running the experiments, we noticed that the QUIC connections in the QUIC refactor hang after a while. So, to get more accurate results, we implemented a simple project, https://github.com/rach-id/quic-bench, that uses go-quic to create a QUIC network where each peer floods its peers with random data. This allowed benchmarking QUIC performance without the overhead added by tendermint: protobuf encoding/decoding, peer handling, etc.
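
For reference, the flooding idea boils down to something like the following sketch. This is not the actual quic-bench code; the address and payload size are arbitrary, and the quic-go signatures are the recent ones. Each peer dials, keeps writing incompressible random data on a stream, and logs the achieved rate once per second:

// Rough sketch of a per-peer flooder (not the actual quic-bench code).
package main

import (
    "context"
    "crypto/rand"
    "crypto/tls"
    "log"
    "sync/atomic"
    "time"

    quic "github.com/quic-go/quic-go"
)

func main() {
    ctx := context.Background()
    tlsConf := &tls.Config{InsecureSkipVerify: true, NextProtos: []string{"bench"}}

    conn, err := quic.DialAddr(ctx, "peer.example.com:9000", tlsConf, nil)
    if err != nil {
        log.Fatal(err)
    }
    stream, err := conn.OpenStreamSync(ctx)
    if err != nil {
        log.Fatal(err)
    }

    // Incompressible random payload so nothing along the path can cheat.
    payload := make([]byte, 64*1024)
    if _, err := rand.Read(payload); err != nil {
        log.Fatal(err)
    }

    // Writer loop: keep the stream saturated.
    var sent atomic.Int64
    go func() {
        for {
            n, err := stream.Write(payload)
            if err != nil {
                return
            }
            sent.Add(int64(n))
        }
    }()

    // Reporter loop: the send rate achieved over the last second, which is
    // roughly what the collected traces measure.
    for range time.Tick(time.Second) {
        log.Printf("upload: %.2f Mbps", float64(sent.Swap(0)*8)/1e6)
    }
}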

To run the benchmarks, we used this branch of congest, https://github.com/celestiaorg/congest/tree/tm-quic-benchmarks, which creates a network of servers, provisions them, runs the network, and then collects the logs.

The plots below were generated using the traces collected from the network using congest and processed using https://github.com/celestiaorg/traces_analysis.

The servers' setup:

  • 16-CPU, 32 GB RAM servers with ~5 Gbps download and ~2.5 Gbps upload speed (as measured using speedtest-cli)
  • 52 peers distributed across the world. The plots below show the regions.
  • For the QUIC benchmarks, we increased the UDP buffers on all the servers:
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.core.rmem_default=8388608
sudo sysctl -w net.core.wmem_default=8388608
sudo sysctl -w net.ipv4.udp_mem="8388608 8388608 16777216"
sudo sysctl -w net.ipv4.udp_rmem_min=1638400
sudo sysctl -w net.ipv4.udp_wmem_min=1638400
  • For the tendermint native stack benchmarks, we used BBR + mptcp on all the servers:
# Load the BBR module
echo "Loading BBR module..."
modprobe tcp_bbr
# Verify if the BBR module is loaded
if lsmod | grep -q "tcp_bbr"; then
  echo "BBR module loaded successfully."
else
  echo "Failed to load BBR module."
  exit 1
fi
# Set fq as the default qdisc and BBR as the congestion control algorithm
echo "Updating sysctl settings..."
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Enable MPTCP
sysctl -w net.mptcp.enabled=1

# Set the path manager to ndiffports
sysctl -w net.mptcp.mptcp_path_manager=ndiffports

# Specify the number of subflows
SUBFLOWS=16
sysctl -w net.mptcp.mptcp_ndiffports=$SUBFLOWS
# Make the changes persistent across reboots
echo "Making changes persistent..."
echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf
echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf

Also, we increased the send/receive rates to 100 Gbps.
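
The exact knob isn't shown above, but in tendermint/celestia-core the per-connection rate limits are the p2p SendRate/RecvRate settings (send_rate/recv_rate in config.toml, in bytes per second). A hedged sketch of raising them to roughly 100 Gbps programmatically; the import path is upstream tendermint's, of which celestia-core is a fork:

// Hedged sketch: raising the MConnection rate limits to ~100 Gbps.
package main

import (
    "fmt"

    "github.com/tendermint/tendermint/config"
)

func main() {
    cfg := config.DefaultP2PConfig()

    // SendRate/RecvRate are expressed in bytes per second: 100 Gbps ≈ 12.5 GB/s.
    cfg.SendRate = 100_000_000_000 / 8
    cfg.RecvRate = 100_000_000_000 / 8

    fmt.Printf("send_rate=%d recv_rate=%d bytes/s\n", cfg.SendRate, cfg.RecvRate)
}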

Note: the words validator and peer are used interchangeably here; wherever a validator is mentioned, it refers to a p2p peer rather than a tendermint validator. All experiments used the mock reactor or the quic-bench tool; no actual consensus network was benchmarked in these results.

Findings

After running the different networks for a few minutes, we collected the logs and generated the following plots.

Native tendermint stack + BBR + mptcp

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

Native tendermint stack + CUBIC

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

Native tendermint stack + RENO

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

Tendermint stack QUIC refactor + large UDP buffers

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

QUIC benchmark tool + large UDP buffers + 4 streams per peer

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

QUIC benchmark tool + large UDP buffers + 32 streams per peer

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

QUIC benchmark tool + large UDP buffers + 1 stream per peer

[Figure: peer upload vs download progression]
[Figure: region upload vs download average speed]

Insights

We can see in the plots that the QUIC benchmark tool yields better results than the tendermint QUIC refactor, so we will use it, rather than the refactor, as the reference for QUIC performance. The reason is that the QUIC refactor was still a WIP and its performance can still be improved, whereas the QUIC benchmarking tool uses go-quic out of the box without any changes. That makes it a good reference, since it represents the best performance we can expect.

If we compare the QUIC benchmarking tool with the native tendermint stack + BBR, we see that the native stack + BBR performs better and utilizes the bandwidth more fully, both in the worst cases, where servers are far apart (sometimes on different continents), and in the best cases, where the servers are in the same site.

Additional insights

  • QUIC with 1 stream performs way better than QUIC with multiple streams, which is expected.
  • CUBIC and RENO reach higher speeds when exchanging data with closer nodes but underperform when the nodes are far away. One reason CUBIC/RENO behave this way for far nodes is that the closer nodes consume most of the bandwidth.
  • QUIC with 1 stream performs almost the same as TM native with BBR for closer nodes, but performs way worse for far nodes.
  • QUIC with 1 stream performs slightly better than or the same as CUBIC or RENO in the worst cases (far-away nodes), but performs worse than them in the best cases (closer nodes).

Conclusions

Given the above numbers, it seems more reasonable to keep the existing tendermint stack and pair it with BBR to get the best possible performance. Then we will need to focus on improving the message-passing mechanisms so that we can fully utilize the bandwidth. This will be tracked in #1531.

We can consider switching to QUIC at some point if one of the following proves promising:

  • go-quic implements BBR and its benchmark results beat TCP + BBR.
  • Head-of-line blocking starts showing up in the native tendermint stack and we cannot find a way to solve it.

@evan-forbes

I feel comfortable closing this issue for now

rach-id commented Dec 6, 2024

More insights: comparing different data sizes and stream counts in QUIC vs TCP + BBR + mptcp

The setups are the same as above.

QUIC

500 bytes of data in 4 streams

[Figure: 500 bytes in 4 streams]

5 MB of data in 4 streams

[Figure: 5 MB in 4 streams]

Note: when running the experiment, CPU usage increases with the number of streams we open.

Resource usage with 1 stream

[Figure: resource usage with 1 stream]

Resource usage with 256 streams

[Figure: resource usage with 256 streams]

Opening a new stream every minute

[Figure: opening a new stream every minute]

We see that after opening ~20-40 streams, QUIC keeps performing the same way; with fewer than 20 streams, the performance is best. This could also be because the amount of data being sent increases with every stream.
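
For context, the "opening a new stream every minute" run amounts to something like the sketch below. This is hypothetical code rather than the benchmark itself; conn and payload would come from a setup like the earlier flooder sketch, and the quic-go type names follow recent releases:

// Hypothetical reproduction of the "new stream every minute" experiment:
// each tick opens another stream on the same connection and floods it,
// so the offered load grows with the stream count.
package bench

import (
    "context"
    "log"
    "time"

    quic "github.com/quic-go/quic-go"
)

func addStreamEveryMinute(ctx context.Context, conn quic.Connection, payload []byte) {
    ticker := time.NewTicker(time.Minute)
    defer ticker.Stop()

    open := 0
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            s, err := conn.OpenStreamSync(ctx)
            if err != nil {
                return // connection closed or context cancelled
            }
            open++
            log.Printf("streams open: %d", open)
            go func() {
                for {
                    if _, err := s.Write(payload); err != nil {
                        return
                    }
                }
            }()
        }
    }
}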

TCP + BBR + mptcp

500 bytes

[Figure: 500 bytes over TCP + BBR + mptcp]

Apparently, the smaller the data being sent, the better the performance we get from comet's p2p stack.

5 MB

There is apparently a memory leak somewhere that makes the nodes OOM after a short while.

We will try to fix it in #1548
