Prototype using QUIC instead of tcp and proprietary handshakes #1385
The only reason we would want to prototype without using libp2p is to help debug the current issues in comet, where we are unable to utilize all of the available bandwidth when running multiple nodes with latency between them. To be clear, the goal of this issue is not to write a proprietary p2p stack based on QUIC.
Investigation setup

We added support for QUIC streams in tendermint in #1466, using the quic-go library. Then, to benchmark the performance, we implemented a mock reactor that floods the network with raw data and traces the data sent/received. This reactor was enabled both for the QUIC refactor (https://github.com/celestiaorg/celestia-core/tree/quic-bench-reactor) and for the native tm stack (https://github.com/celestiaorg/celestia-core/tree/updating-the-mock-reactor-updated). For cleaner results, we disabled all the reactors except PEX, so that the only data sent across the network is the mock reactor data.

After running the experiments, we noticed that the QUIC connections in the QUIC refactor hang after a while. So, to get more accurate results, we implemented a simple project, https://github.com/rach-id/quic-bench, that uses quic-go to create a QUIC network where each peer floods its peers with random data. This allowed benchmarking QUIC performance without the overhead added by tendermint: protobuf encoding/decoding, peer handling, etc.

To run the benchmarks, we used this branch of congest, https://github.com/celestiaorg/congest/tree/tm-quic-benchmarks, which creates a network of servers, provisions them, runs the network, and then collects the logs. The plots below were generated from the traces collected from the network using congest and processed with https://github.com/celestiaorg/traces_analysis.
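For reference, here is a minimal sketch of the kind of per-peer flooder quic-bench implements, assuming the quic-go API of that era (DialAddr taking a context and returning a Connection; signatures have shifted across releases). The peer address, ALPN token, and chunk size are placeholders, and the real tool also handles the listening side; this is an illustration, not the actual quic-bench code.

```go
package main

import (
	"context"
	"crypto/rand"
	"crypto/tls"
	"log"

	"github.com/quic-go/quic-go"
)

// floodPeer dials a single peer over QUIC and writes random data on one
// stream in a tight loop, roughly what the bench tool does per peer.
func floodPeer(ctx context.Context, addr string) error {
	tlsConf := &tls.Config{
		InsecureSkipVerify: true,                   // benchmark only: no peer verification
		NextProtos:         []string{"quic-bench"}, // placeholder ALPN token
	}
	conn, err := quic.DialAddr(ctx, addr, tlsConf, nil)
	if err != nil {
		return err
	}
	stream, err := conn.OpenStreamSync(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()

	buf := make([]byte, 64*1024) // 64 KiB chunks of random data
	for {
		if _, err := rand.Read(buf); err != nil {
			return err
		}
		if _, err := stream.Write(buf); err != nil {
			return err
		}
	}
}

func main() {
	// placeholder peer address
	if err := floodPeer(context.Background(), "203.0.113.10:4242"); err != nil {
		log.Fatal(err)
	}
}
```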
The servers' setup:
Also, we increase the send/receive rate to 100 Gbps.

Note: the words validator and peer are used interchangeably here. Wherever a validator is mentioned, it doesn't refer to a tendermint validator but to a p2p peer. All experiments used the mock reactor or the quic-bench tool; no actual consensus network was benchmarked in these results.

Findings

After running the different networks for a few minutes, we collected the logs and generated the following plots:

Native tendermint stack + BBR + mptcp
Native tendermint stack + CUBIC
Native tendermint stack + RENO
Tendermint stack QUIC refactor + large UDP buffers
QUIC benchmark tool + large UDP buffers + 4 streams per peer
QUIC benchmark tool + large UDP buffers + 32 streams per peer
QUIC benchmark tool + large UDP buffers + 1 stream per peer

Insights

We can see in the plots that the QUIC benchmark tool yields better results than the tendermint QUIC refactor, so we will use it as the reference for QUIC performance instead of the refactor. The reason is that the QUIC refactor was still a work in progress, so its performance could still be improved, whereas the QUIC benchmarking tool uses quic-go out of the box without any changes. This makes it a good reference, as it is roughly the best QUIC performance we can expect. Comparing the QUIC benchmarking tool with the native tendermint stack + BBR, we see that in both the worst case, where servers are far apart, sometimes on different continents, and the best case, where servers are in the same site, the native tendermint stack + BBR performs better and utilizes the bandwidth more fully.
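For concreteness, the "native tendermint stack + BBR + mptcp" setups are transport-level tuning rather than changes to comet itself, typically done system-wide via sysctls such as net.ipv4.tcp_congestion_control=bbr. As a hedged illustration only, the same knobs can also be set per listening socket in Go (Linux only, tcp_bbr module loaded, Go 1.21+ for MPTCP); the per-socket approach and the port below are assumptions, not necessarily how the benchmark servers were configured.

```go
package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// bbrListenConfig returns a net.ListenConfig that enables Multipath TCP and
// switches each accepted socket to the BBR congestion controller.
// Illustrative only; whether TCP_CONGESTION is accepted on an MPTCP socket
// depends on the kernel version, and a failure surfaces as a Listen error.
func bbrListenConfig() net.ListenConfig {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptString(int(fd), unix.IPPROTO_TCP, unix.TCP_CONGESTION, "bbr")
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	lc.SetMultipathTCP(true) // Go 1.21+; falls back to plain TCP if MPTCP is unavailable
	return lc
}

func main() {
	lc := bbrListenConfig()
	ln, err := lc.Listen(context.Background(), "tcp", ":26656") // assumed p2p port
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	log.Printf("listening with BBR + MPTCP on %s", ln.Addr())
}
```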
Additional insights

Conclusions

Given the above numbers, it seems more reasonable to keep the existing tendermint stack and pair it with BBR to get the best possible performance. Then, we will need to focus on improving the message passing mechanisms to be able to fully utilize the bandwidth. This will be tracked in #1531. We can consider switching to QUIC at some point if one of the following proves promising:
I feel comfortable closing this issue for now.
More insights: comparing different data sizes and streams in QUIC vs TCP + BBR + mptcp

The setups are the same as above.

QUIC

500 bytes data in 4 streams
5 MB data in 4 streams

Note: when running the experiment, the CPU usage increases with the number of streams we open.

Opening a new stream every minute

We see that after opening ~20-40 streams, QUIC keeps performing the same way; if we open fewer than 20 streams, the performance is best. This could also be because the amount of data being sent increases with every stream.

TCP + BBR + mptcp

500 bytes

Apparently, the smaller the data being sent, the better the performance we get from comet's p2p stack.

5 MB

There is apparently a memory leak somewhere that makes the nodes OOM after a short while. We will try to fix it in #1548.
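To vary the stream count and payload size as in these sweeps, the single-stream loop from the earlier sketch can be generalized; again this is only an illustration dropped into the same file as that sketch, with the same assumed quic-go API and illustrative parameter names.

```go
// floodStreams opens numStreams streams on an already-dialed connection and
// writes payloadSize-byte chunks of random data on each of them, mirroring
// the "N streams / 500 bytes vs 5 MB" runs. It returns on the first error.
func floodStreams(ctx context.Context, conn quic.Connection, numStreams, payloadSize int) error {
	errc := make(chan error, numStreams)
	for i := 0; i < numStreams; i++ {
		stream, err := conn.OpenStreamSync(ctx)
		if err != nil {
			return err
		}
		go func(s quic.Stream) {
			buf := make([]byte, payloadSize)
			for {
				if _, err := rand.Read(buf); err != nil {
					errc <- err
					return
				}
				if _, err := s.Write(buf); err != nil {
					errc <- err
					return
				}
			}
		}(stream)
	}
	return <-errc
}
```

For example, floodStreams(ctx, conn, 4, 500) would correspond to the 4-stream/500-byte case and floodStreams(ctx, conn, 4, 5<<20) to the 5 MB case.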
While not technically purpose-built for p2p applications, QUIC streams could replace our existing mechanisms for multiplexing messages and offer significant benefits. Besides removing multiple round trips from the handshake, QUIC also offers the ability to avoid head-of-line (HOL) blocking and to make use of more widely used (and potentially more battle-tested and debugged) software.
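As a purely hypothetical sketch of what "QUIC streams as the multiplexing mechanism" could look like, each p2p channel could get its own stream, so a backed-up channel (e.g. block parts) does not head-of-line block another (e.g. votes). The types and helper below are illustrative and not comet's actual API.

```go
package p2pquic // hypothetical package name

import (
	"context"
	"sync"

	"github.com/quic-go/quic-go"
)

// channelID mirrors comet's byte-sized p2p channel IDs.
type channelID byte

// muxConn multiplexes channels over one QUIC connection: one stream per
// channel, opened lazily on first send.
type muxConn struct {
	conn    quic.Connection
	mu      sync.Mutex
	streams map[channelID]quic.Stream
}

// Send writes msg on the stream dedicated to chID. Holding the lock across
// the write keeps the example simple; a real implementation would use a
// per-stream writer so channels don't serialize on one mutex.
func (m *muxConn) Send(ctx context.Context, chID channelID, msg []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.streams[chID]
	if !ok {
		var err error
		s, err = m.conn.OpenStreamSync(ctx)
		if err != nil {
			return err
		}
		m.streams[chID] = s
	}
	_, err := s.Write(msg)
	return err
}
```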
AC
Create a prototype that uses QUIC instead of TCP and secret conn. Compare its high-level performance (consensus throughput and peer drops) using our network tests and report the results here.