
Investigate transport layer optimization #12302

Open
saketh-are opened this issue Oct 24, 2024 · 1 comment
Labels: A-network Area: Network

saketh-are commented Oct 24, 2024

Description

We are interested in increasing the state witness size limit to improve the throughput of the protocol. The state witness is produced by a particular node (the chunk producer) and must be distributed to N ≈ 50 other nodes (chunk validators) within a sub-second timeframe. Distribution uses a 2-hop strategy: the witness is broken into N parts, each of which is sent to one recipient, who is responsible for forwarding it to all other recipients.

The current worst-case witness size is 8 MB. Assuming Gigabit connections (125 MB/s), a full second to distribute the witness, and a 2-hop path, a loose upper bound on what can be sent is 62.5 MB. We should understand how close to that we can raise the witness size limit without degrading the performance of the protocol.
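As a sanity check, the 62.5 MB figure follows directly from the link rate and the hop count:

```shell
# 1 Gbit/s = 1e9 bits/s = 125 MB/s of raw link capacity.
# On a 2-hop path the same bytes must traverse two links in sequence,
# so in one second at most half that amount reaches the final recipients.
awk 'BEGIN { printf "%.1f MB\n", 1e9 / 8 / 2 / 1e6 }'
# prints: 62.5 MB
```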

Avenues for Exploration

There are some promising avenues which may yield significant improvements without the need for complex implementation:

  • Modifying system tunables to optimize our TCP connections, particularly the maximum buffer sizes for auto-tuning specified in net.ipv4.tcp_wmem/net.ipv4.tcp_rmem and the choice of congestion control algorithm.
  • Leveraging the fact that the chunk producer knows in advance that it will need to transmit a state witness, and sending some dummy traffic to warm up the connection's congestion window.
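A minimal sketch of the tunables mentioned above, for experimentation on the GCP test instances. The specific buffer sizes and the choice of BBR are assumptions to iterate from, not validated recommendations; appropriate values depend on the observed RTT and link rate.

```shell
# Raise the auto-tuning ceiling for TCP send/receive buffers.
# Format is "min default max" in bytes; only the max (here 64 MB,
# an assumed starting point) matters for auto-tuning headroom.
sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"

# Try BBR instead of the default congestion control
# (requires the tcp_bbr kernel module to be available).
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Inspect the resulting values.
sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem net.ipv4.tcp_congestion_control
```

These settings require root and apply system-wide; persisting them would go through /etc/sysctl.d/.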

Other ideas which could be valuable but significantly more involved include:

  • Breaking the parts further into sub-parts and utilizing multiple parallel connections to transmit them.
  • Having multiple nodes produce and distribute the state witness.
  • Switching to QUIC, if it offers measurable benefits.

Experimental Setup

We should evaluate the potential approaches in a realistic setting. As a first step, for ease of iteration, we can provision two GCP compute instances in different regions and test the performance of connections between them using simple command-line tools (e.g. iperf). We should:

  • Include pairs of regions which cover the range of latencies observed in the real world, e.g. 40ms, 100ms, 250ms.
  • Test payloads of size 8 MB (current worst case witness size) and above.
  • Vary system tunables and observe the impact on performance.
  • Get a sense of how much benefit we can obtain from "warming up" connections.
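The measurements above can be sketched with iperf3 as follows. Hostnames are placeholders, and iperf3 must be installed on both instances:

```shell
# On the receiving instance, start an iperf3 server:
iperf3 -s

# On the sending instance: measure baseline RTT, then throughput for an
# 8 MB transfer (current worst-case witness size) and a larger payload.
ping -c 10 receiver.example.internal
iperf3 -c receiver.example.internal -n 8M
iperf3 -c receiver.example.internal -n 64M

# On Linux, iperf3's -C flag selects the congestion control algorithm
# for the test socket, allowing a per-connection comparison:
iperf3 -c receiver.example.internal -n 64M -C cubic
iperf3 -c receiver.example.internal -n 64M -C bbr
```

Repeating this across region pairs with ~40ms, ~100ms, and ~250ms RTTs covers the latency range of interest.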

From there we can proceed to reproducing the same performance in neard-to-neard tests, as well as end-to-end tests invoking the 2-hop distribution strategy.

@saketh-are saketh-are added the A-network Area: Network label Oct 24, 2024
akhi3030 commented

Link to outline document: https://docs.nearone.org/doc/tuning-the-transport-layer-draft-OE6px3GuVn
