
Investigate transport layer optimization #12302

Open
saketh-are opened this issue Oct 24, 2024 · 1 comment
Labels: A-network Area: Network

saketh-are commented Oct 24, 2024

Description

We are interested in increasing the state witness size limit to improve the throughput of the protocol. The state witness is produced by a particular node (the chunk producer) and must be distributed to N ≈ 50 other nodes (chunk validators) within a sub-second timeframe. Distribution uses a 2-hop strategy: the witness is broken into N parts, each of which is sent to one recipient, who is responsible for forwarding it to all other recipients.

The current worst-case witness size is 8 MB. Assuming Gigabit connections (125 MB/s), a full second to distribute the witness, and a 2-hop path, a loose upper bound on what can be sent is 62.5 MB. We should understand how close to that we can raise the witness size limit without degrading the performance of the protocol.
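As a sanity check, the 62.5 MB figure follows directly from the link rate and the hop count:

```shell
# 1 Gbit/s = 1e9 bits/s = 125 MB/s of raw link capacity.
# On a 2-hop path the same bytes must traverse two links in sequence,
# so in one second at most half that amount reaches the final recipients.
awk 'BEGIN { printf "%.1f MB\n", 1e9 / 8 / 2 / 1e6 }'
# prints: 62.5 MB
```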

Avenues for Exploration

There are some promising avenues which may yield significant improvements without the need for complex implementation:

  • Modifying system tunables to optimize our TCP connections, particularly the maximum buffer sizes for auto-tuning specified in net.ipv4.tcp_wmem/net.ipv4.tcp_rmem and the choice of congestion control algorithm.
  • Leveraging the fact that the chunk producer knows in advance that it will need to transmit a state witness, and sending some dummy traffic to warm up the connection's congestion window.
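A minimal sketch of the tunables mentioned above, for experimentation on the GCP test instances. The specific buffer sizes and the choice of BBR are assumptions to iterate from, not validated recommendations; appropriate values depend on the observed RTT and link rate.

```shell
# Raise the auto-tuning ceiling for TCP send/receive buffers.
# Format is "min default max" in bytes; only the max (here 64 MB,
# an assumed starting point) matters for auto-tuning headroom.
sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"

# Try BBR instead of the default congestion control
# (requires the tcp_bbr kernel module to be available).
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Inspect the resulting values.
sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem net.ipv4.tcp_congestion_control
```

These settings require root and apply system-wide; persisting them would go through /etc/sysctl.d/.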

Other ideas which could be valuable but significantly more involved include:

  • Breaking the parts further into sub-parts and utilizing multiple parallel connections to transmit them.
  • Having multiple nodes produce and distribute the state witness.
  • Switching to QUIC, if it offers measurable benefits.

Experimental Setup

We should evaluate the potential approaches in a realistic setting. As a first step, for ease of iteration, we can provision two GCP compute instances in different regions and test the performance of connections between them using simple command-line tools (e.g. iperf). We should:

  • Include pairs of regions which cover the range of latencies observed in the real world, e.g. 40ms, 100ms, 250ms.
  • Test payloads of size 8 MB (current worst case witness size) and above.
  • Vary system tunables and observe the impact on performance.
  • Get a sense of how much benefit we can obtain from "warming up" connections.
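The measurements above can be sketched with iperf3 as follows. Hostnames are placeholders, and iperf3 must be installed on both instances:

```shell
# On the receiving instance, start an iperf3 server:
iperf3 -s

# On the sending instance: measure baseline RTT, then throughput for an
# 8 MB transfer (current worst-case witness size) and a larger payload.
ping -c 10 receiver.example.internal
iperf3 -c receiver.example.internal -n 8M
iperf3 -c receiver.example.internal -n 64M

# On Linux, iperf3's -C flag selects the congestion control algorithm
# for the test socket, allowing a per-connection comparison:
iperf3 -c receiver.example.internal -n 64M -C cubic
iperf3 -c receiver.example.internal -n 64M -C bbr
```

Repeating this across region pairs with ~40ms, ~100ms, and ~250ms RTTs covers the latency range of interest.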

From there we can proceed to reproducing the same performance in neard-to-neard tests, as well as end-to-end tests invoking the 2-hop distribution strategy.

@saketh-are saketh-are added the A-network Area: Network label Oct 24, 2024
akhi3030 commented

Link to outline document: https://docs.nearone.org/doc/tuning-the-transport-layer-draft-OE6px3GuVn
