[stateless validation]: Improve network connection performance on burst traffic #68
@saketh-are's note (link) summarizing the reasoning behind the estimated state witness size we can support:
Overall, each participant uploads exactly the state witness size + 50%. We need to differentiate the committed rate vs. the burstable rate here. Taking an 8 MB state witness as an example, each node will want to upload 12 MB as quickly as possible. At a 1 Gbps burst rate that's 12 MB / 1 Gbps ≈ 0.1 s of added latency from the connection speed; a 100 Mbps peak rate would be unusable (~1 s of added latency). I think what will happen is that configuring a low committed rate and a high burstable rate will make the most sense economically for chunk validators. Worth noting that this bandwidth discussion is fundamentally a flow problem: the only overhead coming from our distribution strategy is the constant factor of +50%. If we want to support slower connections, we need to reduce the state witness size.
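Spelling out the arithmetic behind those latency figures (1 MB = 8 Mbit):

$$
12\ \text{MB} = 96\ \text{Mbit}, \qquad
\frac{96\ \text{Mbit}}{1000\ \text{Mbps}} \approx 0.096\ \text{s}, \qquad
\frac{96\ \text{Mbit}}{100\ \text{Mbps}} \approx 0.96\ \text{s}.
$$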
The funny thing is that, assuming a typical state witness size of 4 MB and a burst capacity of at least 1 Gbps, each chunk validator should spend 4 MB × 1.5 / (1.3 s) / (1 Gbps) ≈ 3.7% of their time uploading the state witness, which is less than 5%. That means that, with a reasonable burstable rate, the state witness is irrelevant when it comes to deciding the committed rate on the connection.
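For reference, the percentage works out as follows (the 1.3 s presumably being the block production interval):

$$
\frac{4\ \text{MB} \times 1.5}{1.3\ \text{s} \times 1\ \text{Gbps}}
= \frac{48\ \text{Mbit}}{1300\ \text{Mbit}} \approx 3.7\%.
$$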
Current status: we have implemented the TCP configuration changes described in this issue on forknet, where we own the instances and can freely set the system tunables. After we also deploy #67 there, we should observe the impact on the performance of state witness distribution and tune the TCP buffer sizes appropriately. Although setting larger buffers improves the network layer's performance, it also increases the RAM usage of the node. Once we decide which buffer sizes to configure, we will need to communicate to node operators the need to set the system tunables described in this issue. Since we can detect from within neard whether the node is configured appropriately, we can take that opportunity to print an error or warning containing details of how to set the tunables.
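As a minimal sketch of what such a startup check could look like (hypothetical code, not neard's actual implementation; the tunable checked and the threshold are illustrative):

```rust
use std::fs;

/// Illustrative threshold (bytes) for the kernel's maximum TCP send buffer;
/// the real value would come from the forknet tuning results.
const MIN_TCP_WMEM_MAX: u64 = 8 * 1024 * 1024;

/// Warn at startup if the max component of `net.ipv4.tcp_wmem` is too small.
/// The file holds three whitespace-separated values: min, default, max.
fn check_tcp_send_buffer() {
    // Not Linux, or /proc unavailable: silently skip the check.
    let Ok(contents) = fs::read_to_string("/proc/sys/net/ipv4/tcp_wmem") else {
        return;
    };
    match contents.split_whitespace().nth(2).and_then(|v| v.parse::<u64>().ok()) {
        Some(max) if max < MIN_TCP_WMEM_MAX => eprintln!(
            "WARN: net.ipv4.tcp_wmem max is {max} bytes, which may slow state \
             witness uploads; see the tuning instructions in this issue."
        ),
        Some(_) => {} // configured appropriately
        None => eprintln!("WARN: could not parse /proc/sys/net/ipv4/tcp_wmem"),
    }
}

fn main() {
    check_tcp_send_buffer();
}
```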
@staffik shared analysis (link) showing that TCP optimization improves performance on ForkNet with MainNet traffic. We need to confirm whether this is 'enough' of an improvement in combination with @shreyan-gupta's work.
When a chunk validator needs to distribute a state witness of significant size, an otherwise idle network connection suddenly needs to transmit a large amount of data urgently. Testing shows that our existing network stack does not handle this situation well.
We explored and benchmarked multiple options for addressing this issue, including tuning existing TCP-based connections and switching to QUIC or UDP.
Based on the investigation, TCP tuning, together with breaking the state witness into parts, should deliver the performance we need in the least intrusive manner. For our p2p TCP connections, we need to adjust the kernel's TCP settings along the lines sketched below.
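A sketch of the kind of system tunables involved, with illustrative values only (we have not settled on final numbers; larger buffers improve burst throughput at the cost of RAM, and disabling slow start after idle keeps the kernel from shrinking the congestion window on an idle connection, which is exactly the burst-after-idle pattern described above):

```
# /etc/sysctl.d/99-neard-tcp.conf -- illustrative values, not final recommendations
# Per-socket TCP buffer sizes: min, default, max (bytes).
net.ipv4.tcp_wmem = 4096 131072 8388608
net.ipv4.tcp_rmem = 4096 131072 8388608
# Hard caps for buffer sizes requested via setsockopt().
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
# Don't shrink the congestion window after the connection goes idle.
net.ipv4.tcp_slow_start_after_idle = 0
```

Apply with `sudo sysctl --system` (or `sysctl -p <file>`) and verify with `sysctl net.ipv4.tcp_wmem`.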