We are interested in increasing the state witness size limit to improve the throughput of the protocol. The state witness is produced by a single node (the chunk producer) and needs to be distributed to N ≈ 50 other nodes (the chunk validators) in a sub-second timeframe. Distribution uses a 2-hop strategy: the witness is broken into N parts, each of which is sent to one recipient, which is then responsible for forwarding it to all other recipients.
The current worst-case witness size is 8 MB. A loose upper bound on what can be sent, assuming Gigabit connections, a full second to distribute the witness, and a 2-hop path, would be 62.5 MB. We should understand how close to that we can raise the witness size limit without degrading the performance of the protocol.
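As a sanity check on that figure, here is a minimal back-of-the-envelope calculation, assuming the full 1 Gbit/s is usable, the one-second budget is split evenly across the two hops, and there is no protocol or serialization overhead:

```python
# Loose upper bound on witness size for 2-hop distribution.
# Assumptions: 1 Gbit/s links, a 1 s total budget split evenly across the
# two hops, and no serialization or protocol overhead.
LINK_BANDWIDTH_BPS = 1_000_000_000  # 1 Gbit/s
TOTAL_BUDGET_S = 1.0
HOPS = 2

per_hop_budget_s = TOTAL_BUDGET_S / HOPS                 # 0.5 s per hop
bound_bytes = LINK_BANDWIDTH_BPS / 8 * per_hop_budget_s  # 125 MB/s * 0.5 s
print(f"upper bound: {bound_bytes / 1e6:.1f} MB")        # -> 62.5 MB
```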
Avenues for Exploration
There are some promising avenues which may yield significant improvements without the need for complex implementation:
Modifying system tunables to optimize our TCP connections, particularly the maximum buffer sizes for auto-tuning specified in net.ipv4.tcp_wmem/net.ipv4.tcp_rmem and the choice of congestion control algorithm.
Leveraging the fact that the chunk producer knows in advance that it will need to transmit a state witness, and sending some dummy traffic to warm up the connection's congestion window (both of these avenues are sketched below).
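To make the first two avenues concrete, below is a minimal Python sketch of what an application-level experiment could look like. The buffer sizes, the choice of bbr, and the warm-up payload size are placeholder values to sweep during experiments, not recommendations; the system-wide equivalents would instead be set via sysctl on net.ipv4.tcp_wmem, net.ipv4.tcp_rmem, and net.ipv4.tcp_congestion_control.

```python
import socket
import time

# Placeholder values for experimentation -- not tuned recommendations.
SEND_BUF_BYTES = 16 * 1024 * 1024   # generous SO_SNDBUF so the app doesn't stall
RECV_BUF_BYTES = 16 * 1024 * 1024
CONGESTION_ALGO = b"bbr"            # must be available on the host
WARMUP_BYTES = 1 * 1024 * 1024      # dummy traffic to grow the congestion window


def connect_tuned(host: str, port: int) -> socket.socket:
    """Open a TCP connection with explicit buffer sizes and congestion control."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Setting SO_SNDBUF/SO_RCVBUF explicitly disables kernel auto-tuning for
    # this socket; raising the tcp_wmem/tcp_rmem limits via sysctl and leaving
    # auto-tuning on is the alternative worth comparing against.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SEND_BUF_BYTES)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RECV_BUF_BYTES)
    if hasattr(socket, "TCP_CONGESTION"):  # Linux only
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, CONGESTION_ALGO)
    sock.connect((host, port))
    return sock


def warm_up(sock: socket.socket) -> None:
    """Send dummy traffic ahead of time so the congestion window has already
    grown when the latency-critical witness transfer starts."""
    sock.sendall(b"\x00" * WARMUP_BYTES)


def send_witness_part(sock: socket.socket, part: bytes) -> float:
    """Send one witness part and return how long the send took."""
    start = time.monotonic()
    sock.sendall(part)
    return time.monotonic() - start
```

Note that with the Linux default net.ipv4.tcp_slow_start_after_idle=1 the congestion window decays after an idle period, so the warm-up traffic would need to be sent shortly before the witness (or that tunable disabled) for the warm window to still be in effect.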
Other ideas which could be valuable but significantly more involved include:
Breaking the parts further into sub-parts and utilizing multiple parallel connections to transmit them (a rough sketch follows this list).
Having multiple nodes produce and distribute the state witness.
Considering the potential benefits of switching to QUIC.
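To illustrate the first of these ideas, here is a rough sketch of splitting one witness part into sub-parts and pushing them over parallel connections; the sub-part count is an arbitrary placeholder, and connection setup, reassembly, and error handling are omitted:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

SUB_PARTS = 4  # placeholder; worth sweeping in experiments


def split(part: bytes, n: int) -> list[bytes]:
    """Split a witness part into at most n roughly equal sub-parts."""
    step = (len(part) + n - 1) // n
    return [part[i:i + step] for i in range(0, len(part), step)]


def send_one(addr: tuple[str, int], payload: bytes) -> None:
    """Send a single sub-part over its own TCP connection."""
    with socket.create_connection(addr) as sock:
        sock.sendall(payload)


def send_parallel(addr: tuple[str, int], part: bytes) -> None:
    """Push sub-parts concurrently so each connection's congestion window only
    has to cover a fraction of the data."""
    with ThreadPoolExecutor(max_workers=SUB_PARTS) as pool:
        list(pool.map(lambda p: send_one(addr, p), split(part, SUB_PARTS)))
```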
Experimental Setup
We should evaluate the potential approaches in a realistic setting. As a first step, for ease of iteration, we can simply provision two GCP compute instances in different regions and test the performance of connections between them using simple command-line tools (e.g. iperf); a minimal measurement sketch is also included after this list. We should:
Include pairs of regions that cover the range of latencies observed in the real world, e.g. 40 ms, 100 ms, 250 ms.
Test payloads of 8 MB (the current worst-case witness size) and above.
Vary system tunables and observe the impact on performance.
Get a sense of how much benefit we can obtain from "warming up" connections.
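Alongside iperf, a small hand-rolled client/server like the hypothetical sketch below makes it easy to measure exactly the metric we care about: wall-clock time to deliver a single 8 MB-and-up payload over a fresh (or pre-warmed) connection. The port number is an arbitrary placeholder.

```python
import socket
import time

PORT = 9000                      # arbitrary placeholder port
PAYLOAD_BYTES = 8 * 1024 * 1024  # current worst-case witness size; sweep upward


def server() -> None:
    """Accept one connection, drain the payload, then send a 1-byte ack so the
    client can measure end-to-end delivery time rather than local buffering."""
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(1 << 20):
                pass
            conn.sendall(b"k")


def client(host: str) -> None:
    """Send the payload and report the time until the server has received it all."""
    payload = b"\x00" * PAYLOAD_BYTES
    with socket.create_connection((host, PORT)) as sock:
        start = time.monotonic()
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # end-of-stream so the server's recv loop exits
        sock.recv(1)                   # wait for the server's ack
        elapsed = time.monotonic() - start
    print(f"delivered {PAYLOAD_BYTES / 1e6:.1f} MB in {elapsed * 1000:.0f} ms")
```

Waiting for the server's ack avoids counting a transfer as done while data is still sitting in the client's kernel send buffer; running the client against servers in the 40 ms, 100 ms, and 250 ms region pairs gives the baseline to compare tunable changes against.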
From there we can proceed to reproducing the same performance in neard-to-neard tests, and then in end-to-end tests invoking the 2-hop distribution strategy.