(Empty) block time increases to 2s in a 60 validator network #1525

Closed
DrZoltanFazekas opened this issue Sep 30, 2024 · 5 comments · Fixed by #1813

@DrZoltanFazekas
Contributor

The block interval of creating empty blocks increased from 1s average to 2s average in the perftest, a network with

  • 30 validators in asia-southeast1
  • 15 validators in europe-west1
  • 15 validators in europe-west2

The network was stopped to save costs, but @frankmeds can restart it any time someone is ready to investigate this issue.

@DrZoltanFazekas DrZoltanFazekas added the Aventurine Required for proto-mainnet launch label Sep 30, 2024
@DrZoltanFazekas DrZoltanFazekas changed the title from "Block time increased to 2s in 60 validator network" to "Block time increases to 2s in a 60 validator network" Oct 1, 2024
@shawn-zil shawn-zil self-assigned this Oct 2, 2024
@shawn-zil
Contributor

shawn-zil commented Oct 7, 2024

Some quick measurements.

iperf3 between europe-west2 and europe-west1:

[  5]   0.00-10.04  sec  1.11 GBytes   948 Mbits/sec                  receiver

                                        "max_rtt":      10002,
                                        "min_rtt":      9609,
                                        "mean_rtt":     9786,

iperf3 between asia-southeast1 and europe-west1 :

[  5]   0.00-10.17  sec   183 MBytes   151 Mbits/sec                  receiver

                                        "max_rtt":      163871,
                                        "min_rtt":      162603,
                                        "mean_rtt":     163059,

iperf3 between asia-southeast1 and europe-west2:

[  5]   0.00-10.17  sec   177 MBytes   146 Mbits/sec                  receiver

                                        "max_rtt":      172105,
                                        "min_rtt":      168172,
                                        "mean_rtt":     169405,
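
To make these figures easier to compare with the ~10ms and ~168ms values used later in the thread, here is a minimal sketch that converts the mean RTTs to milliseconds. It assumes the iperf3 JSON RTT fields are in microseconds, which is consistent with those later figures; the dictionary and function names are illustrative only.

```python
# Mean RTT values copied from the iperf3 output above.
# Assumption: iperf3 reports these fields in microseconds.
MEAN_RTT_US = {
    ("europe-west2", "europe-west1"): 9786,
    ("asia-southeast1", "europe-west1"): 163059,
    ("asia-southeast1", "europe-west2"): 169405,
}


def rtt_ms(rtt_us: int) -> float:
    """Convert a round-trip time from microseconds to milliseconds."""
    return rtt_us / 1000.0


if __name__ == "__main__":
    for (src, dst), rtt_us in MEAN_RTT_US.items():
        print(f"{src} <-> {dst}: {rtt_ms(rtt_us):.1f} ms")
```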

@shawn-zil
Contributor

shawn-zil commented Oct 7, 2024

Gossipsub transmits each message to at least mesh_n nodes, and the default value of mesh_n is 6.

Given that the validators are distributed in a 1:1 ratio between europe* and asia*, and that peers are selected randomly without bias, we can assume that gossipsub would normally select 3 nodes in asia* and 3 nodes in europe* for broadcast.

Assuming messages are sent sequentially, broadcasting a Proposal could take up to 3 * ~168ms + 3 * ~10ms, or ~534ms, of network transmission time.
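
The per-hop arithmetic can be sketched as follows. This is an illustration under the thread's own assumptions (3 peers in asia* at ~168ms, 3 in europe* at ~10ms), not a measurement of what libp2p actually does; the function names are hypothetical.

```python
# Approximate per-send latencies taken from the figures in this thread.
ASIA_MS = 168
EUROPE_MS = 10


def sequential_ms(asia_peers: int, europe_peers: int) -> int:
    """Total time if each send must finish before the next one starts."""
    return asia_peers * ASIA_MS + europe_peers * EUROPE_MS


def concurrent_ms(asia_peers: int, europe_peers: int) -> int:
    """Total time if all sends overlap: bounded by the slowest peer."""
    delays = [ASIA_MS] * asia_peers + [EUROPE_MS] * europe_peers
    return max(delays) if delays else 0


if __name__ == "__main__":
    print(sequential_ms(3, 3))  # 534
    print(concurrent_ms(3, 3))  # 168
```

The gap between the two results (534ms versus 168ms per hop) is why it matters whether sends are sequential or concurrent.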

Given that the network has 60 nodes, a Proposal needs at least 2 perfect hops (no overlapping nodes) to reach enough nodes to form a quorum, i.e. 6 + 6*6 = 42. It is more likely to take 3 hops (with some overlap).
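
A rough way to check the hop count is to compare the perfect-case reach with a small random simulation. The sketch below is a simplification: every newly reached node forwards to 6 peers chosen uniformly at random for each message, whereas real gossipsub keeps a persistent mesh. The quorum value of 41 (more than 2/3 of 60) is an assumption, as are all function names.

```python
import random
from collections import Counter


def perfect_reach(fanout: int, hops: int) -> int:
    """Nodes reached after `hops` hops when no two sends hit the same node."""
    return sum(fanout ** h for h in range(1, hops + 1))


def simulate_hops(n: int, fanout: int, target: int, rng: random.Random):
    """Hops until `target` nodes (excluding the proposer) have the message.

    Each newly reached node forwards to `fanout` random peers. Returns None
    if the spread stalls before the target is met.
    """
    reached = {0}  # node 0 is the proposer
    frontier = [0]
    hops = 0
    while len(reached) - 1 < target:
        newly = set()
        for node in frontier:
            others = [p for p in range(n) if p != node]
            newly.update(rng.sample(others, fanout))
        newly -= reached  # drop nodes that already had the message
        if not newly:
            return None
        reached |= newly
        frontier = sorted(newly)
        hops += 1
    return hops


if __name__ == "__main__":
    QUORUM = 41  # assumption: more than 2/3 of 60 validators
    print(perfect_reach(6, 2))  # 42, matching 6 + 6*6
    rng = random.Random(1525)
    results = [simulate_hops(60, 6, QUORUM, rng) for _ in range(1000)]
    print(Counter(results))  # distribution of hop counts across trials
```

Two hops is a hard lower bound here, because one hop reaches only 6 nodes; the simulation shows how often overlap pushes the count above 2 under these assumptions.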

If we take the ideal case, i.e. the messages are transmitted concurrently and only 2 perfect hops are required, it could still take ~504ms for the original Proposal and its Votes to reach the next Leader. In the worst case, i.e. the messages are transmitted sequentially and 3 hops are required, it could take up to ~1.68s for the same.

If we take the average of the two, network communication alone still takes > 1s. Therefore, it is unsurprising that block times can reach 2s for empty blocks.
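
As a quick check of the "> 1s" claim, the midpoint of the two estimates above can be computed directly. The two constants are the figures quoted in this comment; nothing here is measured.

```python
BEST_CASE_S = 0.504  # concurrent sends, 2 perfect hops (figure from the thread)
WORST_CASE_S = 1.68  # sequential sends, 3 hops (figure from the thread)


def midpoint(a: float, b: float) -> float:
    """Arithmetic mean of two estimates."""
    return (a + b) / 2


if __name__ == "__main__":
    print(f"{midpoint(BEST_CASE_S, WORST_CASE_S):.3f} s")  # 1.092 s
```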

@shawn-zil
Contributor

A positive sign that UDP has a chance to perform well is that it can handle higher bandwidths e.g. from europe* to asia* without loss. However, I was not able to measure the RTT effectively using simple benchmark tools.

[  5]   0.00-10.21  sec   358 MBytes   294 Mbits/sec  0.009 ms  0/266315 (0%)  receiver

@DrZoltanFazekas DrZoltanFazekas added Agate Required for mainnet launch and removed Aventurine Required for proto-mainnet launch labels Oct 23, 2024
@DrZoltanFazekas
Contributor Author

Great summary, thank you Shawn. Are the messages transmitted concurrently or sequentially? If concurrently, would increasing the number of peers from 6 to e.g. 10 help ensure that we end up with only 2 hops in the common case without losing time waiting for the sequential transmission of 4 additional messages?

@shawn-zil
Contributor

Great summary, thank you Shawn. Are the messages transmitted concurrently or sequentially? If concurrently, would increasing the number of peers from 6 to e.g. 10 help ensure that we end up with only 2 hops in the common case without losing time waiting for the sequential transmission of 4 additional messages?

I suspect that if I dig into the code, I will likely find a loop somewhere that iterates through the peers and transmits the message.

It is possible to configure libp2p to transmit to N peers, but it will still select the N peers randomly. There is also a chance that this may worsen things. But I would like to try things out on the network: deploy different scenarios and collect data to better characterise things.
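
Before running experiments, the trade-off between 6 and 10 peers can be estimated on paper. The sketch below reuses the thread's approximate latencies and assumes the peers split evenly between regions; `compare` is a hypothetical helper, not part of libp2p.

```python
ASIA_MS, EUROPE_MS = 168, 10  # approximate per-send latencies from the thread
NETWORK_SIZE = 60


def compare(fanout: int) -> dict:
    """Per-hop cost and 2-hop reach for a fanout split evenly across regions."""
    half = fanout // 2
    # Perfect (non-overlapping) reach, capped at every node except the sender.
    two_hop_reach = min(fanout + fanout * fanout, NETWORK_SIZE - 1)
    return {
        "fanout": fanout,
        "two_hop_reach": two_hop_reach,
        "sequential_hop_ms": half * (ASIA_MS + EUROPE_MS),
        "concurrent_hop_ms": ASIA_MS if half else 0,
    }


if __name__ == "__main__":
    for fanout in (6, 10):
        print(compare(fanout))
```

Under these assumptions, a fanout of 10 covers the whole network in 2 perfect hops, but if sends are sequential each hop grows from ~534ms to ~890ms, which is one way the change could make things worse.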

@DrZoltanFazekas DrZoltanFazekas changed the title from "Block time increases to 2s in a 60 validator network" to "(Empty) block time increases to 2s in a 60 validator network" Nov 5, 2024
@DrZoltanFazekas DrZoltanFazekas self-assigned this Nov 6, 2024
@shawn-zil shawn-zil linked a pull request Nov 14, 2024 that will close this issue