
Try a slightly longer handshake timeout #1883

Closed (1 commit)


teor2345 (Contributor) commented Mar 11, 2021

Motivation

The large testnet acceptance tests are failing, because sometimes Zebra only gets 1-3 good peers on testnet.

Solution

Increase the timeout to allow slightly slower peers to connect.

Review

We need to test this change before it gets reviewed.

Related Issues

#1877 Restore large testnet tests

Follow Up Work

We could also:

  • increase the testnet acceptance test timeouts
  • decrease the number of blocks we sync on testnet
  • deploy more testnet peers
  • make the security fixes that will improve DNS seeder accuracy, and redeploy the Zcash Foundation DNS seeders
  • just disable the test and leave it disabled

teor2345 added the labels C-bug (Category: This is a bug), A-rust (Area: Updates to Rust code), P-High, and I-integration-fail (Continuous integration fails, including build and test failures) on Mar 11, 2021
teor2345 added this to the 2021 Sprint 5 milestone on Mar 11, 2021
teor2345 self-assigned this on Mar 11, 2021
dconnolly self-requested a review on March 11, 2021 04:23
dconnolly (Contributor) commented:

Ah boo, still a transient failure

teor2345 (Contributor, Author) commented:

I ran the zebrad acceptance tests with `--include-ignored` and this PR applied, in a loop on my local machine, until one of the tests failed.

For the first few test runs, I got:

  • 0 successes before a failure due to a block download timeout (from a slow peer)
  • 13 successes before a failure due to DNS seeders providing bad peers (0 good peers)
  • 1 success before a failure due to DNS seeders providing bad peers (3 good peers)

That's 3 failures in 17 runs, or about a 20% failure rate, compared to 7% before this change:
#1877 (comment)
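The failure rate can be checked by counting each loop as its successful runs plus the one run that failed. This is a minimal sketch; `failure_rate` is a hypothetical helper written for illustration, not part of Zebra or its test suite.

```rust
/// Hypothetical helper: each entry is the number of consecutive successful
/// test runs observed before a failure ended that loop.
fn failure_rate(successes_before_failure: &[u32]) -> f64 {
    // Every loop ends in exactly one failing run.
    let failures = successes_before_failure.len() as u32;
    let successes: u32 = successes_before_failure.iter().sum();
    let total_runs = successes + failures;
    if total_runs == 0 {
        // No runs recorded, so there is no rate to report.
        return 0.0;
    }
    f64::from(failures) / f64::from(total_runs)
}

fn main() {
    // Successes before each failure, from the first few test runs above.
    let loops = [0, 13, 1];
    let rate = failure_rate(&loops);
    // Prints "failure rate: 17.6%"
    println!("failure rate: {:.1}%", rate * 100.0);
}
```

3 failures out of 17 runs is about 17.6%, which rounds to the "about 20%" above.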

So this change isn't going to help. Let's try the DNS seeder security fixes and redeployment instead. See #1791 for details.

teor2345 closed this Mar 12, 2021
teor2345 deleted the tune-peer-timeout branch March 21, 2022 21:27
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug I-integration-fail Continuous integration fails, including build and test failures

2 participants