Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a timeout to DNS seeder requests to avoid hangs #1613

Closed
5 tasks
teor2345 opened this issue Jan 22, 2021 · 2 comments · Fixed by #1662
Closed
5 tasks

Add a timeout to DNS seeder requests to avoid hangs #1613

teor2345 opened this issue Jan 22, 2021 · 2 comments · Fixed by #1662
Assignees
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Milestone

Comments

@teor2345
Copy link
Contributor

teor2345 commented Jan 22, 2021

Is your feature request related to a problem? Please describe.

When zebrad launches, it asks some DNS seeders for a list of initial peers. But if DNS or the seeder is slow or down, zebrad can hang without any output.

We've observed these hangs in our CI, but we don't know if it's just one seeder that's having issues. If it is, this fix could make our CI more reliable.

Describe the solution you'd like

  • in parse_peers, execute each to_socket_addrs call in a separate spawn_blocking tokio task, wrap each task in a timeout, and wait on them completing
  • log a warning if any of the tasks time out
  • add a timeout argument to parse_peers and initial_peers
  • in add_initial_peers, log a warning if there are no initial peers
  • check if we still need ZEBRAD_SKIP_NETWORK_TESTS on Windows and Linux CI - this fix might help with their unreliable network behaviour

We don't panic on any of these errors, because they are recoverable - zebrad can discover peers from inbound connections, if its address is distributed by a DNS seeder (or manually).

Describe alternatives you've considered

Do nothing - zebrad will hang if there are DNS issues
Panic - these errors are recoverable, and "no initial peers" is a valid config

Additional context

We discovered this issue while troubleshooting bind hangs in #1535

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage labels Jan 22, 2021
@teor2345 teor2345 added this to the 2021 Sprint 2 milestone Jan 22, 2021
@teor2345
Copy link
Contributor Author

Tentatively adding this task to Sprint 2, because:

  • we see startup hangs in CI on some platforms - this issue will probably fix them (and definitively give us better diagnostics)
  • the fix might allow us to run our full CI on more platforms, which is important, because we've discovered some platform-specific hangs recently
  • a hang at startup with no output is a very negative user experience

@teor2345 teor2345 self-assigned this Jan 22, 2021
@teor2345
Copy link
Contributor Author

I increased the estimate here, because we've added some possible acceptance tests changes.

@teor2345 teor2345 assigned yaahc and oxarbitrage and unassigned teor2345 Jan 25, 2021
@teor2345 teor2345 added P-High and removed S-needs-triage Status: A bug report needs triage labels Jan 28, 2021
teor2345 added a commit to teor2345/zebra that referenced this issue Feb 19, 2021
We disabled these tests pending ZcashFoundation#1613. But the comment incorrectly said
we were waiting for ZcashFoundation#1631.
teor2345 added a commit to teor2345/zebra that referenced this issue Feb 19, 2021
We disabled these tests pending ZcashFoundation#1613. But the comment incorrectly said
we were waiting for ZcashFoundation#1631.
teor2345 added a commit to teor2345/zebra that referenced this issue Feb 19, 2021
We disabled these tests pending ZcashFoundation#1613. But the comment incorrectly said
we were waiting for ZcashFoundation#1631.
teor2345 added a commit that referenced this issue Feb 19, 2021
We disabled these tests pending #1613. But the comment incorrectly said
we were waiting for #1631.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants