Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync service hangs, but inbound service continues to answer requests #1559

Closed
3 tasks
teor2345 opened this issue Jan 6, 2021 · 0 comments · Fixed by #1586 or #1589
Closed
3 tasks

Sync service hangs, but inbound service continues to answer requests #1559

teor2345 opened this issue Jan 6, 2021 · 0 comments · Fixed by #1586 or #1589
Assignees
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Milestone

Comments

@teor2345
Copy link
Contributor

teor2345 commented Jan 6, 2021

Version

zebrad 1.0.0-alpha.0 + PR #1531 (commit 13eb4a6)

Platform

Linux ... 5.4.83 #1-NixOS SMP Fri Dec 11 12:23:33 UTC 2020 x86_64 GNU/Linux

Description

Around block 419581, zebrad's sync service stops producing log output, but the inbound service continues to log requests.

I tried this:

Running zebrad start on mainnet with an empty state.

I expected to see this happen:

zebrad syncs to the tip.

Instead, this happened:

zebrad synced to block 419581, then the sync service stopped logging progress. But the inbound service continued to log download_and_verify and GetBlocks or GetHeaders requests.

There were no logs about low numbers of peer connections, so this bug is different to #1435.

Running zebrad start with a state near the tip appears to work fine, but the syncer doesn't download many blocks in that case.

Analysis

This hang happens because there's no timeout on the sync service's verifier.

The syncer has a lookahead limit. When it fills up with pending download and verify futures, the syncer waits for at least one of those futures to complete.

But since there's no timeout on the verifier, if all the pending blocks are awaiting a block we haven't downloaded yet, then the syncer hangs. (And we'll never download that block, because the syncer doesn't have any room left in its queue, and it will never clear the queue, because it's awaiting a completed verify task.)

Suggested Solution

  • Implement a timeout for the verifier. (This change will revert some previous changes, there's been a lot of churn in this area of the code.)
  • Put explicit comments on the download and verify timeouts saying "if you remove this timeout, the syncer will eventually hang".
  • Make sure the inbound service has a download timeout and a verify timeout, with similar comments.

Commands

zebrad start

Related Issues

This bug is like #1435, but #1435 has a low number of connections. This bug happened even when zebrad had a large number of connections.

Logs

Jan 06 19:57:07.463  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:07.467  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000                                                    
Jan 06 19:57:08.714  INFO {zebrad="dc5e9502"}:sync:extend_tips:checkpoint: zebra_consensus::checkpoint: verified checkpoint range block_count=400 current_range=(Excluded(Height(419181)), Included(Height(419581)))                                                                                                                                                                          Jan 06 19:57:18.536  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000                                        
Jan 06 19:57:18.541  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:24.705  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:24.709  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000                                                    
Jan 06 19:57:31.132  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:31.137  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:34.397  INFO {zebrad="dc5e9502"}:sync:extend_tips: zebra_state::service: pruned utxo requests old_len=15797 new_len=15761 prune_count=36 tip=Some((Height(419581), block::Hash("000000000003de545eedfd47b6ae10327fa3b317a6d1eb66a29c3bd6029a243d")))                                                                                                                             Jan 06 19:57:37.166  INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2447 lookahead_limit=2000
Jan 06 19:59:07.985  INFO {zebrad="dc5e9502"}:peer{addr=13.231.145.68:8233}:run:msg_as_req{msg=inv}:inbound:download_and_verify{hash=00000000010e1e85fbed2d821b18dda40d23e8d190029e6e63d52c45327e6818}: zebra_state::service: pruned utxo requests old_len=20 new_len=15 prune_count=5 tip=Some((Height(419581), block::Hash("000000000003de545eedfd47b6ae10327fa3b317a6d1eb66a29c3bd6029a243d")))                                                                           
...
@teor2345 teor2345 added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage labels Jan 6, 2021
@teor2345 teor2345 self-assigned this Jan 6, 2021
@teor2345 teor2345 added the A-rust Area: Updates to Rust code label Jan 6, 2021
@mpguerra mpguerra added this to the 2021 Sprint 1 milestone Jan 11, 2021
@mpguerra mpguerra linked a pull request Jan 14, 2021 that will close this issue
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
2 participants