You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Linux ... 5.4.83 #1-NixOS SMP Fri Dec 11 12:23:33 UTC 2020 x86_64 GNU/Linux
Description
Around block 419581, zebrad's sync service stops producing log output, but the inbound service continues to log requests.
I tried this:
Running zebrad start on mainnet with an empty state.
I expected to see this happen:
zebrad syncs to the tip.
Instead, this happened:
zebrad synced to block 419581, then the sync service stopped logging progress. But the inbound service continued to log download_and_verify and GetBlocks or GetHeaders requests.
There were no logs about low numbers of peer connections, so this bug is different to #1435.
Running zebrad start with a state near the tip appears to work fine, but the syncer doesn't download many blocks in that case.
Analysis
This hang happens because there's no timeout on the sync service's verifier.
The syncer has a lookahead limit. When it fills up with pending download and verify futures, the syncer waits for at least one of those futures to complete.
But since there's no timeout on the verifier, if all the pending blocks are awaiting a block we haven't downloaded yet, then the syncer hangs. (And we'll never download that block, because the syncer doesn't have any room left in its queue, and it will never clear the queue, because it's awaiting a completed verify task.)
Suggested Solution
Implement a timeout for the verifier. (This change will revert some previous changes, there's been a lot of churn in this area of the code.)
Put explicit comments on the download and verify timeouts saying "if you remove this timeout, the syncer will eventually hang".
Make sure the inbound service has a download timeout and a verify timeout, with similar comments.
Commands
zebrad start
Related Issues
This bug is like #1435, but #1435 has a low number of connections. This bug happened even when zebrad had a large number of connections.
Logs
Jan 06 19:57:07.463 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:07.467 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:08.714 INFO {zebrad="dc5e9502"}:sync:extend_tips:checkpoint: zebra_consensus::checkpoint: verified checkpoint range block_count=400 current_range=(Excluded(Height(419181)), Included(Height(419581))) Jan 06 19:57:18.536 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:18.541 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:24.705 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:24.709 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:31.132 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2372 lookahead_limit=2000
Jan 06 19:57:31.137 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: extending tips tips.len=1 in_flight=2000 lookahead_limit=2000
Jan 06 19:57:34.397 INFO {zebrad="dc5e9502"}:sync:extend_tips: zebra_state::service: pruned utxo requests old_len=15797 new_len=15761 prune_count=36 tip=Some((Height(419581), block::Hash("000000000003de545eedfd47b6ae10327fa3b317a6d1eb66a29c3bd6029a243d"))) Jan 06 19:57:37.166 INFO {zebrad="dc5e9502"}:sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2447 lookahead_limit=2000
Jan 06 19:59:07.985 INFO {zebrad="dc5e9502"}:peer{addr=13.231.145.68:8233}:run:msg_as_req{msg=inv}:inbound:download_and_verify{hash=00000000010e1e85fbed2d821b18dda40d23e8d190029e6e63d52c45327e6818}: zebra_state::service: pruned utxo requests old_len=20 new_len=15 prune_count=5 tip=Some((Height(419581), block::Hash("000000000003de545eedfd47b6ae10327fa3b317a6d1eb66a29c3bd6029a243d")))
...
The text was updated successfully, but these errors were encountered:
Version
zebrad 1.0.0-alpha.0
+ PR #1531 (commit 13eb4a6)Platform
Linux ... 5.4.83 #1-NixOS SMP Fri Dec 11 12:23:33 UTC 2020 x86_64 GNU/Linux
Description
Around block
419581
,zebrad
's sync service stops producing log output, but the inbound service continues to log requests.I tried this:
Running
zebrad start
on mainnet with an empty state.I expected to see this happen:
zebrad
syncs to the tip.Instead, this happened:
zebrad
synced to block419581
, then the sync service stopped logging progress. But the inbound service continued to logdownload_and_verify
andGetBlocks or GetHeaders
requests.There were no logs about low numbers of peer connections, so this bug is different to #1435.
Running
zebrad start
with a state near the tip appears to work fine, but the syncer doesn't download many blocks in that case.Analysis
This hang happens because there's no timeout on the sync service's verifier.
The syncer has a lookahead limit. When it fills up with pending download and verify futures, the syncer waits for at least one of those futures to complete.
But since there's no timeout on the verifier, if all the pending blocks are awaiting a block we haven't downloaded yet, then the syncer hangs. (And we'll never download that block, because the syncer doesn't have any room left in its queue, and it will never clear the queue, because it's awaiting a completed verify task.)
Suggested Solution
Commands
zebrad start
Related Issues
This bug is like #1435, but #1435 has a low number of connections. This bug happened even when
zebrad
had a large number of connections.Logs
The text was updated successfully, but these errors were encountered: