Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zebra hang - no sync progress #1398

Closed
1 task
teor2345 opened this issue Nov 26, 2020 · 9 comments · Fixed by #1425
Closed
1 task

Zebra hang - no sync progress #1398

teor2345 opened this issue Nov 26, 2020 · 9 comments · Fixed by #1425
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug

Comments

@teor2345
Copy link
Contributor

teor2345 commented Nov 26, 2020

Version

zebrad main branch, 2020-11-26.

Platform

Linux ... 5.4.75 #1-NixOS SMP Thu Nov 5 10:43:38 UTC 2020 x86_64 GNU/Linux

Possible Causes

Description

When I run Zebra on mainnet and testnet, it sometimes fails to make progress during its initial sync. This failure to make progress persists after a restart.

In some cases, the logs continually repeat tip exhaustion:

Nov 26 15:38:45.572  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 15:38:45.572  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(502800) min_locator_height=502701 locators=[Height(502800), Height(502799), Height(502798), Height(502796), Height(502792), Height(502784), Height(502768), Height(502736), Height(502701)]
Nov 26 15:39:00.571  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 26 15:39:00.571  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 26 15:39:45.572  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 15:39:45.572  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(502800) min_locator_height=502701 locators=[Height(502800), Height(502799), Height(502798), Height(502796), Height(502792), Height(502784), Height(502768), Height(502736), Height(502701)]
Nov 26 15:40:00.571  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 26 15:40:00.571  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 26 15:40:45.572  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 15:40:45.572  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(502800) min_locator_height=502701 locators=[Height(502800), Height(502799), Height(502798), Height(502796), Height(502792), Height(502784), Height(502768), Height(502736), Height(502701)]
Nov 26 15:41:00.571  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 26 15:41:00.571  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 26 15:41:45.572  INFO sync: zebrad::components::sync: starting sync, obtaining new tips

In other cases, the logs show exhausted tips once, then hang after getting a block locator:

Nov 26 16:47:57.272  INFO listen{addr=0.0.0.0:38233}: zebra_network::peer_set::initialize: Opened Zcash protocol endpoint at 0.0.0.0:38233
Nov 26 16:47:57.629  INFO peer{addr=198.199.112.230:18233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: zebra_state::service: responding to peer GetBlocks or GetHeaders final_height=Height(131554) response_len=160 chain_tip_height=Height(502800) stop_height=None intersection_height=Some(Height(131394))
Nov 26 16:47:57.659  INFO peer{addr=209.141.47.197:18233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: zebra_state::service: responding to peer GetBlocks or GetHeaders final_height=Height(131554) response_len=160 chain_tip_height=Height(502800) stop_height=None intersection_height=Some(Height(131394))
...
Nov 26 16:48:05.878  INFO zebrad::commands::start: initializing syncer
Nov 26 16:48:05.879  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 16:48:05.879  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(502800) min_locator_height=502701 locators=[Height(502800), Height(502799), Height(502798), Height(502796), Height(502792), Height(502784), Height(502768), Height(502736), Height(502701)]
Nov 26 16:48:07.277  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 26 16:48:07.277  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 26 16:48:52.278  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 16:48:52.278  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(502800) min_locator_height=502701 locators=[Height(502800), Height(502799), Height(502798), Height(502796), Height(502792), Height(502784), Height(502768), Height(502736), Height(502701)]

And on mainnet, with a synced tip, zebra gave no info-level log output for 4 hours:

Nov 26 17:41:30.373  INFO peer{addr=[2001:41d0:700:2233::]:8233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: 
zebra_state::service: responding to peer GetBlocks or GetHeaders final_height=Height(1055216) response_len=37 chain_tip_height=Height(1055216) stop_height=None intersection_height=Some(Height
(1055179))                                                                                                                                                                                     
...                                                                                                                                                           
Nov 26 17:41:30.398  INFO zebrad::commands::start: initializing syncer                                                                                                                         
Nov 26 17:41:30.398  INFO sync: zebrad::components::sync: starting sync, obtaining new tips                                                                                                    
Nov 26 17:41:30.398  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(1055216) min_locator_height=1055117 locators=[Height(1055216), Height(1055215), He
ight(1055214), Height(1055212), Height(1055208), Height(1055200), Height(1055184), Height(1055152), Height(1055117)]                                                                           
Nov 26 17:41:30.557  INFO peer{addr=47.252.84.50:8233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: zebra_stat
e::service: responding to peer GetBlocks or GetHeaders final_height=Height(1055216) response_len=37 chain_tip_height=Height(1055216) stop_height=None intersection_height=Some(Height(1055179))
Nov 26 17:41:30.608  INFO peer{addr=47.107.242.15:8233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: zebra_sta
te::service: responding to peer GetBlocks or GetHeaders final_height=Height(1055216) response_len=37 chain_tip_height=Height(1055216) stop_height=None intersection_height=Some(Height(1055179)
)
...
Nov 26 17:41:31.920  INFO peer{addr=173.212.200.221:8233}:handle_message_as_request{msg=GetHeaders { known_blocks: zebra_chain::block::hash::Hash, len=32, stop: None }}:inbound:state: zebra_s
tate::service: responding to peer GetBlocks or GetHeaders final_height=Height(1055216) response_len=37 chain_tip_height=Height(1055216) stop_height=None intersection_height=Some(Height(105517
9))
Nov 26 17:41:34.357  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 26 17:41:34.357  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 26 17:42:19.358  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 26 17:42:19.358  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(1055216) min_locator_height=1055117 locators=[Height(1055216), Height(1055215), Height(1055214), Height(1055212), Height(1055208), Height(1055200), Height(1055184), Height(1055152), Height(1055117)]

I also saw a hang on testnet waiting for pending blocks. This hang seems to happen due to a missing UTXO:

Nov 30 13:32:21.870  INFO peer{addr=3.112.190.48:18233}:handle_message_as_request{msg=Inv([Block(block::Hash("0015efbe9e13f4af0f434ccb955137b2f439c3c0d20d87e718ec691ee76c9e83"))])}:inbound:download_and_verify{hash=0015efbe9e13f4af0f434ccb955137b2f439c3c0d20d87e718ec691ee76c9e83}: zebra_state::service: pruned utxo requests old_len=6 new_len=0 prune_count=6 tip=Some((Height(356413), block::Hash("0003a9de9e0828185648851cc2aab8124d8c312284a20330465c66f95a525536")))
Nov 30 13:32:30.290  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 30 13:32:30.290  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(356413) min_locator_height=356314 locators=[Height(356413), Height(356412), Height(356411), Height(356409), Height(356405), Height(356397), Height(356381), Height(356349), Height(356314)]
Nov 30 13:32:37.304  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=499 lookahead_limit=2000
Nov 30 13:32:49.859  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=993 lookahead_limit=2000
Nov 30 13:32:57.405  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=1491 lookahead_limit=2000
Nov 30 13:33:10.576  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=1989 lookahead_limit=2000
Nov 30 13:33:11.212  INFO sync: zebrad::components::sync: exhausted prospective tip set
Nov 30 13:33:11.212  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Nov 30 13:33:30.487  INFO peer{addr=159.69.217.179:18233}:handle_message_as_request{msg=Inv([Block(block::Hash("00139e448ad701a705b29c014d10182553c2748c25bd54124e4cf4c2444ae1e3"))])}:inbound:download_and_verify{hash=00139e448ad701a705b29c014d10182553c2748c25bd54124e4cf4c2444ae1e3}: zebra_state::service: pruned utxo requests old_len=14 new_len=0 prune_count=14 tip=Some((Height(356413), block::Hash("0003a9de9e0828185648851cc2aab8124d8c312284a20330465c66f95a525536")))
Nov 30 13:33:56.214  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Nov 30 13:33:56.214  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(356413) min_locator_height=356314 locators=[Height(356413), Height(356412), Height(356411), Height(356409), Height(356405), Height(356397), Height(356381), Height(356349), Height(356314)]
Nov 30 13:34:09.390  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=499 lookahead_limit=2000
Nov 30 13:34:17.902  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=995 lookahead_limit=2000
Nov 30 13:34:25.823  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=1493 lookahead_limit=2000
Nov 30 13:34:33.369  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=1991 lookahead_limit=2000
Nov 30 13:34:40.901  INFO sync: zebrad::components::sync: waiting for pending blocks tips.len=1 in_flight=2489 lookahead_limit=2000
Nov 30 13:44:03.155  WARN sync: zebrad::components::sync: error downloading and verifying block e=Block(Transaction(InternalDowncastError("downcast to redjubjub::Error failed, original error: Elapsed(())")))
@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage labels Nov 26, 2020
@teor2345
Copy link
Contributor Author

I've moved this bug to the first alpha release because it's persistent, and it can cause a hang without any output.

These symptoms suggest that it might not be a temporary network bug - so it could also affect mainnet.

@teor2345 teor2345 changed the title Zebra tip exhaustion loop - no progress on testnet Zebra hang - no sync progress Nov 26, 2020
@teor2345
Copy link
Contributor Author

I have now seen this bug on mainnet as well.

@teor2345 teor2345 removed the S-needs-triage Status: A bug report needs triage label Nov 26, 2020
@teor2345
Copy link
Contributor Author

teor2345 commented Nov 26, 2020

We think that this issue might be resolved by #1391, but we need to do further testing to confirm.

@teor2345
Copy link
Contributor Author

This issue isn't resolved by #1391, I am still seeing it on mainnet.

@mpguerra
Copy link
Contributor

When I run Zebra on mainnet and testnet, it sometimes fails to make progress during its initial sync. This failure to make progress persists after a restart.

You mention that a restart doesn't fix this issue, what's the workaround for this? Deleting any pre-existing synced blocks and starting again from the beginning?

@teor2345
Copy link
Contributor Author

teor2345 commented Dec 1, 2020

When I run Zebra on mainnet and testnet, it sometimes fails to make progress during its initial sync. This failure to make progress persists after a restart.

You mention that a restart doesn't fix this issue, what's the workaround for this? Deleting any pre-existing synced blocks and starting again from the beginning?

Deleting blocks and re-syncing does not reliably resolve this issue. It seems to be based on network conditions and potentially peer state.

@rex4539
Copy link
Contributor

rex4539 commented Dec 1, 2020

I can reproduce.

Dec 01 10:42:49.911  INFO sync: zebrad::components::sync: exhausted prospective tip set
Dec 01 10:42:49.911  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s
Dec 01 10:43:34.918  INFO sync: zebrad::components::sync: starting sync, obtaining new tips
Dec 01 10:43:34.919  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(234488) min_locator_height=234389 locators=[Height(234488), Height(234487), Height(234486), Height(234484), Height(234480), Height(234472), Height(234456), Height(234424), Height(234389)]

@rex4539
Copy link
Contributor

rex4539 commented Dec 1, 2020

Dec 01 22:23:22.042  INFO {zebrad="a91d0f0b"}:sync: zebrad::components::sync: exhausted prospective tip set
Dec 01 22:23:22.042  INFO {zebrad="a91d0f0b"}:sync: zebrad::components::sync: waiting to restart sync timeout=45s
Dec 01 22:24:07.043  INFO {zebrad="a91d0f0b"}:sync: zebrad::components::sync: starting sync, obtaining new tips
Dec 01 22:24:07.044  INFO {zebrad="a91d0f0b"}:sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(234798) min_locator_height=234699 locators=[Height(234798), Height(234797), Height(234796), Height(234794), Height(234790), Height(234782), Height(234766), Height(234734), Height(234699)]
Dec 01 22:24:15.753  WARN crawl_and_dial: zebra_network::peer_set::set: network request with no peer connections. Hint: check your network connection. Configure fast and reliable initial peers
Dec 01 22:25:30.755  WARN crawl_and_dial: zebra_network::peer_set::set: network request with no peer connections. Hint: check your network connection. Configure fast and reliable initial peers
Dec 01 22:26:45.755  WARN crawl_and_dial: zebra_network::peer_set::set: network request with no peer connections. Hint: check your network connection. Configure fast and reliable initial peers

@teor2345
Copy link
Contributor Author

teor2345 commented Dec 2, 2020

We think this issue is closed by #1425

If it isn't fixed after #1425 merges, we should open a separate issue with newer logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants