Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic: preselected index must be valid in peer_set #1183

Closed
teor2345 opened this issue Oct 19, 2020 · 6 comments · Fixed by #1366
Closed

Panic: preselected index must be valid in peer_set #1183

teor2345 opened this issue Oct 19, 2020 · 6 comments · Fixed by #1366
Assignees
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug

Comments

@teor2345
Copy link
Contributor

Version

zebrad main branch, 19 October 2020.

Platform

Linux ... 5.4.68 #1-NixOS SMP Sat Sep 26 16:03:16 UTC 2020 x86_64 GNU/Linux

Description

While syncing testnet blocks, zebrad panicked with a "preselected index must be valid" error in peer_set::set, inside the span zebrad::components::sync::obtain_tips.

I tried this:

Syncing testnet from genesis.

I expected to see this happen:

A successful sync.

Instead, this happened:

A tip exhaustion cycle, followed by a zebra-network panic.

Error

preselected index must be valid

Metadata

key value
version 3.0.0-alpha.0
location /home/dev/zebra-revhex/zebra-network/src/peer_set/set.rs:381:22

SpanTrace

SpanTrace:
   0: zebrad::components::sync::obtain_tips
             at zebrad/src/components/sync.rs:314
   1: zebrad::components::sync::sync
             at zebrad/src/components/sync.rs:169
@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage labels Oct 19, 2020
@teor2345 teor2345 added this to the First Alpha Release milestone Oct 19, 2020
@teor2345
Copy link
Contributor Author

@hdevalence I think you might understand the invariants in zebra-network?

@teor2345
Copy link
Contributor Author

@hdevalence do you think this ticket is fixed now?

@hdevalence
Copy link
Contributor

No, the error message indicates that there's an invariant being violated inside the peer set's selection logic. I gave it a cursory look when this ticket was filed and didn't see anything obvious, but I haven't seen the error recently, so I think we should deprioritize fixing it until later unless it comes up again.

@mpguerra
Copy link
Contributor

removing from Alpha release milestone based on the above, we can review this if it happens again

@mpguerra mpguerra removed this from the First Alpha Release milestone Nov 23, 2020
@teor2345 teor2345 added this to the First Alpha Release milestone Nov 23, 2020
@teor2345
Copy link
Contributor Author

I've just seen this error on the main branch again

@teor2345
Copy link
Contributor Author

The panic happened on a Testnet sync, but there's nothing stopping it from happening on Mainnet. (The sync and network code doesn't distinguish between Mainnet and Testnet.)

I suspect that we haven't seen this error recently because it's been hidden by other errors that we've just fixed.

Nov 23 21:12:22.110  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s                                                                                                  
Nov 23 21:13:07.111  INFO sync: zebrad::components::sync: starting sync, obtaining new tips                                                                                                    
Nov 23 21:13:07.111  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(551200) min_locator_height=551101 locators=[Height(551200), Height(551199), Height
(551198), Height(551196), Height(551192), Height(551184), Height(551168), Height(551136), Height(551101)]                                                                                      
Nov 23 21:13:23.363  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=140 lookahead_limit=2000                                                                         
Nov 23 21:13:24.783  INFO sync: zebrad::components::sync: exhausted prospective tip set                                                                                                        
Nov 23 21:13:24.783  INFO sync: zebrad::components::sync: waiting to restart sync timeout=45s                                                                                                  
Nov 23 21:14:09.784  INFO sync: zebrad::components::sync: starting sync, obtaining new tips                                                                                                    
Nov 23 21:14:09.784  INFO sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(551200) min_locator_height=551101 locators=[Height(551200), Height(551199), Height
(551198), Height(551196), Height(551192), Height(551184), Height(551168), Height(551136), Height(551101)]                                                                                      
Nov 23 21:14:37.679  INFO sync: zebrad::components::sync: extending tips tips.len=1 in_flight=295 lookahead_limit=2000                                                                         
The application panicked (crashed).                                                                                                                                                            
Message:  preselected index must be valid
Location: /home/dev/zebra/zebra-network/src/peer_set/set.rs:381
  
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   
   0: zebrad::components::sync::downloads::download_and_verify with hash=block::Hash("000005b0cf28b945393efed7d194f1ef48c1451bcdf2798faf596312e00a2908")
      at zebrad/src/components/sync/downloads.rs:113
   1: zebrad::components::sync::extend_tips
      at zebrad/src/components/sync.rs:381
   2: zebrad::components::sync::sync
      at zebrad/src/components/sync.rs:150

@hdevalence hdevalence self-assigned this Nov 23, 2020
hdevalence added a commit that referenced this issue Nov 24, 2020
Closes #1183.

The peer set maintains a preselected ready service that it can use to
perform power-of-two-choices (p2c) routing of requests.  Ready services
are stored by key (socket address) in an `IndexMap`, and the preselected
service is represented by an `Option<usize>` indexing that map.  This
means that whenever the set of ready services changes (e.g., a service
is removed from the peer set, or a service is taken to be used to
process a request), the preselected index is invalidated.  The original
P2C-only implementation maintained this invariant but did not document
it.

The change to inventory-based routing introduced a bug by failing to
maintain this invariant and appropriately invalidate the preselected
index.  However, this was only noticeable approximately 1/N of the time
on the next request after an inventory-directed request, so the bug
occurred infrequently.  Luckily, the use of `.expect` caused the bug to
be an immediate panic, making it possible to identify by inspecting all
uses of the ready service map.
teor2345 pushed a commit that referenced this issue Nov 24, 2020
Closes #1183.

The peer set maintains a preselected ready service that it can use to
perform power-of-two-choices (p2c) routing of requests.  Ready services
are stored by key (socket address) in an `IndexMap`, and the preselected
service is represented by an `Option<usize>` indexing that map.  This
means that whenever the set of ready services changes (e.g., a service
is removed from the peer set, or a service is taken to be used to
process a request), the preselected index is invalidated.  The original
P2C-only implementation maintained this invariant but did not document
it.

The change to inventory-based routing introduced a bug by failing to
maintain this invariant and appropriately invalidate the preselected
index.  However, this was only noticeable approximately 1/N of the time
on the next request after an inventory-directed request, so the bug
occurred infrequently.  Luckily, the use of `.expect` caused the bug to
be an immediate panic, making it possible to identify by inspecting all
uses of the ready service map.
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants