Fix candidate set address state handling #1709
@dconnolly @yaahc I've made the required design changes from the first review. I'd like to get this reviewed and merged soon, so we can get it into the next alpha. There are a bunch of other refactors, performance improvements, and security fixes we could do, but they're not required to fix the hangs. So I think we should open separate tickets for them and schedule them in future sprints.
I've rebased on #1750 to silence a bunch of clippy lints.
The CI failure is #1730.
Design:
- Add a `PeerConnectionState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than a `PeerConnectionState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier methods
- Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more obviously correct.
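To illustrate "time-based liveness as an `AddressBook` method", here is a minimal Rust sketch under stated assumptions. Liveness is computed from a stored `last_seen` timestamp each time it is queried, so there is no `Live`/`Dead` variant that can go stale when nobody updates it. The `last_seen` field, the `is_live_at` method, and the `LIVE_PEER_DURATION` constant and its three-hour value are all hypothetical, not Zebra's actual definitions.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical cutoff: peers seen within this window are treated as live.
const LIVE_PEER_DURATION: Duration = Duration::from_secs(3 * 60 * 60);

/// Hypothetical per-address entry, reduced to the one field this sketch needs.
struct MetaAddr {
    last_seen: Instant,
}

struct AddressBook {
    by_addr: HashMap<SocketAddr, MetaAddr>,
}

impl AddressBook {
    /// Liveness is derived from the timestamp at query time, instead of
    /// being a stored state variant that must be updated as time passes.
    /// Unknown addresses are never live.
    fn is_live_at(&self, addr: &SocketAddr, now: Instant) -> bool {
        self.by_addr
            .get(addr)
            .map(|meta| now.saturating_duration_since(meta.last_seen) <= LIVE_PEER_DURATION)
            .unwrap_or(false)
    }
}

fn main() {
    let addr: SocketAddr = "127.0.0.1:8233".parse().unwrap();
    let seen = Instant::now();
    let mut by_addr = HashMap::new();
    by_addr.insert(addr, MetaAddr { last_seen: seen });
    let book = AddressBook { by_addr };

    // Live right after being seen, no longer live once the window has passed.
    assert!(book.is_live_at(&addr, seen));
    assert!(!book.is_live_at(&addr, seen + Duration::from_secs(4 * 60 * 60)));
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the method, keeps the check deterministic and easy to test.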
Testnet is unstable; we'll fix that in #1222.
Merging because @dconnolly approved and asked for a rebase.
Motivation

The `CandidateSet` address books `disconnected`, `gossiped`, and `failed` are not disjoint, but the module documentation says they should be. The overall peer state handling is also a bit broken and inconsistent: it looks like peers can get stuck in particular states (perhaps related to #1633 or #1435).
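To make the disjointness problem concrete, here is a minimal Rust sketch. It assumes, hypothetically, that the three address books are plain `HashSet<SocketAddr>`s wrapped in an illustrative `SeparateBooks` struct; the real `CandidateSet` fields may have different types. With separate collections, nothing stops an address from being added to `failed` while it is still in `gossiped`, so the invariant has to be maintained by hand at every call site.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical stand-in for the three `CandidateSet` address books.
/// Each address is supposed to appear in at most one of them.
struct SeparateBooks {
    disconnected: HashSet<SocketAddr>,
    gossiped: HashSet<SocketAddr>,
    failed: HashSet<SocketAddr>,
}

impl SeparateBooks {
    /// Returns true if no address appears in more than one book.
    fn is_disjoint(&self) -> bool {
        self.disconnected.is_disjoint(&self.gossiped)
            && self.disconnected.is_disjoint(&self.failed)
            && self.gossiped.is_disjoint(&self.failed)
    }
}

fn main() {
    let addr: SocketAddr = "127.0.0.1:8233".parse().unwrap();
    let mut books = SeparateBooks {
        disconnected: HashSet::new(),
        gossiped: HashSet::new(),
        failed: HashSet::new(),
    };

    books.gossiped.insert(addr);
    assert!(books.is_disjoint());

    // A connection attempt fails, but the caller forgets to remove the
    // address from `gossiped`: the books are no longer disjoint.
    books.failed.insert(addr);
    assert!(!books.is_disjoint());
}
```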
Solution

Design:
- Delete duplicate indexes from the `AddressBook`, rather than maintaining complex runtime invariants
- Add a `PeerConnectionState` to each `MetaAddr`
- Implement time-based liveness as an `AddressBook` method, rather than a `PeerConnectionState` variant

Implementation:
- Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct
- Simplify `AddressBook` changes using `update` and `take` modifier methods

Documentation:
- Expand and update the peer set documentation
We can optimise later, but for now we want simple code that is more obviously correct.
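The design and implementation bullets fit together roughly as in this Rust sketch. A single `HashMap` keyed by address holds a `MetaAddr` that carries its own `PeerConnectionState`, so each address has exactly one state and the old per-state books are disjoint by construction. The variant names are borrowed from the follow-up notes in this PR; the field layout, the `state_count` helper, and the exact `update`/`take` signatures are assumptions, not Zebra's code.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical connection states; names borrowed from this PR's notes.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PeerConnectionState {
    NeverAttempted,
    AttemptPending,
    AssumedAlive,
    Failed,
}

/// Each address carries its own state, instead of living in a per-state set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MetaAddr {
    addr: SocketAddr,
    state: PeerConnectionState,
}

#[derive(Default)]
struct AddressBook {
    // Single index: one entry, and therefore one state, per address.
    by_addr: HashMap<SocketAddr, MetaAddr>,
}

impl AddressBook {
    /// Insert or overwrite the entry for `new.addr`, returning the previous entry.
    fn update(&mut self, new: MetaAddr) -> Option<MetaAddr> {
        self.by_addr.insert(new.addr, new)
    }

    /// Remove and return the entry for `addr`, if any.
    fn take(&mut self, addr: SocketAddr) -> Option<MetaAddr> {
        self.by_addr.remove(&addr)
    }

    /// Count peers in a given state by scanning, instead of keeping per-state indexes.
    fn state_count(&self, state: PeerConnectionState) -> usize {
        self.by_addr.values().filter(|m| m.state == state).count()
    }
}

fn main() {
    let addr: SocketAddr = "127.0.0.1:8233".parse().unwrap();
    let mut book = AddressBook::default();

    book.update(MetaAddr { addr, state: PeerConnectionState::NeverAttempted });
    book.update(MetaAddr { addr, state: PeerConnectionState::Failed });

    // The second update replaced the first: the address is in exactly one state.
    assert_eq!(book.state_count(PeerConnectionState::NeverAttempted), 0);
    assert_eq!(book.state_count(PeerConnectionState::Failed), 1);

    assert_eq!(book.take(addr).map(|m| m.state), Some(PeerConnectionState::Failed));
    assert!(book.take(addr).is_none());
}
```

Counting states with a linear scan is slower than maintaining per-state indexes, but there is no second copy of the state to drift out of sync, which matches the "simple code that is more obviously correct" trade-off.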
The code in this pull request has:
Review
I don't know if @yaahc or @dconnolly is more familiar with this code, but it's a big enough change that everyone should probably take a look.
This rewrite might solve a whole bunch of our network problems, so it's a high priority. But the review is not urgent.
Related Issues
Closes #1707.
Closes #1435, according to our testing: Zebra hasn't hung over several weeks of cumulative node runtime.
Follow Up Work
- Fix the `Client` state handling bug (#1599)

Performance/Security:

Security:
- … `target peer size/3`

Network health:
- … `Responded` peers

Correctness:
- … `AssumedDead` and `NeverAttempted` states. (Updates can only be `AttemptPending`, `AssumedAlive`, or `Failed`.)
- … `NeverAttempted` state.

Refactors:
- … `AssumedDead` state, via `update` and `take` inner type methods
- … `MetaAddr` state into a template parameter, which defaults to the outer state enum

Features: