Security: Limit reconnection rate to individual peers #2275
Conversation
I still need to write proptests to ensure:
Force-pushed from ccdb6a8 to 2954ba3.
Reconnection Rate

Limit the reconnection rate to each individual peer by applying the liveness cutoff to the attempt, responded, and failure time fields. If any field is recent, the peer is skipped. The new liveness cutoff skips any peers that have recently been attempted or failed. (Previously, the liveness check was only applied if the peer was in the `Responded` state, which could lead to repeated retries of `Failed` peers, particularly in small address books.)

Reconnection Order

Zebra prefers more useful peer states, then the earliest attempted, failed, and responded times, then the most recent gossiped last seen times. Before this change, Zebra took the most recent time in all the peer time fields, and used that time for liveness and ordering. This led to confusion between trusted and untrusted data, and between success and failure times.

Unlike the previous order, the new order:
- tries all peers in each state, before re-trying any peer in that state, and
- only checks the gossiped untrusted last seen time if all other times are equal.
Force-pushed from 2954ba3 to 318a495.
```rust
// @@ -155,19 +185,22 @@ impl AddressBook {
        );

        if let Some(updated) = updated {
            // If a node that we are directly connected to has changed to a client,
            // remove it from the address book.
            if updated.is_direct_client() && previous.is_some() {
```
We can't remove peers that are recently live.
If we get that removed peer as a gossiped or alternate address, we'll reconnect to it within the liveness interval. (The proptests discovered this bug.)
(But it's ok to ignore specific addresses or peers that were never attempted, because there's no risk of reconnecting to them.)
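Roughly, the constraint looks like this. A minimal self-contained sketch, where the types, field names, and cutoff value are illustrative stand-ins rather than Zebra's actual definitions:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

// Illustrative cutoff, not Zebra's actual constant.
const LIVE_PEER_DURATION: Duration = Duration::from_secs(120);

struct PeerInfo {
    is_direct_client: bool,
    last_response: Option<Instant>,
}

impl PeerInfo {
    /// A peer is recently live if it responded within the liveness cutoff.
    fn was_recently_live(&self, now: Instant) -> bool {
        matches!(self.last_response, Some(t) if now.saturating_duration_since(t) < LIVE_PEER_DURATION)
    }
}

/// Remove a client peer from the address book, but only when it is safe:
/// a recently live peer could come back as a gossiped or alternate address
/// and be reconnected to within the liveness interval.
fn maybe_remove_client(book: &mut HashMap<SocketAddr, PeerInfo>, addr: &SocketAddr, now: Instant) {
    let safe_to_remove = book
        .get(addr)
        .map(|peer| peer.is_direct_client && !peer.was_recently_live(now))
        .unwrap_or(false);

    if safe_to_remove {
        book.remove(addr);
    }
}
```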
```rust
// Prioritise older attempt times, so we try all peers in each state,
// before re-trying any of them. This avoids repeatedly reconnecting to
// peers that aren't working.
```
This is a core part of the security fix: try older peers first, to reduce rapid reconnections to the same peer.
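For illustration, sorting candidates this way is enough to get that behaviour; the `Candidate` type here is a simplified stand-in, not Zebra's:

```rust
use std::net::SocketAddr;
use std::time::Instant;

/// Illustrative candidate entry: an address plus its optional last attempt time.
type Candidate = (SocketAddr, Option<Instant>);

/// Never-attempted peers sort first (`None < Some(_)`), then peers with the
/// oldest attempt times, so every peer in a state is tried once before any
/// of them is re-tried.
fn order_by_oldest_attempt(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|(_, last_attempt)| *last_attempt);
}
```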
```rust
/// Is this address ready for a new outbound connection attempt?
pub fn is_ready_for_attempt(&self) -> bool {
    self.last_known_info_is_valid_for_outbound()
        && !self.was_recently_live()
        && !self.was_recently_attempted()
        && !self.was_recently_failed()
}
```
This is a core part of the security fix: skip peers that have recently been attempted, responded, or failed.
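A minimal sketch of that check, assuming simplified time fields and an illustrative cutoff constant (not Zebra's actual value):

```rust
use std::time::{Duration, Instant};

// Illustrative cutoff, for this sketch only.
const RECENT_CUTOFF: Duration = Duration::from_secs(120);

/// Simplified peer time fields, for illustration only.
struct PeerTimes {
    last_attempt: Option<Instant>,
    last_response: Option<Instant>,
    last_failure: Option<Instant>,
}

impl PeerTimes {
    fn is_recent(field: Option<Instant>, now: Instant) -> bool {
        matches!(field, Some(t) if now.saturating_duration_since(t) < RECENT_CUTOFF)
    }

    /// The peer is skipped if *any* of the three time fields is recent.
    fn is_ready_for_attempt(&self, now: Instant) -> bool {
        !Self::is_recent(self.last_attempt, now)
            && !Self::is_recent(self.last_response, now)
            && !Self::is_recent(self.last_failure, now)
    }
}
```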
```rust
fn arbitrary_with(_args: Self::Parameters) -> Self::Strategy {
    any::<u64>()
        .prop_map(PeerServices::from_bits_truncate)
        .boxed()
}
```
Previously, `derive(Arbitrary)` was putting any `u64` value in these bits, which caused spurious errors.
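For context, `from_bits_truncate` masks out any bits that don't correspond to a defined flag, so mapping an arbitrary `u64` through it always yields a valid flag set. A minimal sketch using the `bitflags` crate (1.x syntax) with a made-up subset of flags:

```rust
use bitflags::bitflags;

bitflags! {
    // Illustrative subset of service flags; not Zebra's full definition.
    pub struct PeerServices: u64 {
        const NODE_NETWORK = 1;
        const NODE_BLOOM = 1 << 2;
    }
}

fn main() {
    // An arbitrary u64 can set bits with no defined flag; truncating keeps
    // only the known flags, so generated values stay valid.
    let raw: u64 = 0xDEAD_BEEF_DEAD_BEEF;
    let services = PeerServices::from_bits_truncate(raw);
    assert_eq!(services.bits() & !PeerServices::all().bits(), 0);
}
```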
```rust
/// themselves. It detects bugs in [`MetaAddr`]s, even if there are
/// compensating bugs in the [`CandidateSet`] or [`AddressBook`].
//
// TODO: write a similar test using the AddressBook and CandidateSet
```
The extra test in this TODO isn't a high priority, because outbound connection fairness itself isn't a high priority.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Force-pushed from 239fb70 to 82fee33.
Already looking good! Just a few ideas in case any of them are useful 👍
@jvff I've just changed the order of the constants, feel free to merge once the release is out.
Motivation
Zebra's peer liveness check is only applied to peers in the `Responded` state. This can lead to repeated retries of `Failed` peers, particularly in small address books.

Zebra takes the most recent time from all the peer time fields, and uses that time for its retry order. This makes Zebra retry some peers multiple times, before retrying other peers. (And in general, we don't want to confuse trusted and untrusted data, or success and failure times.)
Specifications
Unfortunately, there are no Zcash or Bitcoin specifications for peer reconnection rate-limits, or reconnection order.
Designs
Here is Zebra's current peer liveness timeout:
`zebra/zebra-network/src/constants.rs` (lines 35 to 46 at commit 96a1b66)
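(The permalinked snippet isn't embedded here. As a hedged reconstruction only, the constant looks roughly like this; the exact name, value, and doc comment should be checked against `constants.rs` at 96a1b66:)

```rust
use std::time::Duration;

/// The duration within which a peer that has sent us a message is
/// considered live. (Reconstructed for illustration; see the permalink
/// above for the authoritative definition.)
pub const LIVE_PEER_DURATION: Duration = Duration::from_secs(60 + 20 + 20 + 20);
```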
Solution
Reconnection Rate
Limit the reconnection rate to each individual peer by applying the liveness cutoff to the attempt, responded, and failure time fields. If any field is recent, the peer is skipped.
This new liveness cutoff skips any peers that have recently been connected, attempted, or failed, regardless of their current state.
This change should close #1848.
Reconnection Order

Changes:

Zebra now prefers more useful peer states, then the earliest attempted, failed, and responded times, then the most recent gossiped last seen times.

Unlike the previous order, the new order:

- tries all peers in each state, before re-trying any peer in that state, and
- only checks the gossiped untrusted last seen time if all other times are equal.
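A sketch of an ordering key with these properties; the field names and the numeric state ranking are assumptions for illustration, not Zebra's actual `MetaAddr` layout:

```rust
use std::cmp::Reverse;
use std::time::Instant;

/// Illustrative peer record.
struct Peer {
    /// Lower rank = more useful peer state.
    state_rank: u8,
    last_attempt: Option<Instant>,
    last_failure: Option<Instant>,
    last_response: Option<Instant>,
    /// Gossiped and untrusted, in seconds since the epoch.
    untrusted_last_seen: Option<u32>,
}

impl Peer {
    /// More useful states first, then the earliest attempt, failure, and
    /// response times (`None` sorts before any `Some`, so unattempted
    /// peers go first), and only then the most recent untrusted last seen
    /// time as the final tie-breaker.
    fn attempt_key(&self) -> impl Ord {
        (
            self.state_rank,
            self.last_attempt,
            self.last_failure,
            self.last_response,
            Reverse(self.untrusted_last_seen),
        )
    }
}

fn order_for_attempts(peers: &mut [Peer]) {
    peers.sort_by_key(Peer::attempt_key);
}
```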
Review
@jvff can review this change.
This change is important, but the bug doesn't seem to be causing any issues on the network at the moment.
Reviewer Checklist
Related Work
This PR is based on #2273; it should automatically rebase on `main` once that PR merges.

This PR is part of a series of `MetaAddr` refactors. After this PR merges, we can close #1849.