Security: Limit reconnection rate to individual peers #2275

Merged
merged 22 commits into from
Jun 18, 2021

Conversation

teor2345
Contributor

@teor2345 teor2345 commented Jun 10, 2021

Motivation

Zebra's peer liveness check is only applied to peers in the Responded state. This can lead to repeated retries of Failed peers, particularly in small address books.

Zebra takes the most recent time from all the peer time fields, and uses that time for its retry order. This makes Zebra retry some peers multiple times, before retrying other peers. (And in general, we don't want to confuse trusted and untrusted data, or success and failure times.)

Specifications

Unfortunately, there are no Zcash or Bitcoin specifications for peer reconnection rate-limits, or reconnection order.

Designs

Here is Zebra's current peer liveness timeout:

/// We expect to receive a message from a live peer at least once in this time duration.
///
/// This is the sum of:
/// - the interval between connection heartbeats
/// - the timeout of a possible pending (already-sent) request
/// - the timeout for a possible queued request
/// - the timeout for the heartbeat request itself
///
/// This avoids explicit synchronization, but relies on the peer
/// connector actually setting up channels and these heartbeats in a
/// specific manner that matches up with this math.
pub const LIVE_PEER_DURATION: Duration = Duration::from_secs(60 + 20 + 20 + 20);

Solution

Reconnection Rate

Limit the reconnection rate to each individual peer by applying the liveness cutoff to the attempt, responded, and failure time fields. If any field is recent, the peer is skipped.

The new liveness cutoff skips any peer that has recently responded, been attempted, or failed, regardless of its current state.
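As a rough illustration of the cutoff described above, here is a minimal sketch. It assumes a simplified, hypothetical `PeerTimes` struct and `std::time::Instant` timestamps; Zebra's real `MetaAddr` has more fields and may use different time types. Only `LIVE_PEER_DURATION` and the method name `is_ready_for_attempt` come from this PR.

```rust
use std::time::{Duration, Instant};

/// Liveness window quoted in the PR description: 60 + 20 + 20 + 20 seconds.
pub const LIVE_PEER_DURATION: Duration = Duration::from_secs(60 + 20 + 20 + 20);

/// Hypothetical, simplified stand-in for the time fields of a `MetaAddr`.
#[derive(Clone, Copy, Debug, Default)]
pub struct PeerTimes {
    pub last_attempt: Option<Instant>,
    pub last_response: Option<Instant>,
    pub last_failure: Option<Instant>,
}

/// Returns `true` if `time` is set and lies within the liveness window before `now`.
/// Times after `now` saturate to a zero age, so they also count as recent.
fn is_recent(time: Option<Instant>, now: Instant) -> bool {
    match time {
        Some(t) => now.saturating_duration_since(t) <= LIVE_PEER_DURATION,
        None => false,
    }
}

impl PeerTimes {
    /// The peer is skipped if *any* of its time fields is recent,
    /// regardless of which state the peer is currently in.
    pub fn is_ready_for_attempt(&self, now: Instant) -> bool {
        !is_recent(self.last_attempt, now)
            && !is_recent(self.last_response, now)
            && !is_recent(self.last_failure, now)
    }
}
```

With this shape, a peer that failed 30 seconds ago is skipped even though it is not in the `Responded` state, which is the case the old check missed.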

This change should close #1848.

Reconnection Order

Changes:

  • make Zebra prefer peers in more useful states: responded, never attempted, failed, attempt pending
  • if the states are equal, prefer the earliest attempted time, then earliest failed, then earliest responded, then the most recent gossiped last seen time

Unlike the previous order, the new order:

  • tries all peers in each state, before re-trying any peer in that state, and
  • only checks the gossiped untrusted last seen time if all other times are equal.
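The ordering above can be sketched as a sort key. This is only an illustration: the `Peer` struct, its field names, the `u32` timestamps, and `sort_for_retry` are hypothetical, and the real `MetaAddr` ordering in this PR may differ in detail.

```rust
use std::cmp::Reverse;

/// Peer states, declared in preference order so the derived `Ord`
/// sorts more useful states first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum PeerState {
    Responded,
    NeverAttempted,
    Failed,
    AttemptPending,
}

/// Hypothetical, simplified peer entry. Times are seconds since some epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Peer {
    pub state: PeerState,
    pub last_attempt: Option<u32>,
    pub last_failure: Option<u32>,
    pub last_response: Option<u32>,
    /// Gossiped by other peers, so it is untrusted and only breaks ties.
    pub untrusted_last_seen: Option<u32>,
}

type RetryKey = (PeerState, Option<u32>, Option<u32>, Option<u32>, Reverse<Option<u32>>);

impl Peer {
    /// State first, then earliest attempt, failure, and response times
    /// (`None` sorts before any `Some`), then the most recent gossiped time.
    fn retry_key(&self) -> RetryKey {
        (
            self.state,
            self.last_attempt,
            self.last_failure,
            self.last_response,
            Reverse(self.untrusted_last_seen),
        )
    }
}

/// Sorts `peers` so the next peer to try comes first.
pub fn sort_for_retry(peers: &mut [Peer]) {
    peers.sort_by_key(Peer::retry_key);
}
```

Because the attempt time comes before every other time in the key, a peer that was just retried moves behind the other peers in the same state, so each of them gets a turn first.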

Review

@jvff can review this change.

This change is important, but it doesn't seem to be currently causing any issues on the network.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Related Work

This PR is based on #2273. It should automatically rebase on main once that PR merges.

This PR is part of a series of MetaAddr refactors. After this PR merges, we can close #1849.

@teor2345 teor2345 added the labels A-rust (Area: Updates to Rust code), P-Medium, C-security (Category: Security issues), I-remote-node-overload (Zebra can overload other nodes on the network), and A-network (Area: Network protocol updates or fixes) on Jun 10, 2021
@teor2345 teor2345 added this to the 2021 Sprint 11 - Zcon2 milestone Jun 10, 2021
@teor2345 teor2345 requested a review from jvff June 10, 2021 05:16
@teor2345 teor2345 self-assigned this Jun 10, 2021
@teor2345
Contributor Author

teor2345 commented Jun 10, 2021

I still need to write proptests to ensure:

  • regardless of the changes applied to a MetaAddr, it never gets tried more than once per LIVE_PEER_DURATION
    • we'll need to prefer later times to earlier times to make this property hold
  • all disconnected MetaAddrs in a particular state are retried once, before any are retried twice
    • there might be some exceptions to this property; the tests should show us what they are
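To make the first property concrete, here is a small std-only simulation. It is not the proptest code from this PR: `Change`, `Times`, `simulate`, and `attempts_are_rate_limited` are all hypothetical names, and times are plain seconds. It also shows why later times have to win when a field is updated.

```rust
/// Liveness window in seconds (60 + 20 + 20 + 20 in the PR description).
const LIVE_PEER_SECS: u64 = 120;

/// Hypothetical update to a peer's time fields. Each change carries its own
/// timestamp, which may be older than the value already stored.
#[derive(Clone, Copy, Debug)]
enum Change {
    Attempted(u64),
    Responded(u64),
    Failed(u64),
}

#[derive(Clone, Copy, Debug, Default)]
struct Times {
    attempt: Option<u64>,
    response: Option<u64>,
    failure: Option<u64>,
}

impl Times {
    /// Applies a change, keeping the later time so that a stale update
    /// can't make a recently used peer look idle again.
    fn apply(&mut self, change: Change) {
        let (field, time) = match change {
            Change::Attempted(t) => (&mut self.attempt, t),
            Change::Responded(t) => (&mut self.response, t),
            Change::Failed(t) => (&mut self.failure, t),
        };
        *field = Some(field.map_or(time, |old| old.max(time)));
    }

    /// Ready only if every recorded time is older than the liveness window.
    fn is_ready(&self, now: u64) -> bool {
        [self.attempt, self.response, self.failure]
            .iter()
            .flatten()
            .all(|&t| now.saturating_sub(t) > LIVE_PEER_SECS)
    }
}

/// Replays one change every `step` seconds, connecting whenever the peer is
/// ready. Returns the times of the connection attempts.
fn simulate(changes: &[Change], step: u64) -> Vec<u64> {
    let mut times = Times::default();
    let mut attempts = Vec::new();
    for (i, &change) in changes.iter().enumerate() {
        let now = (i as u64 + 1) * step;
        if times.is_ready(now) {
            attempts.push(now);
            times.apply(Change::Attempted(now));
        }
        times.apply(change);
    }
    attempts
}

/// The property: consecutive attempts are more than one window apart.
fn attempts_are_rate_limited(attempts: &[u64]) -> bool {
    attempts.windows(2).all(|pair| pair[1] - pair[0] > LIVE_PEER_SECS)
}
```

If `apply` overwrote the field instead of taking the maximum, a stale `Attempted(0)` change would reset the attempt time and the peer could be retried inside the window.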

teor2345 added 7 commits June 15, 2021 13:34
Reconnection Rate

Limit the reconnection rate to each individual peer by applying the
liveness cutoff to the attempt, responded, and failure time fields.
If any field is recent, the peer is skipped.

The new liveness cutoff skips any peers that have recently been attempted
or failed. (Previously, the liveness check was only applied if the peer
was in the `Responded` state, which could lead to repeated retries of
`Failed` peers, particularly in small address books.)

Reconnection Order

Zebra prefers more useful peer states, then the earliest attempted,
failed, and responded times, then the most recent gossiped last seen
times.

Before this change, Zebra took the most recent time in all the peer time
fields, and used that time for liveness and ordering. This led to
confusion between trusted and untrusted data, and success and failure
times.

Unlike the previous order, the new order:
- tries all peers in each state, before re-trying any peer in that state,
  and
- only checks the gossiped untrusted last seen time
  if all other times are equal.
@teor2345 teor2345 force-pushed the limit-addr-reconnection-rate branch from 2954ba3 to 318a495 Compare June 15, 2021 03:34
@teor2345 teor2345 marked this pull request as draft June 15, 2021 03:39
@@ -155,19 +185,22 @@ impl AddressBook {
);

if let Some(updated) = updated {
// If a node that we are directly connected to has changed to a client,
// remove it from the address book.
if updated.is_direct_client() && previous.is_some() {
Contributor Author

@teor2345 teor2345 Jun 15, 2021

We can't remove peers that are recently live.

If we get that removed peer as a gossiped or alternate address, we'll reconnect to it within the liveness interval. (The proptests discovered this bug.)

(But it's ok to ignore specific addresses or peers that were never attempted, because there's no risk of reconnecting to them.)

Comment on lines +748 to +750
// Prioritise older attempt times, so we try all peers in each state,
// before re-trying any of them. This avoids repeatedly reconnecting to
// peers that aren't working.
Contributor Author

This is a core part of the security fix: try older peers first, to reduce rapid reconnections to the same peer.

Comment on lines +459 to +465
/// Is this address ready for a new outbound connection attempt?
pub fn is_ready_for_attempt(&self) -> bool {
    self.last_known_info_is_valid_for_outbound()
        && !self.was_recently_live()
        && !self.was_recently_attempted()
        && !self.was_recently_failed()
}
Contributor Author

This is a core part of the security fix: skip peers that have recently been attempted, responded, or failed.


fn arbitrary_with(_args: Self::Parameters) -> Self::Strategy {
    any::<u64>()
        .prop_map(PeerServices::from_bits_truncate)
Contributor Author

Previously, derive(Arbitrary) was putting any u64 value in these bits, which caused spurious errors.
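For readers unfamiliar with truncation, here is a hand-rolled sketch of what a `from_bits_truncate`-style constructor does. It does not use the `bitflags` crate, and the `Services` type and `EXAMPLE_FLAG` value are hypothetical; only the idea of dropping unknown bits is taken from the comment above.

```rust
/// Hypothetical stand-in for a service-flags type such as `PeerServices`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Services(u64);

impl Services {
    /// Illustrative flag values; the real flag set is defined elsewhere.
    pub const NODE_NETWORK: u64 = 1;
    pub const EXAMPLE_FLAG: u64 = 1 << 3;

    /// Every bit that corresponds to a defined flag.
    const KNOWN_BITS: u64 = Self::NODE_NETWORK | Self::EXAMPLE_FLAG;

    /// Keeps the known flag bits and silently drops everything else,
    /// so arbitrary `u64` test input always yields a valid value.
    pub fn from_bits_truncate(bits: u64) -> Self {
        Services(bits & Self::KNOWN_BITS)
    }

    /// Returns the raw bits of this value.
    pub fn bits(&self) -> u64 {
        self.0
    }
}
```

Mapping a generated `u64` through a constructor like this means the generated values only ever contain defined flags, which is what removes the spurious test errors.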

/// themselves. It detects bugs in [`MetaAddr`]s, even if there are
/// compensating bugs in the [`CandidateSet`] or [`AddressBook`].
//
// TODO: write a similar test using the AddressBook and CandidateSet
Contributor Author

@teor2345 teor2345 Jun 15, 2021

The extra test in this TODO isn't a high priority, because outbound connection fairness itself isn't a high priority.

teor2345 and others added 2 commits June 16, 2021 17:29
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
@teor2345 teor2345 force-pushed the limit-addr-reconnection-rate branch from 239fb70 to 82fee33 Compare June 16, 2021 09:22
@teor2345 teor2345 requested a review from jvff June 16, 2021 11:00
Contributor

@jvff jvff left a comment

Already looking good! Just a few ideas in case any of them are useful 👍

@teor2345
Contributor Author

@jvff I've just changed the order of the constants, feel free to merge once the release is out.
