Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security: MetaAddr refactor, gossip time, connection order, reconnection rate fixes (see tickets) #2160

Closed
wants to merge 5 commits into from

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented May 17, 2021

Motivation

The current implementations of AddressBook and MetaAddr make it hard to do security fixes.

See the individual tickets #1849, #1868, #1876, and #1848.

Solution

Track multiple last used times for each peer:

Create a new MetaAddrChange type to make AddressBook changes:

  • Ignore invalid state changes
    • ignore updates to the untrusted last seen time (but update the services field)
    • if we get a gossiped or alternate change for a seed peer, use the last seen and services info
    • once a peer has responded, don't go back to the NeverResponded... states
  • Update the address book metrics

Optimise getting the next connection address from the address book

The code in this pull request has:

  • Documentation Comments
  • Property Tests, Integration Tests, and Manual Integration Testing

Review

@dconnolly or @jvff should review this PR.

This PR depends on PRs #2179 and #2182. It will need a manual rebase once they merge to main.

Related Issues

Closes #1849
Fixes #1868
Fixes #1876
Fixes #1848

I needed to change the buggy code for #1868, #1876 and #1848 as part of the refactor. So I replaced it with fixed code.

Follow Up

Do the other security fixes enabled by this fix.

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code C-cleanup Category: This is a cleanup P-Medium C-security Category: Security issues I-remote-node-overload Zebra can overload other nodes on the network I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data A-network Area: Network protocol updates or fixes labels May 17, 2021
@teor2345 teor2345 requested a review from a team May 17, 2021 06:43
@teor2345 teor2345 self-assigned this May 17, 2021
@teor2345 teor2345 force-pushed the meta-addr-refactor branch 2 times, most recently from f377a6c to f6b3bf1 Compare May 17, 2021 07:21
@teor2345
Copy link
Contributor Author

teor2345 commented May 17, 2021

I'll leave this PR in draft until its dependencies merge, then rebase it onto main.

(Also I should test it locally for a few days.)

@teor2345 teor2345 changed the title Security: MetaAddr refactor, gossip time, connection order, and reconnection rate fixes Security: MetaAddr refactor, gossip time, connection order, and reconnection rate fixes (see tickets) May 17, 2021
@teor2345 teor2345 force-pushed the meta-addr-refactor branch from f6b3bf1 to f904146 Compare May 17, 2021 07:46
@teor2345 teor2345 force-pushed the meta-addr-refactor branch 5 times, most recently from b99bfd2 to 8f17081 Compare May 17, 2021 12:08
@teor2345 teor2345 force-pushed the meta-addr-refactor branch from dc12a2f to 25a8eea Compare May 17, 2021 23:47
@teor2345 teor2345 changed the title Security: MetaAddr refactor, gossip time, connection order, and reconnection rate fixes (see tickets) Security: MetaAddr refactor, gossip time, connection order, reconnection rate, and initial peer hang fixes (see tickets) May 18, 2021
@teor2345 teor2345 changed the title Security: MetaAddr refactor, gossip time, connection order, reconnection rate, and initial peer hang fixes (see tickets) Security: MetaAddr refactor, gossip time, connection order, reconnection rate, and initial peer hang (Credit: Equilibrium) fixes (see tickets) May 18, 2021
@teor2345 teor2345 force-pushed the meta-addr-refactor branch 5 times, most recently from 4971d65 to 3f519ff Compare May 24, 2021 07:44
@teor2345

This comment has been minimized.

jvff added a commit to jvff/zebra that referenced this pull request May 25, 2021
The method will be added by ZcashFoundation#2160.

Co-authored-by: teor <teor@riseup.net>
jvff added a commit to jvff/zebra that referenced this pull request May 25, 2021
The method will be added by ZcashFoundation#2160.

Co-authored-by: teor <teor@riseup.net>
jvff added a commit to jvff/zebra that referenced this pull request May 25, 2021
The method will be added by ZcashFoundation#2160.

Co-authored-by: teor <teor@riseup.net>
@teor2345 teor2345 force-pushed the meta-addr-refactor branch from 7077c4e to ae4a882 Compare May 25, 2021 02:57
@teor2345 teor2345 changed the base branch from meta-addr-refactor-base to main May 25, 2021 03:03
Security: Spawn a separate task for each initial handshake

This fix prevents hangs and deadlocks during initialization, particularly
when there are a small number of valid peers in the initial peer config
(or from the DNS seeders).

Security: Correctly handle the minimum peer connection interval

Previously, if we hadn't had a connection for a while, we'd allow a lot
of connections all at once, until we'd caught up.

Security: sleep MIN_PEER_CONNECTION_INTERVAL between initial handshakes

This prevents denial of service if the local network is constrained, and
the seeders return a large number of peers.

Only wait for ready handshakes

Drain all waiting handshakes when enough have succeeded

Refactor MetaAddr to enable security fixes

Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book

Do an extra crawl for each handshake on startup

And whenever there aren't many recently live peers.

Remove duplicate initial crawl code

This change uses the candidate set for initial seed peers,
gossiped peers, and alternate peers.

It significantly reduces the complexity of the initialization code.
(By about 200 lines.)

Apply readiness timeout to each fanout

Also get the fanout limit from the number of recently live peers.

Launch each CandidateSet fanout in its own task

Spawn each `CandidateSet::update` in its own task

Move `CandidateSet::next` into the handshake task

Move all crawler awaits and threaded locks into spawned tasks

In this commit:
- Move sending PeerSet changes into a spawned task
- Move the locking in `CandidateSet::report_failed` into a spawned task

Increase the peer set buffer size for concurrent fanouts

Launch sync fanouts concurrently, with peer set readiness timeouts

Wait for seed peers before the first crawl

WIP: Add a timeout to crawl addr requests

This is a workaround for a zcashd response rate-limit.

Move AddressBook::lock() onto a blocking thread

Process all ready timestamp changes each time the task runs

Wait for the initial crawl before launching the syncer

Security: Limit unverified blocks to avoid memory DoS

Also document the security implications of changing these limits.

Drop early inbound requests to avoid load shedding during network setup

Stop closing connections when the inbound service is overloaded

SECURITY: Make buffer sizes dynamically depend on the config

This change significantly increases the inbound buffer size, increasing memory
denial of service risks. However, users can reduce the buffer size using
existing related config options.

These risks are documented under the relevant configs.

Treat `TryRecvError::Closed` in `Inbound::poll_ready` as a fatal error

Also:
- handle errors in service readiness the same as errors in requests

Closes #1655
@teor2345 teor2345 force-pushed the meta-addr-refactor branch from d078fdc to 230bc6c Compare May 25, 2021 03:08
@teor2345
Copy link
Contributor Author

We've split the high-priority fixes out of this PR into other PRs.

Most of the rest of the changes are obsoleted or high risk. Some have been turned into tickets.

@teor2345 teor2345 closed this Jun 24, 2021
@teor2345 teor2345 deleted the meta-addr-refactor branch March 21, 2022 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network protocol updates or fixes A-rust Area: Updates to Rust code C-bug Category: This is a bug C-cleanup Category: This is a cleanup C-security Category: Security issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data I-remote-node-overload Zebra can overload other nodes on the network
Projects
None yet
1 participant