Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security: Zebra should stop gossiping failure times as last_seen times #1868

Closed
3 tasks done
teor2345 opened this issue Mar 9, 2021 · 1 comment · Fixed by #2273
Closed
3 tasks done

Security: Zebra should stop gossiping failure times as last_seen times #1868

teor2345 opened this issue Mar 9, 2021 · 1 comment · Fixed by #2273
Assignees
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug C-security Category: Security issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data I-remote-node-overload Zebra can overload other nodes on the network

Comments

@teor2345
Copy link
Contributor

teor2345 commented Mar 9, 2021

Motivation

Zebra updates its internal last_seen time when it fails to connect to other nodes. Then it sends that time on to other nodes who request peer addresses. But the last_seen time should only be updated on a successful connection to a peer.

Solution

This fix depends on #1849.

Zebra should use an accurate time in gossiped MetaAddrs.

Zebra should set the last_seen time in gossiped MetaAddrs to:

Alternatives

This is a critical security issue, so we must do something.

This seems like the simplest design that will fix this issue. #1867 already limits how long we'll send untrusted_last_seen to other peers.

Context

zcashd is not vulnerable to this issue - it ignores last_seen times from peers.

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code S-needs-triage Status: A bug report needs triage P-High C-security Category: Security issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data labels Mar 9, 2021
@teor2345 teor2345 changed the title Zebra should stop gossiping failiure times as last_seen times Zebra should stop gossiping failure times as last_seen times Mar 9, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Mar 10, 2021
@teor2345 teor2345 added I-remote-node-overload Zebra can overload other nodes on the network P-Critical and removed P-High labels Mar 16, 2021
@teor2345
Copy link
Contributor Author

teor2345 commented May 7, 2021

Moving this to sprint 10, so Zebra can clear out the bad addresses created by the bug in #2120.

@teor2345 teor2345 changed the title Zebra should stop gossiping failure times as last_seen times Security: Zebra should stop gossiping failure times as last_seen times May 7, 2021
teor2345 added a commit that referenced this issue May 21, 2021
Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial DNS seeder peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book
teor2345 added a commit that referenced this issue May 21, 2021
Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial DNS seeder peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book
teor2345 added a commit that referenced this issue May 21, 2021
Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book
teor2345 added a commit that referenced this issue May 21, 2021
Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book
teor2345 added a commit that referenced this issue May 22, 2021
Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book
teor2345 added a commit that referenced this issue May 25, 2021
This fix prevents hangs and deadlocks during initialization, particularly
when there are a small number of valid peers in the initial peer config
(or from the DNS seeders).

Security: Correctly handle the minimum peer connection interval

Previously, if we hadn't had a connection for a while, we'd allow a lot
of connections all at once, until we'd caught up.

Security: sleep MIN_PEER_CONNECTION_INTERVAL between initial handshakes

This prevents denial of service if the local network is constrained, and
the seeders return a large number of peers.

Only wait for ready handshakes

Drain all waiting handshakes when enough have succeeded

Refactor MetaAddr to enable security fixes

Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book

Do an extra crawl for each handshake on startup

And whenever there aren't many recently live peers.

Remove duplicate initial crawl code

This change uses the candidate set for initial seed peers,
gossiped peers, and alternate peers.

It significantly reduces the complexity of the initialization code.
(By about 200 lines.)

Apply readiness timeout to each fanout

Also get the fanout limit from the number of recently live peers.

Launch each CandidateSet fanout in its own task

Spawn each `CandidateSet::update` in its own task

Move `CandidateSet::next` into the handshake task

Move all crawler awaits and threaded locks into spawned tasks

In this commit:
- Move sending PeerSet changes into a spawned task
- Move the locking in `CandidateSet::report_failed` into a spawned task

Increase the peer set buffer size for concurrent fanouts

Launch sync fanouts concurrently, with peer set readiness timeouts

Wait for seed peers before the first crawl

WIP: Add a timeout to crawl addr requests

This is a workaround for a zcashd response rate-limit.

Move AddressBook::lock() onto a blocking thread

Process all ready timestamp changes each time the task runs

Wait for the initial crawl before launching the syncer

Security: Limit unverified blocks to avoid memory DoS

Also document the security implications of changing these limits.

Drop early inbound requests to avoid load shedding during network setup

Stop closing connections when the inbound service is overloaded

SECURITY: Make buffer sizes dynamically depend on the config

This change significantly increases the inbound buffer size, increasing memory
denial of service risks. However, users can reduce the buffer size using
existing related config options.

These risks are documented under the relevant configs.

Treat `TryRecvError::Closed` in `Inbound::poll_ready` as a fatal error

Also:
- handle errors in service readiness the same as errors in requests

Closes #1655
teor2345 added a commit that referenced this issue May 25, 2021
Security: Spawn a separate task for each initial handshake

This fix prevents hangs and deadlocks during initialization, particularly
when there are a small number of valid peers in the initial peer config
(or from the DNS seeders).

Security: Correctly handle the minimum peer connection interval

Previously, if we hadn't had a connection for a while, we'd allow a lot
of connections all at once, until we'd caught up.

Security: sleep MIN_PEER_CONNECTION_INTERVAL between initial handshakes

This prevents denial of service if the local network is constrained, and
the seeders return a large number of peers.

Only wait for ready handshakes

Drain all waiting handshakes when enough have succeeded

Refactor MetaAddr to enable security fixes

Track multiple last used times for each peer:
- Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848)
- Add the new fields to the peer states, so they only appear in states where they are valid
- Insert initial seed peers in the AddressBook in the correct states

Create a new MetaAddrChange type for AddressBook changes:
- Ignore invalid state changes
    - Ignore updates to the untrusted last seen time (but update the services field)
    - If we get a gossiped or alternate change for a seed peer, use the last seen and services info
    - Once a peer has responded, don't go back to the NeverResponded... states
- Update the address book metrics

- Optimise getting the next connection address from the address book

Do an extra crawl for each handshake on startup

And whenever there aren't many recently live peers.

Remove duplicate initial crawl code

This change uses the candidate set for initial seed peers,
gossiped peers, and alternate peers.

It significantly reduces the complexity of the initialization code.
(By about 200 lines.)

Apply readiness timeout to each fanout

Also get the fanout limit from the number of recently live peers.

Launch each CandidateSet fanout in its own task

Spawn each `CandidateSet::update` in its own task

Move `CandidateSet::next` into the handshake task

Move all crawler awaits and threaded locks into spawned tasks

In this commit:
- Move sending PeerSet changes into a spawned task
- Move the locking in `CandidateSet::report_failed` into a spawned task

Increase the peer set buffer size for concurrent fanouts

Launch sync fanouts concurrently, with peer set readiness timeouts

Wait for seed peers before the first crawl

WIP: Add a timeout to crawl addr requests

This is a workaround for a zcashd response rate-limit.

Move AddressBook::lock() onto a blocking thread

Process all ready timestamp changes each time the task runs

Wait for the initial crawl before launching the syncer

Security: Limit unverified blocks to avoid memory DoS

Also document the security implications of changing these limits.

Drop early inbound requests to avoid load shedding during network setup

Stop closing connections when the inbound service is overloaded

SECURITY: Make buffer sizes dynamically depend on the config

This change significantly increases the inbound buffer size, increasing memory
denial of service risks. However, users can reduce the buffer size using
existing related config options.

These risks are documented under the relevant configs.

Treat `TryRecvError::Closed` in `Inbound::poll_ready` as a fatal error

Also:
- handle errors in service readiness the same as errors in requests

Closes #1655
@teor2345 teor2345 self-assigned this May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug C-security Category: Security issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data I-remote-node-overload Zebra can overload other nodes on the network
Projects
None yet
2 participants