-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security: Spawn a separate task for each initial handshake, Credit: Equilibrium #2181
Conversation
@dconnolly / @jvff these changes could be causing the macOS failures I'm seeing on #2160. So I've marked macOS stable as a required CI check. We can go back to just using Linux once #2160 is all split up and merged. |
Thinking about this some more, we might want to leave macOS enabled. Linux and Windows have the larger network tests turned off, because they were unreliable in CI. (After NU5, we should check if they're still unreliable - maybe all our fixes have made them better.) |
c1aa7ba
to
344d118
Compare
344d118
to
52850e1
Compare
This fix prevents hangs and deadlocks during initialization, particularly when there are a small number of valid peers in the initial peer config (or from the DNS seeders).
Previously, if we hadn't had a connection for a while, we'd allow a lot of connections all at once, until we'd caught up.
This prevents denial of service if the local network is constrained, and the seeders return a large number of peers.
52850e1
to
9328a88
Compare
// next time off our sleep time. Otherwise, use the current time. | ||
self.next_peer_sleep_until = max(self.next_peer_sleep_until, Instant::now()); | ||
self.next_peer_sleep_until += constants::MIN_PEER_CONNECTION_INTERVAL; | ||
sleep_until(current_deadline).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Correctly use MIN_PEER_CONNECTION_INTERVAL in CandidateSet
- Make sure the next deadline is at least
MIN_PEER_CONNECTION_INTERVAL
in the future - Replace the
Sleep
future with anInstant
// Each `FuturesUnordered` can hold one `Buffer` reservation for an | ||
// indefinite period. We can use `FuturesUnordered` without filling the | ||
// underlying `Handshake` buffer, because we immediately drive this single | ||
// `FuturesUnordered` to completion, and handshakes have a short timeout. | ||
let mut handshakes = FuturesUnordered::new(); | ||
// TODO: replace with *success_count_tx.borrow() in Tokio 1.6 | ||
let mut success_count = 0; | ||
for candidate in initial_meta_addr { | ||
// 1. Spawn a task to handshake with a new peer | ||
let next_peer_sleep_until = Instant::now() + constants::MIN_PEER_CONNECTION_INTERVAL; | ||
let hs_join = tokio::spawn(dial(candidate, outbound_connector.clone())) | ||
.map(move |res| match res { | ||
Ok(crawler_action) => crawler_action, | ||
Err(e) => { | ||
panic!( | ||
"panic during initial handshake with {:?}: {:?} ", | ||
candidate, e | ||
); | ||
} | ||
}) | ||
.instrument(Span::current()); | ||
handshakes.push(Box::pin(hs_join)); | ||
|
||
// 2. Check if any peers have finished their handshakes | ||
// If we've had enough successes, wait for all the handshakes to finish | ||
while let Some(handshake_action) = if success_count < peerset_initial_target_size { | ||
handshakes.next().now_or_never().flatten() | ||
} else { | ||
handshakes.next().await | ||
} { | ||
use CrawlerAction::*; | ||
match handshake_action { | ||
HandshakeConnected { peer_set_change } => { | ||
success_count += 1; | ||
if let Change::Insert(ref addr, _) = peer_set_change { | ||
debug!( | ||
?addr, | ||
?success_count, | ||
?peerset_initial_target_size, | ||
"successfully dialed initial peer" | ||
); | ||
} else { | ||
unreachable!("unexpected handshake result: all changes should be Insert"); | ||
} | ||
// the peer set is handled by an independent task, so this send | ||
// shouldn't hang | ||
peerset_tx.send(Ok(peer_set_change)).await?; | ||
} | ||
HandshakeFailed { failed_addr, error } => { | ||
if success_count <= constants::GET_ADDR_FANOUT { | ||
// this creates verbose logs, but it's better than just hanging on | ||
// startup with no output | ||
info!(addr = ?failed_addr.addr, | ||
?error, | ||
?success_count, | ||
?peerset_initial_target_size, | ||
"an initial peer connection failed"); | ||
} else { | ||
// switch to debug when we have enough peers | ||
debug!(addr = ?failed_addr.addr, | ||
?error, | ||
?success_count, | ||
?peerset_initial_target_size, | ||
"an initial peer connection failed"); | ||
} | ||
} | ||
DemandCrawl | DemandDrop | DemandHandshake { .. } | TimerCrawl { .. } => { | ||
unreachable!("unexpected CrawlerAction: should be handshake result") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Spawn a separate task for each initial handshake
// 3. If we haven't had enough successes, sleep for a while | ||
if success_count < peerset_initial_target_size { | ||
tokio::time::sleep_until(next_peer_sleep_until).await; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Limit the initial peer connection rate, part 2
let mut success_count = 0; | ||
for candidate in initial_meta_addr { | ||
// 1. Spawn a task to handshake with a new peer | ||
let next_peer_sleep_until = Instant::now() + constants::MIN_PEER_CONNECTION_INTERVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Limit the initial peer connection rate, part 1
while let Some(handshake_action) = if success_count < peerset_initial_target_size { | ||
handshakes.next().now_or_never().flatten() | ||
} else { | ||
handshakes.next().await | ||
} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code isn't pretty, but it's hard to refactor, because it has a lot of futures and channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this approved! 🎉
(I didn't mark it as approved to avoid auto-merging)
#2160 removes initial handshakes entirely, and just sends all the seed addresses to the candidate set. Then there is a bit of special handling for initial crawls in the crawler. So this PR isn't needed any more. |
Motivation
This fix prevents hangs and deadlocks during initialization, particularly when there are a small number of valid peers in the initial peer config (or from the DNS seeders).
It also cleans up some variable names and error handling, so it's easier to understand what the code is doing.
Reported by Equilibrium.
Solution
dial
functionMIN_PEER_CONNECTION_INTERVAL
inCandidateSet
Also:
HandshakeFailed
crawler actionThe code in this pull request has:
Review
I'd like @dconnolly to review this security fix. Or @jvff might like to take a look.
It's blocking the rest of #2160, so it would be great to have it reviewed before my Monday.
Related Issues
Part of #2163
Blocks #2160
Follow Up Work
Rest of #2160