Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer: Implement stall detection. #518

Merged
merged 1 commit into from
Oct 23, 2015

Conversation

davecgh
Copy link
Member

@davecgh davecgh commented Oct 20, 2015

Requires #445

This is an alternative approach to #501 that is a more general solution and also fixes some concurrency-related issues with that PR as a side effect.

This pull request implements stall detection logic at the peer level to detect and disconnect peers that are either not following the protocol in regards to expected response messages or have otherwise stalled. This
is accomplished by setting deadlines for each message type which expects a response and periodically checking them while properly taking into account processing time.

There are an increasing number of nodes on the network which claim to be full nodes, but don't actually properly implement the entire p2p protocol even though they implement it enough to cause properly implemented nodes to make data requests to which they never respond.

Since btcd currently only syncs new blocks via single sync peer and, prior to this commit only had very basic stall detection, this could lead to a situation where the block download became stalled indefinitely due to one of these misbehaving peers. This commit fixes that issue since the stalled peer will now be detected and disconnected which leads to a new sync peer being selected.

This logic will also fit nicely with the future multi-peer sync model which is on the roadmap and for which infrastructure work is underway.

Fixes #486 and #229 .

Review on Reviewable

@davecgh
Copy link
Member Author

davecgh commented Oct 21, 2015

Updated to improve log message on stalled peer and made it debug level.

@davecgh davecgh force-pushed the stalldetection branch 4 times, most recently from 92965ce to cbbe3a8 Compare October 23, 2015 15:27
This commit implements stall detection logic at the peer level to detect
and disconnect peers that are either not following the protocol in
regards to expected response messages or have otherwise stalled.  This
is accomplished by setting deadlines for each message type which expects
a response and periodically checking them while properly taking into
account processing time.

There are an increasing number of nodes on the network which claim to be
full nodes, but don't actually properly implement the entire p2p
protocol even though they implement it enough to cause properly
implemented nodes to make data requests to which they never respond.

Since btcd currently only syncs new blocks via single sync peer and,
prior to this commit only had very basic stall detection, this could
lead to a situation where the block download became stalled indefinitely
due to one of these misbehaving peers.  This commit fixes that issue
since the stalled peer will now be detected and disconnected which leads
to a new sync peer being selected.

This logic will also fit nicely with the future multi-peer sync model
which is on the roadmap and for which infrastructure work is underway.

Fixes btcsuite#486 and fixes btcsuite#229.
@toddfries
Copy link
Contributor

I used to have 'orphan block only' symptoms multiple times per day, but definately every 12-24hrs at the minimum... with this pull request, my btcd has not skipped a beat since. Why is this not merged yet? ;-) ;-) ;-)

@dajohi
Copy link
Member

dajohi commented Oct 23, 2015

No issues in hours with dozens of hosts. OK

@jrick
Copy link
Member

jrick commented Oct 23, 2015

Reads like a hack but solves the immediate problem. Although it's not a fool proof solution (deadlines can be overwritten for example) it will be updated when multipeer syncing is added. ok

@conformal-deploy conformal-deploy merged commit cbbe3a8 into btcsuite:master Oct 23, 2015
@davecgh davecgh deleted the stalldetection branch October 23, 2015 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BlockManager stops syncing during initial sync
5 participants