Fix premature disconnections from seeds #5057
Conversation
Remove unnecessary setPeerType calls. ConnectionState handles that. Only PeerManager sets isSeedNode, as we do not have the required dependency in ConnectionState.
Fix typo
At INITIAL_DATA_EXCHANGE we sort by lastInitialDataExchangeMessageTimeStamp
…alDataMsgTimeStamp
In the 3rd attempt we filter for INITIAL_DATA_EXCHANGE peers. Before, we excluded 2 types, and as PEER had already been filtered out earlier, we would only look up SEED_NODE. This was only called by non-seed nodes.
We handle the connections by INITIAL_DATA_EXCHANGE, which covers the seed nodes as well. Having a parallel routine is risky and makes things more complex.
We handle it in ConnectionState by counting requests and responses and adding a timer
It does not do the getBlocksRequest.
Add ConnectionStatistics. Print statistics of all live connections periodically.
Also track BundleOfEnvelopes, to be accessible to the statistics log.
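A minimal sketch of the idea behind the new per-connection statistics logging (class and method names here are assumptions for illustration, not the actual bisq ConnectionStatistics API): keep sent/received counters per connection and print a summary from a periodically scheduled task. Tracking of BundleOfEnvelopes is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ConnectionStats {
    // Per-peer message counters; ConcurrentHashMap since network threads update them.
    private final Map<String, AtomicLong> sent = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> received = new ConcurrentHashMap<>();

    void onMsgSent(String peer) {
        sent.computeIfAbsent(peer, k -> new AtomicLong()).incrementAndGet();
    }

    void onMsgReceived(String peer) {
        received.computeIfAbsent(peer, k -> new AtomicLong()).incrementAndGet();
    }

    // Would be called from a scheduled task (e.g. once a minute) to log all live connections.
    String summary() {
        StringBuilder sb = new StringBuilder();
        sent.forEach((peer, count) -> sb.append(peer)
                .append(": sent=").append(count.get())
                .append(" received=").append(received.getOrDefault(peer, new AtomicLong()).get())
                .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        ConnectionStats stats = new ConnectionStats();
        stats.onMsgSent("seed1");
        stats.onMsgSent("seed1");
        stats.onMsgReceived("seed1");
        System.out.print(stats.summary());
    }
}
```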
Concept ACK. Described changes make sense. I've only scanned the code changes themselves, though, so not a real review. Thanks for turning this around so quickly, @chimp1984!
Improve statistic logging
It reports "Wanted but not invoked:", but when debugging into it, the method is called. So it seems to be a mock setup issue.
Do check for closed socket after blocking read. Move throttle code after blocking read.
I deployed it to 3 seed nodes and am observing how they behave before extending to other seeds. I added new statistics logging per connection. That helps to analyse the peer management and see where the bottlenecks are.
remove dev log
This can happen from the main shutdown routine when a timeout gets triggered.
utACK
Will hold off with merging until test nodes have run for a bit
@chimp1984 Regarding the Codacy complaint: do you want to keep the extra if statement to improve readability?
Yes, I prefer to keep it as it is.
utACK
Fixes issues with disconnections while the peer is still in the initial data request phase.
We have several Request/Response cycles at startup:
We set the PeerType to INITIAL_DATA_REQUEST, but we reset the PeerType to PEER once the seed sent the GetDataResponse for the GetUpdatedDataRequest. This was too early, as the PeerManager's connection management disconnects low-prio (PEER) connections if there are too many connections.
This led to the problem that the seed closed the connection before the peer could start the GetBlocksRequest.
The last GetDataResponse was also at risk of not arriving, as the connection might get closed too quickly.
We changed that model so that we now count the requests and responses of this bootstrap phase, and once we reach the expected number we assume we are done and reset the PeerType to PEER. As we cannot fully rely on that, we also start a timer to reset it after 4 minutes.
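The counting-plus-timer mechanism can be sketched as follows (a simplified illustration with assumed names and an assumed expected-cycle count, not the actual bisq ConnectionState implementation): the peer type is downgraded once all expected request/response cycles have completed, with the timer as an unconditional fallback.

```java
import java.util.concurrent.TimeUnit;

public class BootstrapTracker {
    // Assumed number of request/response cycles in the bootstrap phase.
    static final int EXPECTED_CYCLES = 4;
    // Fallback: reset unconditionally after 4 minutes, per the PR description.
    static final long TIMER_DELAY_SEC = TimeUnit.MINUTES.toSeconds(4);

    enum PeerType { INITIAL_DATA_EXCHANGE, PEER }

    private int numRequests;
    private int numResponses;
    private PeerType peerType = PeerType.INITIAL_DATA_EXCHANGE;

    void onRequestSent()      { numRequests++; maybeReset(); }
    void onResponseReceived() { numResponses++; maybeReset(); }

    // Downgrade to the low-prio PEER type only once all expected cycles completed.
    private void maybeReset() {
        if (numRequests >= EXPECTED_CYCLES && numResponses >= EXPECTED_CYCLES) {
            peerType = PeerType.PEER;
        }
    }

    // Called when the 4-minute safety timer fires.
    void onTimerFired() { peerType = PeerType.PEER; }

    PeerType getPeerType() { return peerType; }

    public static void main(String[] args) {
        BootstrapTracker tracker = new BootstrapTracker();
        for (int i = 0; i < EXPECTED_CYCLES; i++) {
            tracker.onRequestSent();
            tracker.onResponseReceived();
        }
        System.out.println(tracker.getPeerType());
    }
}
```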
When we exceed 20 connections at a normal peer or 34 at seed nodes (with maxConnections 20), we start to disconnect peers of type INITIAL_DATA_EXCHANGE as well. We sort by the oldest date when we sent or received a message, to increase the chance that such a peer is already done with bootstrapping.
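The candidate selection for these disconnections might look roughly like this (a sketch with assumed names, not the actual PeerManager code): sort the over-limit connections ascending by the timestamp of the last sent/received message, so the longest-idle peers, which are most likely finished bootstrapping, are disconnected first.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class CandidateSelection {
    // Minimal stand-in for a connection with its last-message timestamp.
    record Conn(String id, long lastInitialDataMsgTimeStamp) {}

    // Pick the connections to disconnect when we exceed maxConnections,
    // oldest last-message timestamp first.
    static List<Conn> candidates(List<Conn> connections, int maxConnections) {
        if (connections.size() <= maxConnections) {
            return List.of();
        }
        int numToDisconnect = connections.size() - maxConnections;
        return connections.stream()
                .sorted(Comparator.comparingLong(Conn::lastInitialDataMsgTimeStamp))
                .limit(numToDisconnect)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Conn> conns = List.of(
                new Conn("a", 300),
                new Conn("b", 100),
                new Conn("c", 200));
        // With maxConnections = 1 the two oldest connections (b, c) are chosen.
        candidates(conns, 1).forEach(c -> System.out.println(c.id()));
    }
}
```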
We also removed the removeSuperfluousSeedNodes method, which was a bit dangerous from the app side: it was protected only by the allowDisconnectSeedNodes flag, which was handled by the GetBlockHandlers, so there was a risk that we disconnect before we can request the StateHashes.
We have to take care that seed nodes do not get too weakly connected to healthy nodes, as they would then miss broadcast messages if all their connections were occupied by bootstrapping nodes. I doubt that the changes here have any impact on that behaviour, but as it is pretty complex we have to be careful with deployment.
I will test it on a small group of seed nodes first and add some more statistics to see which types of connections the seed has.