All peers drop when a bad peer tries to connect #3346
Comments
I will try to reproduce, but in the meantime, could you post
It makes sense. Also, as Kusama (and the attached parachains) is having issues at the moment, block production is unstable.
I think it deserves an INFO log line when dropping all the peers, as it is quite a drastic event.
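For illustration only, a minimal sketch of what such an INFO line could look like, assuming the `log` crate; the `"sync"` target, the message text, and the `log_full_eviction` helper are placeholders, not the actual node code:

```rust
// Hypothetical sketch: surface the "evict everyone" event at INFO level
// instead of leaving it at TRACE. `num_peers` stands in for however the
// caller counts currently connected peers.
fn log_full_eviction(num_peers: usize) {
    log::info!(
        target: "sync",
        "No block announcement I/O for 30s, evicting all {} peers",
        num_peers,
    );
}
```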
Using TRACE, I get:
When checking the timing:
It seems the peers were dropped after 30s, which doesn't seem like a lot for a parachain, considering Kusama's current issues.
You're right, the eviction event should've been logged as INFO or even WARN. I don't think there is an upside to increasing the delay, because I will prepare a fix that brings back the original version of eviction, where peers were tracked individually. This will result in a more unstable peer count, but it will not cause all of the peers to be dropped at the same time.
Previously `SyncingEngine` used `last_notification_io` to track when it last sent or received a block announcement. If there had been no notification I/O for 30 seconds, `SyncingEngine` considered itself stalled and evicted all peers, giving `ProtocolController` a chance to establish new outbound connections. The problem with this approach was that if `SyncingEngine` actually got stalled, the peer count would suddenly drop to zero and then slowly climb back to the desired number of peers.

Start tracking the notification I/O of peers individually and evict them as they are deemed idle. This prevents the peer count from suddenly dropping to zero and instead makes `SyncingEngine` continuously look for more active peers. It might make the peer count more unstable on average if there are a lot of inactive peers in the network.

Fixes #3346
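As a rough illustration of the per-peer approach described above, here is a minimal sketch in plain Rust. The `PeerTracker` struct, the `PeerId` alias, and the 30-second `IDLE_TIMEOUT` are assumptions made for the example; this is not the actual `SyncingEngine` implementation.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Placeholder peer identifier; the real code uses the network's `PeerId` type.
type PeerId = u64;

/// Assumed to mirror the original 30-second stall threshold.
const IDLE_TIMEOUT: Duration = Duration::from_secs(30);

/// Sketch of per-peer notification tracking: instead of one global
/// `last_notification_io` timestamp, each peer carries its own, and only
/// peers silent for `IDLE_TIMEOUT` are evicted.
struct PeerTracker {
    last_notification: HashMap<PeerId, Instant>,
}

impl PeerTracker {
    fn new() -> Self {
        Self { last_notification: HashMap::new() }
    }

    /// Called whenever a block announcement is sent to or received from `peer`.
    fn on_notification(&mut self, peer: PeerId) {
        self.last_notification.insert(peer, Instant::now());
    }

    /// Removes and returns the peers with no notification I/O for
    /// `IDLE_TIMEOUT`; the caller would disconnect only these, leaving the
    /// rest of the peer set intact while replacements are found.
    fn evict_idle(&mut self) -> Vec<PeerId> {
        let now = Instant::now();
        let idle: Vec<PeerId> = self
            .last_notification
            .iter()
            .filter(|(_, last)| now.duration_since(**last) >= IDLE_TIMEOUT)
            .map(|(peer, _)| *peer)
            .collect();
        for peer in &idle {
            self.last_notification.remove(peer);
        }
        idle
    }
}
```

The intended effect matches the description: only peers that have themselves been silent for the timeout are disconnected, so the rest of the peer set stays connected while replacements are sought.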
@altonen I understand your point. I think we need to split the logic based on parachain vs. relay chain behavior.
We ran a burn-in of the proposed fix on Kusama AssetHub and it didn't work quite as well as I had hoped, so I'm re-evaluating our approach.
@girazoki yes, that should be fixed by the mentioned PR, as the logic was removed. Aaro is also no longer working with us. Going to close the issue as it should be solved.
Thank you!
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
When running a regular node (Moonriver, v0.35.0, based on polkadot-sdk 1.3.0) that maintains 40 peers (configured as 20 in, 20 out), all the peers get dropped when a bad node connects with a bad genesis:
Steps to reproduce
Running a Moonbeam v0.35.0 node on the Moonriver chain will likely reproduce the issue.