Kusama: Beefy gossip validator is processing messages too slowly #3390
Comments
I am guessing this is a non-issue: with 1000 validators gossiping, multiple messages get enqueued without being processed during node restart. Maybe we should increase the limit for this warning on Kusama. Besides seeing this shortly after restart, are you experiencing any other issues? Also, is your node finalizing BEEFY blocks? Do you see the corresponding BEEFY finality logs?
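To make the warning being discussed concrete, here is a minimal, hypothetical sketch of the kind of check that produces it: a queue of pending gossip messages that logs once its length crosses a threshold. The type names, the threshold constant, and the log text are illustrative assumptions, not the actual sc-network-gossip code.

```rust
// Hypothetical sketch of a "too many enqueued gossip messages" warning.
// Names, threshold, and log text are illustrative only.
use std::collections::VecDeque;

const ENQUEUED_WARNING_THRESHOLD: usize = 10_000; // illustrative limit

struct PendingGossip {
    queue: VecDeque<Vec<u8>>,
}

impl PendingGossip {
    fn enqueue(&mut self, msg: Vec<u8>) {
        self.queue.push_back(msg);
        if self.queue.len() == ENQUEUED_WARNING_THRESHOLD {
            // In the node this surfaces as a "processing messages too slowly" warning.
            eprintln!(
                "warning: gossip validator is processing messages too slowly ({} enqueued)",
                self.queue.len()
            );
        }
    }

    fn dequeue(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut pending = PendingGossip { queue: VecDeque::new() };
    pending.enqueue(vec![0u8; 32]);
    assert_eq!(pending.dequeue().map(|m| m.len()), Some(32));
}
```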
From my side, besides seeing a similar message after a restart, all seems to be good and the node is finalizing BEEFY blocks.
Why? If it's CPU usage, we do not check the BLS signature when gossiping, right?
It happens only during node restart, while gossiped messages pile up (the BEEFY voter task starts late in the process while the network gossip subsystem starts early). Once the restart is complete and the BEEFY voter/worker task gets crunching, it consumes the pending gossip messages and carries on nicely.
For completeness of sanity checks, please also check node RAM usage over time. The bad scenario to rule out here is that gossip messages keep piling up faster than they are consumed, in which case RAM usage would reflect this very visibly over, say, a 24h window.
Same as Paulo, I'm seeing finalised BEEFY blocks.
I had the same issue with v1.7.1, and not only at node restart.
On both KSM nodes I updated, I received the same error.
RAM & CPU seem OK, but network usage has doubled since the activation of BEEFY.
GRANDPA was designed so you only need to pay attention to the current round. In principle BEEFY should continue this, so maybe this requires some look-ahead that realizes older messages could now be discarded. Alternatively, BEEFY rounds could simply be run less frequently, so even though this happens it winds up irrelevant.
Do we know if BEEFY caused this? I'd think network usage doubling sounds more like async backing, but not really sure.
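As a rough illustration of the "older messages could be discarded" idea (a simplified sketch, not the real BEEFY gossip validator): a vote for a round at or below the best finalized BEEFY round could be treated as expired and dropped instead of queued. All names here are hypothetical.

```rust
// Illustrative only: a simplified round tracker that marks votes for already
// finalized BEEFY rounds as expired so gossip can discard them.
struct RoundTracker {
    best_finalized_round: u64,
}

impl RoundTracker {
    /// A gossiped vote is useless once its round is at or below the best
    /// finalized round.
    fn is_expired(&self, vote_round: u64) -> bool {
        vote_round <= self.best_finalized_round
    }
}

fn main() {
    let tracker = RoundTracker { best_finalized_round: 1_000 };
    assert!(tracker.is_expired(1_000));  // already finalized: discard
    assert!(!tracker.is_expired(1_001)); // still relevant: keep
}
```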
You are right, a lot of things have happened recently (runtime upgrade, new version, BEEFY), so it could also be async backing kicking in.
…on conditions (#3435)

As part of BEEFY worker/voter initialization, the task waits for certain chain and backend conditions to be fulfilled:
- BEEFY consensus enabled on-chain & GRANDPA best finalized higher than the on-chain BEEFY genesis block,
- backend has synced headers for BEEFY mandatory blocks between best BEEFY and best GRANDPA.

During this waiting time, any messages gossiped on the BEEFY topic for the current chain get enqueued in the gossip engine, leading to RAM bloating and warning/error messages being output when the wait time is non-negligible (like during a clean sync).

This PR adds logic to pump the gossip engine while waiting for other things, to make sure gossiped messages get consumed (practically discarded until the worker is fully initialized). It also raises the warning threshold for enqueued messages from 10k to 100k, in line with the other gossip protocols on the node.

Fixes #3390

Signed-off-by: Adrian Catangiu <adrian@parity.io>
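A minimal sketch of the idea behind the fix, assuming simplified placeholder futures (this is not the code merged in #3435): while the worker waits for its start conditions, the gossip engine future is polled concurrently, so incoming messages keep being consumed instead of accumulating.

```rust
// Sketch only: placeholder futures stand in for the real BEEFY start-condition
// checks and the gossip engine; the point is polling both concurrently.
use futures::FutureExt;

async fn wait_for_start_conditions() {
    // placeholder: wait for BEEFY genesis finalized, mandatory headers synced, ...
}

async fn run_gossip_engine() {
    // placeholder: the gossip engine's main loop, which drains incoming messages
}

async fn initialize_beefy_worker() {
    let mut conditions = Box::pin(wait_for_start_conditions().fuse());
    let mut gossip = Box::pin(run_gossip_engine().fuse());

    futures::select! {
        _ = conditions => {
            // Start conditions met: hand `gossip` over to the voter and run it.
        }
        _ = gossip => {
            // Gossip engine terminated early; a real node would treat this as an error.
        }
    }
}

fn main() {
    futures::executor::block_on(initialize_beefy_worker());
}
```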
Fixed by #3435 and will be included in an upcoming node release.
Validators on Kusama report the following: