Forest is getting stuck during sync #3990
Labels
Network
Libp2p and PubSub stuff
Priority: 2 - High
Very important and should be addressed ASAP
Ready
Issue is ready for work and anyone can freely assign it to themselves
Comments
LesnyRumcajs added the Priority: 2 - High, Network, and Ready labels on Feb 26, 2024
The same happened recently on two distinct servers. This may indicate a general condition in the network that Forest is just not able to handle.
Added the metrics to the DO bucket.
Issue summary
Over the last few days, Forest has been getting consistently stuck in the snapshot service. It happens during this part of the script, namely while the script is waiting for the sync to complete. During the timeout window (six hours), Forest either manages to recover on its own or is killed.
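For readers unfamiliar with the setup, the wait-with-timeout behaviour described above can be sketched roughly as follows. This is not the actual snapshot service script: `wait_for_sync`, `is_synced`, and the polling interval are hypothetical names and values chosen for illustration, and only the six-hour window comes from this report.

```python
import time
from typing import Callable


def wait_for_sync(
    is_synced: Callable[[], bool],
    timeout_s: float = 6 * 60 * 60,  # six-hour window, as in this report
    poll_interval_s: float = 30.0,   # assumed value, not taken from the script
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_synced() until it returns True or timeout_s elapses.

    Returns True if the node reported being synced within the window,
    False if the window ran out (the caller would then kill the node).
    """
    deadline = clock() + timeout_s
    while True:
        if is_synced():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

In a sketch like this, a node that keeps re-bootstrapping without ever reporting a completed sync simply exhausts the window and is killed, which matches the behaviour observed in the service.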
While the steps are always the same and the service is always created from scratch, the outcome depends heavily on network conditions.
Take, for example, this forest.out.txt from 2024-02-26. Forest repeatedly manages to bootstrap, then stalls for around 10 minutes while evaluating the network head, and bootstraps again. My understanding is that, once bootstrapped, Forest should quickly switch to FOLLOW mode.
It may or may not be relevant, but bootstrapping also failed a few times during this execution.
Overall, even if we end up with a few bad peers, the entire mainnet had around 3000 peers the last time I checked. Do we really end up with 500 bad ones?
Other information and links
This is happening on Forest v0.16.6.