Unable to keep up with pending state heal #26687
Comments
Errata: the ancient db is on network-connected storage, but that doesn't seem to cause any issue because ancient reads/writes are almost zero.
Before the two-week downtime, I'm assuming the node had not finished syncing? If so, aborting the sync while it is performing state heal and then continuing two weeks later means all the snap data will have bitrotted. The impact is that you'll be forced to basically do a fast sync over the snap protocol, and that's going to be a pita.
Was the node synced before you turned the machine off? If you were halfway through a sync (or all the way through really, but not yet fully synced) and stopped in between, all the old data will bitrot like crazy in 2 weeks. In that case, just resync from zero (keep the ancients to avoid redownloading the chain part).
The node was synced before, yes. It was likely in a boot loop due to insufficient space. There was an unclean shutdown 2 weeks ago, at the start of the downtime.
This comment was marked as off-topic.
The state heal eventually completed. I think my machine can barely keep up with state heals, and the "random walk"-like behavior in the pending graph was eventually able to wander to zero and complete.
Hi, I'm running Windows with Prysm and am in the same boat. I finally got past "state download in progress", and now it shows "chain download in progress" and "state healing in progress", with the ETA jumping around in a range from 10 to 20 minutes. I have a fast 2 TB SSD, 20 GB RAM, and an i5, and have been validating since genesis with no issues at all. This is the longest I've been down after a power failure. I updated everything to the latest versions and cleared out chaindata to start from scratch. One thing I did notice is that the geth chaindata folder has 121,000 files now, even after starting from scratch. Before I cleared it, it only had 55,000 files. I'm wondering if that many files is slowing down the SSD. Any help is appreciated. Thanks in advance.
System information
Geth version:
instance=Geth/v1.10.26-stable-e5eb32ac/linux-amd64/go1.18.8
CL client & version: prysm:stable
OS & Version:
Linux name 5.19.0-31-generic #32-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 20 15:20:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Expected behaviour
"State heal in progress" eventually completes
Actual behaviour
"State heal in progress" continues indefinitely
Steps to reproduce the behaviour
Here is a graph of the pending field in the "State heal in progress" log line over time.
Lots of memory is available for the system disk buffer.
Some disk statistics: iowait is ~50%.
geth syncing dashboard snapshot https://snapshots.raintank.io/dashboard/snapshot/j57U07jPZBxmA5wxR2bM7PkBcfNCKIpx
system dashboard snapshot https://snapshots.raintank.io/dashboard/snapshot/duM8SNGtvhRkU3e9UO9jDsF4BT3J2n77
here is a small section of logs https://gist.github.com/kumavis/889eb03156fa7cc54935917b2539f10f
let me know what additional data can help
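For anyone who wants to reproduce the pending graph from their own logs, here is a minimal Python sketch. It assumes plain (non-colored) geth log output in which the "State heal in progress" line carries a `pending=<number>` pair, possibly with thousands separators. The function names and the sample lines are made up for illustration and are not taken from the linked gist.

```python
import re

# Assumption: the log line contains "State heal in progress" followed
# somewhere by "pending=<digits>", optionally with commas (e.g. 12,345).
PENDING_RE = re.compile(r"State heal in progress.*?\bpending=([\d,]+)")


def pending_values(lines):
    """Return the pending count from every state-heal progress line, in order."""
    values = []
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            # Strip thousands separators before converting to int.
            values.append(int(match.group(1).replace(",", "")))
    return values


def net_change(values):
    """Last value minus first value; negative means the backlog shrank."""
    return values[-1] - values[0] if values else 0


if __name__ == "__main__":
    # Hypothetical sample lines, only shaped like geth output.
    sample = [
        "INFO [02-10|12:00:00.000] State heal in progress nodes=10 pending=52,100",
        "INFO [02-10|12:00:08.000] Imported new block headers count=1",
        "INFO [02-10|12:00:16.000] State heal in progress nodes=20 pending=48,900",
    ]
    values = pending_values(sample)
    print(values, net_change(values))
```

A net change that hovers around zero over a long window would match the "unable to keep up" behavior described here, while a steadily negative one suggests the heal is converging.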