geth 1.13.11 cannot sync with prysm 4.2.0 or above 4.2.0 #13557
Comments
This seems like a geth issue. In your geth logs, it takes hundreds of milliseconds to process a single block, sometimes more than one second. It should be much faster than that: typical processing takes anywhere from under 40ms to 150ms. When I've seen this in the past, I deleted the geth db and resynced geth. Is it possible that there was an improper shutdown and state healing is ongoing in the background? See ethereum/go-ethereum#28855 (comment)
Thank you. But why did beacon 4.1.1 work fine? I just switched the beacon version to 4.1.1 and didn't do anything else. Before trying 4.1.1, restarting the machine or switching to the RC version didn't help.
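To check the block processing times mentioned above against your own logs, something like the following may help. It is a minimal sketch, assuming that block import lines contain the text `Imported new chain segment` and an `elapsed=` field in Go duration notation (for example `85.2ms` or `1.5s`); verify both against your geth version. The function names are hypothetical, and the 150ms default threshold is simply the upper end of the range quoted in the comment.

```python
import re

# One "<number><unit>" component of a Go-style duration, e.g. "1m", "2.5s", "340ms".
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(ns|µs|us|ms|s|m|h)")
# Milliseconds per unit.
_UNIT_MS = {"ns": 1e-6, "µs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1e3, "m": 6e4, "h": 3.6e6}
_ELAPSED = re.compile(r"\belapsed=(\S+)")


def elapsed_ms(line):
    """Return the elapsed= value of a log line in milliseconds, or None if absent."""
    match = _ELAPSED.search(line)
    if match is None:
        return None
    parts = _COMPONENT.findall(match.group(1))
    if not parts:
        return None
    return sum(float(value) * _UNIT_MS[unit] for value, unit in parts)


def slow_imports(lines, threshold_ms=150.0):
    """Return (elapsed_ms, line) pairs for block import lines slower than threshold_ms."""
    slow = []
    for line in lines:
        # Assumed marker text for block import lines; adjust if your logs differ.
        if "Imported new chain segment" not in line:
            continue
        ms = elapsed_ms(line)
        if ms is not None and ms > threshold_ms:
            slow.append((ms, line))
    return slow
```

Feeding it the lines of a geth log file (for example `slow_imports(open(path, encoding="utf-8"))`, where `path` is your own log location) gives a quick view of how often imports exceed the threshold.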
I did an upgrade from v4.0.8 to v4.2.1 on a Goerli node for Dencun and all went fine. So I decided to perform the same upgrade for a mainnet node (running Erigon). I can confirm that after the upgrade, the node cannot keep up with the head, and in Prysm I get the same error as above:
I then downgraded to v4.1.1 and the node synced up to head without any issues. I did notice that on v4.2.1 the CPU was constantly hitting 100%, but on v4.1.1 it was more under control (30-50%). I also noticed the active peer count was around 70 vs 45, which I believe is from this PR: but it probably has nothing to do with the issue.
For those running into this issue, this PR should hopefully fix it. We will tag an RC soon, and if all is well this will make it into our next release.
We have an RC here: If this goes well, it will be our next release. You can give it a try to see if it resolves your issue.
Hi @nisdas I am testing the RC, and it seems to have resolved the issue! It used to trickle each payload one at a time, but now it is pushing a whole bunch through for the node to catch up. CPU is also back to normal, great work! I have also tested this on the Goerli node and all good there too. I did get this in the logs on startup, but everything seems operational after that...
@keithchew Do you have any error-level logs prior to that one? It should have printed at least one log immediately before that to explain why it was unable to prune a directory.
@prestonvanloon you are right, sorry about that, here are the 2 errors above it:
Thanks @keithchew. The "unable to prune directory" issue is something we are debugging, and your log is very helpful. It shouldn't be a problem at runtime and you can ignore it for now. The workaround is to delete the directory
@keithchew Following up on this. We did find another bug where blobs were not being saved properly. The issue you mentioned in #13557 (comment) has been resolved in #13648. Edit: #13648 stops the issue from happening again, but does not clear bad blobs from disk. Delete your disk and resync, or delete any zero-byte ssz files from your blobs directory, to stop the log messages.
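For the second option above (removing zero-byte ssz files), a cleanup could look roughly like this. It is a minimal sketch with hypothetical function names, assuming the bad files end in `.ssz` and sit somewhere under the blobs directory you pass in; the location of that directory depends on your setup and is not assumed here. It defaults to a dry run, and it is probably sensible to stop the beacon node before deleting anything.

```python
from pathlib import Path


def find_empty_ssz(blobs_dir):
    """Return all zero-byte *.ssz files under blobs_dir, searched recursively."""
    root = Path(blobs_dir)
    return sorted(
        path
        for path in root.rglob("*.ssz")
        if path.is_file() and path.stat().st_size == 0
    )


def delete_empty_ssz(blobs_dir, dry_run=True):
    """List zero-byte *.ssz files and delete them unless dry_run is True."""
    empty = find_empty_ssz(blobs_dir)
    if not dry_run:
        for path in empty:
            path.unlink()
    return empty
```

Calling `delete_empty_ssz(your_blobs_dir)` first lets you inspect the returned list; pass `dry_run=False` once the list looks right.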
Describe the bug
At first, everything was normal, but it couldn't reboot. Once restarted and resynchronized, geth couldn't catch up. I tried the RC version of the beacon, but it didn't work either. Finally, I rolled back to version 4.1.1, and everything was fine again.
The beacon node shows some errors
geth gets stuck at age 12m. Even though geth is stuck, the geth console command (eth.syncing) reports that synchronization is complete.
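The mismatch described above (eth.syncing reporting completion while the head is 12 minutes old) can be detected by comparing the head block's timestamp with the current time. The following is a minimal sketch with hypothetical function names, assuming you already have the result of the JSON-RPC `eth_syncing` call and a block object from `eth_getBlockByNumber` with a hex-encoded `timestamp` field; the 60-second staleness limit is an arbitrary assumption, not a value from this thread.

```python
import time


def head_age_seconds(block, now=None):
    """Age of a block in seconds, given its JSON-RPC object with a hex 'timestamp'."""
    if now is None:
        now = time.time()
    return now - int(block["timestamp"], 16)


def looks_stuck(syncing, block, max_age_seconds=60, now=None):
    """True when the node reports it is not syncing but its head block is stale.

    syncing is the eth_syncing result: False when the node considers itself synced.
    """
    return syncing is False and head_age_seconds(block, now) > max_age_seconds
```

With a head that is 12 minutes (720 seconds) old and `eth_syncing` returning `false`, `looks_stuck` returns `True`, which matches the symptom reported here.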
Has this worked before in a previous version?
🔬 Minimal Reproduction
1. Stop beacon and geth.
2. Start syncing again.
3. geth gets stuck.
Error
Platform(s)
Linux (x86)
What version of Prysm are you running? (Which release)
4.2.0
Anything else relevant (validator index / public key)?
No response