Rococo validator using paritydb stuck on startup with "Missing table, starting reindex" #13179
Stuck in the sense that it no longer prints anything? Does it still respond to RPC?
Yes, it no longer prints anything and does not respond to RPC.
Can we get a copy of the database? |
The reports I requested in https://github.com/paritytech/devops/issues/1539 would be useful here.
Here is the
OK, it looks like the database has a lot of unprocessed commits in its queue, which it tries to enact on startup. The node should start eventually. Still, a copy of the database snapshot would be nice.
Here's also a copy of the db, compressed with zstd: https://storage.googleapis.com/rococo-blockstore-backups/rococo-paritydb-archive.tar.zst
@arkpar this particular node has been stuck for 4 days and 6 hours and has minimal CPU usage.
After examining the database, it seems there has been a collision in the STATE column, i.e. a lot of similar keys have been inserted, which blew up the index to 4 TiB. Normally this should not happen: the STATE column only contains trie nodes, so its keys should be uniformly distributed. I'm going to sync Rococo to try to reproduce this and will get back in a few days.
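To illustrate why uniformly distributed keys matter, here is a toy model (not parity-db's actual code). It assumes a hash index that selects a chunk from the top bits of each key, which is reasonable when keys are already hashes, and that doubles in size (gains one index bit) whenever a chunk overflows. `CHUNK_CAPACITY`, `chunk_of` and `bits_needed` are hypothetical names chosen for this sketch.

```python
import hashlib
from collections import Counter

# Hypothetical toy parameter; NOT parity-db's real value.
CHUNK_CAPACITY = 64  # max entries one index chunk may hold before the index must grow


def chunk_of(key: bytes, index_bits: int) -> int:
    """Pick an index chunk from the top `index_bits` bits of the key's first 8 bytes."""
    return int.from_bytes(key[:8], "big") >> (64 - index_bits)


def bits_needed(keys, start_bits=8, max_bits=32):
    """Grow the index one bit at a time (doubling it) until no chunk overflows.

    Returns the number of index bits required, or None if even `max_bits`
    is not enough, i.e. the index would keep growing without bound.
    """
    for bits in range(start_bits, max_bits + 1):
        load = Counter(chunk_of(k, bits) for k in keys)
        if max(load.values()) <= CHUNK_CAPACITY:
            return bits
    return None


N = 20_000
# Uniformly distributed keys, standing in for trie-node hashes.
uniform_keys = [
    hashlib.blake2b(i.to_bytes(8, "big"), digest_size=32).digest() for i in range(N)
]
# "Similar" keys sharing a long zero prefix, differing only in the last 8 bytes.
similar_keys = [bytes(24) + i.to_bytes(8, "big") for i in range(N)]

print("uniform keys:", bits_needed(uniform_keys))  # a handful of index bits suffices
print("similar keys:", bits_needed(similar_keys))  # None: every key lands in chunk 0
```

Under these assumptions, uniform keys spread evenly and the index stops growing after a few doublings, whereas keys that share a long prefix all map to the same chunk, so adding index bits never relieves the overflow. That is the kind of runaway growth a 4 TiB index would be consistent with.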
I have several Rococo validators that remain stuck after startup. The only way to fix the issue is to restore the database from a backup.
Version: 0.9.37
Relevant flags:
I also noticed that just before the node became stuck like this, it experienced huge CPU/memory usage, which caused several OOM kills. Maybe the database was corrupted by the OOM kills?