Drop support for databases repair attempt. #666

Closed
ytrezq opened this issue Aug 23, 2022 · 12 comments · Fixed by #667

Comments

ytrezq commented Aug 23, 2022

This comes from openethereum/openethereum#264.
Currently, if the database is corrupted, kvdb-rocksdb will attempt to repair it using code inherited from LevelDB.

As the code only works on single-column databases, it is no longer effective and just lures users into expecting a rescue that won't happen. It should be removed and replaced with an option to trim the database (from the beginning up to a specified block, possibly the latest valid block in the case of SST corruption).

Member

ordian commented Aug 23, 2022

Hey @ytrezq

Here are some questions and thoughts:

  1. What code works on single column databases?
  2. What does number of columns have to do with db corruption and attempting to repair it?
  3. What do you mean by block? Block as in blockchain? kvdb-rocksdb is a generic key-value store and knows nothing about its content.
  4. kvdb-rocksdb is a library, you're free to implement a function to open the db and delete it on error if you want.
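As a rough illustration of point 4, here is a minimal sketch of an "open, and wipe on error" wrapper. The name `open_or_reset` is hypothetical and is not part of kvdb-rocksdb; the database opener is passed in as a closure so the sketch makes no assumption about the library's actual API. Note that it deletes all data under `path` on any open error, which is only acceptable when the data can be re-synced.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of kvdb-rocksdb): try to open the database
/// with the caller-supplied `open` closure. If that fails, delete the whole
/// database directory, recreate it empty, and try exactly once more.
pub fn open_or_reset<T, E, F>(path: &Path, mut open: F) -> Result<T, E>
where
    F: FnMut(&Path) -> Result<T, E>,
    E: From<io::Error>,
{
    match open(path) {
        Ok(db) => Ok(db),
        Err(_first_error) => {
            // Destructive step: everything under `path` is lost.
            if path.exists() {
                fs::remove_dir_all(path)?;
            }
            fs::create_dir_all(path)?;
            // Second and final attempt on the now-empty directory.
            open(path)
        }
    }
}
```

A caller would pass whatever function it normally uses to open the database as the closure; the wrapper itself never attempts a repair.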

Author

ytrezq commented Aug 23, 2022

@ordian the problem is that kvdb-rocksdb attempts repairs automatically (https://github.com/paritytech/parity-common/blob/master/kvdb-rocksdb/src/lib.rs#L326) regardless of the number of columns, which results in deleting all columns from the manifest and thus does more harm than good.
So at the next restart this code is triggered: https://github.com/paritytech/parity-common/blob/master/kvdb-rocksdb/src/lib.rs#L337

Member

ordian commented Aug 23, 2022

Author

ytrezq commented Aug 23, 2022

@ordian this is a well-known issue upstream. The underlying logic comes from the LevelDB era.

Those functions are no longer in use at Facebook.

Member

ordian commented Aug 23, 2022

@ytrezq could you be more specific about which issue it is? Do you have a link or a repro test case?
I could only find this limitation: https://github.com/facebook/rocksdb/wiki/RocksDB-Repairer#limitations

If the column family is created recently and not persisted in sst files by a flush, then it will be dropped during the repair process. With this limitation repair would might even damage a healthy db if its column families are not flushed yet.

Which is mentioned in facebook/rocksdb#5073.

The reason I ask is there are tests for RepairDB for multiple columns: https://github.com/facebook/rocksdb/blob/b16655a547c3a44f8dcbe09614ef7ebb8daa83ac/db/repair_test.cc#L306.

And if those functions are no longer in use at Meta, do you happen to know what they use instead?

Author

ytrezq commented Aug 23, 2022

@ordian: Delete the manifest file of a properly shut down https://github.com/paritytech/polkadot database; let the database be rebuilt at the next run; then https://github.com/paritytech/parity-common/blob/master/kvdb-rocksdb/src/lib.rs#L337 is triggered because the database contains only 1 column.

Member

ordian commented Aug 24, 2022

The database won't be repaired if the MANIFEST file is missing:

Backend error: IO error: No such file or directory: While opening a file for sequentially reading: dev/chains/dev/db/full/MANIFEST-000008: No such file or directory

When deleting both CURRENT and MANIFEST-* files, the db opens with no problems.

Author

ytrezq commented Aug 24, 2022

@ordian: by "no problems", do you mean it opens immediately, or does it attempt to recover first?

Member

ordian commented Aug 24, 2022

Sorry, my bad. The recovery attempt only happens if a CORRUPTED file is present. Indeed, running

$ polkadot --validator --dev -d dev --pruning=archive
^Ctrl-C
$ touch dev/chains/dev/db/full/CORRUPTED
$ polkadot --validator --dev -d dev --pruning=archive
Backend error: Invalid argument: Column families not opened: col9, col8, col5, col4, col3, col2, col1, col0

results in a corrupted manifest file. I'll submit a PR to remove the repairer attempt. Thanks for letting us know and bearing with me :)
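To make the trigger concrete: the repro above shows that the presence of a file named CORRUPTED in the database directory is what starts the repair attempt. The following is a minimal sketch of a pre-open check that an application could run itself; `check_before_open` is a hypothetical name, and refusing to start is just one possible policy, not necessarily what the PR mentioned above implements.

```rust
use std::path::Path;

/// Marker file name, taken from the repro above (`touch .../CORRUPTED`).
const CORRUPTION_MARKER: &str = "CORRUPTED";

/// Hypothetical pre-open check: report an error instead of letting any
/// automatic repair run when the marker file is present.
pub fn check_before_open(db_dir: &Path) -> Result<(), String> {
    let marker = db_dir.join(CORRUPTION_MARKER);
    if marker.is_file() {
        return Err(format!(
            "corruption marker found at {}; refusing to open or repair",
            marker.display()
        ));
    }
    Ok(())
}
```

Failing loudly here leaves the MANIFEST untouched, so the operator can still decide whether to restore from a backup or re-sync.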

Author

ytrezq commented Aug 24, 2022

@ordian: better: in the case of a corrupt CRC32 on an entry, rewrite the CRC or delete the entry so that Polkadot or OpenEthereum can use the database again.
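The "delete the entry" idea can be sketched on a toy record format. Everything here is an assumption made for illustration: the `Entry` struct, the `drop_corrupt` function, and the use of plain CRC-32 (IEEE) over key and value are hypothetical, and RocksDB's real on-disk layout and checksum scheme are different, so this shows only the general technique.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical record: a key/value pair with the checksum stored next to it.
pub struct Entry {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
    pub stored_crc: u32,
}

impl Entry {
    pub fn new(key: &[u8], value: &[u8]) -> Self {
        let mut entry = Entry { key: key.to_vec(), value: value.to_vec(), stored_crc: 0 };
        entry.stored_crc = entry.computed_crc();
        entry
    }

    /// Checksum over the key bytes followed by the value bytes.
    fn computed_crc(&self) -> u32 {
        let mut buf = self.key.clone();
        buf.extend_from_slice(&self.value);
        crc32(&buf)
    }

    pub fn is_valid(&self) -> bool {
        self.stored_crc == self.computed_crc()
    }
}

/// Keep only entries whose checksum matches; return them with the drop count.
pub fn drop_corrupt(entries: Vec<Entry>) -> (Vec<Entry>, usize) {
    let before = entries.len();
    let kept: Vec<Entry> = entries.into_iter().filter(|e| e.is_valid()).collect();
    let dropped = before - kept.len();
    (kept, dropped)
}
```

Dropping an entry trades a hard open failure for silent data loss, so whether this is acceptable depends on whether the application can re-fetch the missing keys.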

Author

ytrezq commented Aug 24, 2022

@ordian: an even better approach would be to bring multi-column support upstream.

Member

ordian commented Aug 24, 2022

@ytrezq our even better approach is to replace RocksDB with https://github.com/paritytech/parity-db long-term ;)
