RepairDB() drops unflushed non-default column families #5073

Open
simpkins opened this issue Mar 14, 2019 · 5 comments
Labels
up-for-grabs Up for grabs

Comments

@simpkins
Contributor

Calling RepairDB() on a valid DB that was shut down cleanly can drop all data in non-default column families if they were not explicitly flushed.

Expected behavior

RepairDB() does not lose data if called on a database that does not actually have any corruption that needs to be repaired.

Actual behavior

RepairDB() drops non-default column families that do not have data that was flushed into an SST.

Steps to reproduce the behavior

Failing test case added here: simpkins@bf36f6d

@ajkr
Contributor

ajkr commented Mar 14, 2019

@simpkins - There should be a "lost/" directory containing the WALs and MANIFESTs that were deleted by running RepairDB.

@simpkins
Contributor Author

Yes, the data that was dropped is archived into lost/ and can be manually recovered by a human afterwards, but it still seems like a bug that calling RepairDB() on a valid DB removes this data and requires manual recovery afterwards.

Note that this behavior is the same even if you call RepairDB() with an explicit list of all column families that should be present.

@maysamyabandeh
Contributor

Thanks @simpkins for the report. I confirm that I can reproduce the issue using the test that you provided.
Apparently the assumption behind RepairDB() has been that it would be run only after a disaster, so the recovery process has taken the liberty of favoring simplicity. It currently discards the manifest, where the column families are registered, and recreates a new manifest by inspecting the existing SST files. The WAL is processed afterwards, at which point insertions into column families that are not represented in the recovered manifest are ignored.
I am not sure whether we should clarify this behavior in the API/docs or spend time making repair work for this case (which would perhaps make it more complicated).
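
To make the sequence above concrete, here is a minimal toy model in standard C++. It is only a sketch of the described steps (discard the manifest, rebuild it from SST files, then replay the WAL), not RocksDB code: `ToyDb`, `WalRecord`, `Put`, `Flush`, and `ToyRepair` are hypothetical names invented for illustration, and the model assumes that the default column family always survives the rebuild.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these types or functions are RocksDB APIs.
struct WalRecord {
  std::string column_family;
  std::string key;
  std::string value;
};

using Contents = std::map<std::string, std::map<std::string, std::string>>;

struct ToyDb {
  std::set<std::string> manifest;  // registered column families
  Contents sst;                    // flushed data, grouped by column family
  std::vector<WalRecord> wal;      // unflushed writes, present only in the WAL
};

// A write is appended to the WAL; it reaches an SST only when flushed.
void Put(ToyDb& db, const std::string& cf, const std::string& key,
         const std::string& value) {
  assert(db.manifest.count(cf) == 1 && "column family must be registered");
  db.wal.push_back({cf, key, value});
}

// Moves every WAL record of one column family into that family's SST data.
void Flush(ToyDb& db, const std::string& cf) {
  std::vector<WalRecord> remaining;
  for (const WalRecord& r : db.wal) {
    if (r.column_family == cf) {
      db.sst[cf][r.key] = r.value;
    } else {
      remaining.push_back(r);
    }
  }
  db.wal = remaining;
}

// Models the repair steps described in the comment above.
Contents ToyRepair(const ToyDb& db) {
  // Step 1: ignore db.manifest and rebuild the set of column families from
  // the SST data alone. Assumption: "default" is always present.
  std::set<std::string> recovered;
  recovered.insert("default");
  Contents result;
  result["default"];  // ensure an entry exists even with no flushed data
  for (const auto& entry : db.sst) {
    recovered.insert(entry.first);
    result[entry.first] = entry.second;
  }
  // Step 2: replay the WAL, skipping records whose column family is not in
  // the rebuilt set. This is where unflushed non-default data is dropped.
  for (const WalRecord& r : db.wal) {
    if (recovered.count(r.column_family) == 0) continue;
    result[r.column_family][r.key] = r.value;
  }
  return result;
}
```

In this model, a column family whose writes exist only in the WAL is absent from the result of `ToyRepair`, while the same column family survives once `Flush` has moved its data into the SST map. That matches the report that only unflushed non-default column families are affected.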

@maysamyabandeh
Contributor

I clarified this issue in the wiki: https://github.com/facebook/rocksdb/wiki/RocksDB-Repairer#limitations until it is properly fixed.

@maysamyabandeh added the up-for-grabs label Mar 20, 2019
@ajkr
Contributor

ajkr commented Mar 20, 2019

One thing we talked about doing years ago is having a RepairDBRollback(), so users who don't like the effects of RepairDB() can roll back to the previous state without having to manually copy files. This isn't quite as good as preventing RepairDB() from making the DB worse in the first place, but might be a compromise to consider.
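
RepairDBRollback() is described here only as an idea, so the following is not that function. It is a caller-side sketch, using only the C++17 standard library, of the manual alternative: copy the closed database directory aside before a repair and copy it back if the result is unwanted. `SnapshotDir` and `RestoreDir` are hypothetical helper names, and the sketch assumes the database is not open while files are copied and that the parent of the snapshot path already exists.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not a RocksDB API: copies the whole directory tree
// of a closed database to snapshot_dir, replacing any earlier snapshot.
void SnapshotDir(const fs::path& db_dir, const fs::path& snapshot_dir) {
  assert(fs::is_directory(db_dir));
  fs::remove_all(snapshot_dir);
  fs::copy(db_dir, snapshot_dir, fs::copy_options::recursive);
}

// Hypothetical helper, not a RocksDB API: discards the current contents of
// db_dir and restores the files saved by SnapshotDir.
void RestoreDir(const fs::path& snapshot_dir, const fs::path& db_dir) {
  assert(fs::is_directory(snapshot_dir));
  fs::remove_all(db_dir);
  fs::copy(snapshot_dir, db_dir, fs::copy_options::recursive);
}
```

A caller would run `SnapshotDir` on the closed database directory, invoke the repair, inspect the outcome, and call `RestoreDir` only if the repaired state is worse than the original. The cost is a full copy of the database, which may be impractical for large data sets.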
