Verify WAL-disabled crash-recovery consistency globally #11613
Comments
I would like to work on this if you could give me some pointers on how to fix this and where to start.
I think this was necessary because some of the db_stress invocations produced by the whitebox crash test mainly or only crash in write functions. When the WAL is disabled, there are very few calls to write functions, since writes to SST files are buffered and other writes (e.g., MANIFEST) are infrequent. So db_stress might never crash itself. We do have a 15-minute timeout after which it is killed externally, though consistently hitting that would make it essentially a blackbox crash test with a longer-than-usual interval: Line 930 in cf95821
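To put rough numbers on the reasoning above, here is a back-of-the-envelope sketch. It assumes each kill point fires independently with probability 1/odds every time it is reached; the function name, the odds value, and the call counts are hypothetical and are not taken from db_stress or db_crashtest.py.

```python
def prob_self_crash(kill_point_hits: int, odds: int) -> float:
    """Probability that at least one of `kill_point_hits` kill points fires.

    Assumption (hypothetical model): each time a kill point is reached, the
    process is killed independently with probability 1/odds.
    """
    if odds <= 0:
        raise ValueError("odds must be positive")
    if kill_point_hits < 0:
        raise ValueError("kill_point_hits must be non-negative")
    return 1.0 - (1.0 - 1.0 / odds) ** kill_point_hits


# Illustrative numbers only: a run that reaches write-path kill points very
# often (e.g. WAL enabled) versus one that rarely does (e.g. WAL disabled,
# buffered SST writes, infrequent MANIFEST writes).
for hits in (1_000_000, 100):
    print(hits, prob_self_crash(hits, odds=100_000))
```

Under this model, a run that rarely reaches a write-path kill point is unlikely to crash itself before the external timeout, which is the situation described above.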
The two options listed imply full |
Couldn't find any issue after specifying |
Thanks a lot @ajkr. I also read the updated TODO. To solve the first problem (crash tests not crashing within 15 minutes), can we add more kill points to the code? Would it make sense to add more kill points to For the second one, my understanding is that there is another issue: the combination of wal-disabled=1 and reopen>0 is not supported. Why is that the case, and what needs to be done to support it? A related question: it looks like reopen gracefully closes the DB. Which crash scenarios does it help to cover? Thanks a lot for your help. |
Yes that sounds good to me.
The restriction was added 10 years ago (3827403) and might no longer be relevant. Feel free to remove it. I suppose the scenario covered by |
Expected behavior
Crash-recovery consistency needs to be verified globally (whitebox and blackbox tests, and transactions) when using RocksDB with the WAL disabled.
Actual behavior
According to this PR (#9338), it is currently not possible to disable the WAL in whitebox tests. We should be able to test the consistency of recovered data in WAL-disabled use cases, where unflushed writes are expected to be lost.
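As a toy illustration of what "consistent, with unflushed writes lost" could mean, the sketch below checks whether a recovered key-value state equals the state after some prefix of the write history. This prefix model is an assumption made for illustration, not a description of how db_stress tracks expected state; all names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A write is (key, value); a value of None models a delete.
Write = Tuple[str, Optional[str]]


def apply_prefix(writes: List[Write], n: int) -> Dict[str, str]:
    """Return the state after applying the first n writes in order."""
    state: Dict[str, str] = {}
    for key, value in writes[:n]:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def recovered_prefix_len(
    writes: List[Write], recovered: Dict[str, str]
) -> Optional[int]:
    """Return the largest n whose prefix state equals `recovered`, else None.

    None means the recovered state matches no prefix of the history, i.e. it
    is inconsistent under the (assumed) prefix model.
    """
    for n in range(len(writes), -1, -1):
        if apply_prefix(writes, n) == recovered:
            return n
    return None


history: List[Write] = [("a", "1"), ("b", "2"), ("a", "3")]
# Losing the last (unflushed) write is acceptable under this model.
print(recovered_prefix_len(history, {"a": "1", "b": "2"}))
# A state with a later write but a missing earlier one is not.
print(recovered_prefix_len(history, {"a": "3"}))
```

The linear scan over prefixes is deliberately naive; it is meant to show the acceptance criterion, not an efficient verifier.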
Steps to reproduce the behavior
There is a TODO in the code that disables running whitebox tests with the WAL disabled:
see tools/db_crashtest.py