Track WAL in MANIFEST: add option track_and_verify_wals_in_manifest #7275
@Cheng-Chang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
options/db_options.h
Outdated
@@ -22,6 +22,7 @@ struct ImmutableDBOptions {
bool create_missing_column_families;
bool error_if_exists;
bool paranoid_checks;
bool check_wal;
Nit: per our team discussion, maybe it's better to name it should_check_wal? cc @pdillinger
On further reflection (re naming in #7214), I am more critical of re-using verb member variable names in classes (as accessors) than having them in structs to begin with. This struct already has a lot of verb names. ¯_(ツ)_/¯
But I do think it's worth thinking about a better name. verify_wals? verify_wals_with_manifest?
I think check_wal delivers the meaning; there are other "checks", such as "paranoid_checks", "check_sst_file_size", etc.
@riversand963 @pdillinger does track_and_verify_wals_in_manifest sound good to you?
include/rocksdb/options.h
Outdated
// No matter whether this is true or false, the WAL information are always
// tracked by MANIFEST.
// Default: false
bool check_wal = false;
Why wouldn't this be true by default if we always pay the ongoing cost of tracking, and the only check is during recovery?
When / why would someone want to use false?
A user who does not even sync WALs might want to skip checking them, since there is then no guarantee that the WALs exist or are complete on recovery.
@@ -374,6 +374,14 @@ struct DBOptions {
// Default: true
bool paranoid_checks = true;

// If true, check on-disk WALs against WAL information stored in MANIFEST
// during recovery.
"check" is not very descriptive to me. How about something like "If true, fail recovery if there is an inconsistency between WAL information in MANIFEST and actual WALs on disk"?
What sort of corruption might this catch that's not otherwise caught? (Don't we always want to report corruption?)
Also, do we quietly ignore extra/untracked WALs or fail in that case?
A missing or corrupted WAL will be reported.
It's still not clear to me how extra/untracked WALs will be handled. Can a DB switch between using track_and_verify_wals_in_manifest=true and false?
Yes, a DB can switch between true and false. Verification covers only the tracked WALs.
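To make the answers in this thread concrete, here is a minimal sketch of a recovery-time check that reports missing or corrupted tracked WALs and never examines untracked ones. It is a toy model, not RocksDB code: `WalNumber`, `VerifyTrackedWals`, and the two maps are hypothetical names, and treating a file shorter than its recorded size as "corrupted" is an assumption made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model only: these names are hypothetical and are not RocksDB APIs.
using WalNumber = std::uint64_t;

// Checks each tracked WAL against what is on disk. Map values are sizes in
// bytes: for `tracked`, the size recorded in the toy MANIFEST; for `on_disk`,
// the size of the file actually found.
std::vector<std::string> VerifyTrackedWals(
    const std::map<WalNumber, std::uint64_t>& tracked,
    const std::map<WalNumber, std::uint64_t>& on_disk) {
  std::vector<std::string> problems;
  for (const auto& entry : tracked) {
    const WalNumber number = entry.first;
    const std::uint64_t recorded_size = entry.second;
    const auto it = on_disk.find(number);
    if (it == on_disk.end()) {
      problems.push_back("missing WAL " + std::to_string(number));
    } else if (it->second < recorded_size) {
      // Assumption: a file shorter than its recorded size counts as corrupted.
      problems.push_back("corrupted WAL " + std::to_string(number));
    }
  }
  // WALs that exist on disk but are absent from `tracked` are never examined.
  return problems;
}
```

In this sketch an extra WAL on disk produces no report at all, which matches the statement that only tracked WALs are verified; whether the real recovery path treats extra WALs the same way is not settled by this thread.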
@riversand963 actually, I think Peter's suggestion makes sense. Since we always track WALs in MANIFEST, it doesn't make much sense to disable checking WALs during recovery. This PR may be unnecessary. What are your thoughts?
It sounds like a good idea to always check WALs.
After offline discussion, we'll keep this
@cheng-chang has updated the pull request. You must reimport the pull request before landing.
Even though the new name is kind of long, I think it's clearer and therefore better.
I don't know this area well enough to endorse or reject the overall strategy, but my previous concerns are sufficiently addressed. Some additional comments / clarification requested.
// during recovery.
//
// Default: false
bool track_and_verify_wals_in_manifest = false;
If this option is not currently connected to anything (we don't have diff stacks, unfortunately), please drop in something like
// FIXME(cheng): This option is part of a work in progress and does not yet work
in case, through some mixup or unexpected absence from work, a release only has some of your changes.
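As a rough illustration of the intended behavior once the option is wired up, the sketch below models tracking that happens only while track_and_verify_wals_in_manifest is true, which is how the merged summary describes it. `ToyDBOptions` and `ToyManifest` are hypothetical stand-ins rather than RocksDB types, and each change of the flag stands in for closing and reopening the DB with a different setting.

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-in for the option in the diff; not the real DBOptions.
struct ToyDBOptions {
  bool track_and_verify_wals_in_manifest = false;  // default shown in the diff
};

// Toy MANIFEST that remembers which WAL numbers were recorded as tracked.
class ToyManifest {
 public:
  // Called whenever a new WAL is created; records it only while the option
  // is on.
  void OnWalCreated(const ToyDBOptions& options, std::uint64_t wal_number) {
    if (options.track_and_verify_wals_in_manifest) {
      tracked_.insert(wal_number);
    }
  }

  // Returns true if the WAL was created while tracking was enabled.
  bool IsTracked(std::uint64_t wal_number) const {
    return tracked_.count(wal_number) > 0;
  }

 private:
  std::set<std::uint64_t> tracked_;
};
```

A WAL created while the flag was false stays untracked after the flag is turned back on, so a later verification pass limited to tracked WALs would not look at it.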
@Cheng-Chang merged this pull request in 12b78e4.
…acebook#7275)
Summary: This option determines whether WALs will be tracked in MANIFEST and verified on recovery.
Pull Request resolved: facebook#7275
Test Plan: db_options_test options_test
Reviewed By: pdillinger
Differential Revision: D23181418
Pulled By: cheng-chang
fbshipit-source-id: 5dd1cdc166f3dfc1c93c094df4a2f7734e3b4547