Add an option to dump wal seqno gaps #13014

Closed
wants to merge 5 commits into from

@jowlyzhang jowlyzhang commented Sep 13, 2024

Add an option --only_print_seqno_gaps for WAL dump to help with debugging. This option checks the continuity of sequence numbers in WAL logs, assuming seq_per_batch is false. The --walfile option now also accepts a directory, in which case all WAL logs in the directory are checked in chronological order.

When a gap is found, we can further check if it's related to operations like external file ingestion.
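To illustrate the continuity check described above, here is a minimal sketch and not the code from this PR. `BatchInfo`, `SeqnoGap`, and `FindSeqnoGaps` are hypothetical names introduced for illustration. It assumes seq_per_batch is false, so every entry in a batch consumes one sequence number and the next batch is expected to start at the previous batch's sequence number plus its count.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one WAL write batch: its starting
// sequence number and the number of entries it contains.
struct BatchInfo {
  uint64_t seqno;
  uint64_t count;
};

// Hypothetical record of a gap: the sequence number that was expected
// and the batch that was actually found.
struct SeqnoGap {
  uint64_t expected_seqno;
  BatchInfo found;
};

// Assumes seq_per_batch is false: each entry consumes one sequence number,
// so a batch should start at prev.seqno + prev.count.
std::vector<SeqnoGap> FindSeqnoGaps(const std::vector<BatchInfo>& batches) {
  std::vector<SeqnoGap> gaps;
  bool have_prev = false;
  BatchInfo prev{0, 0};
  for (const BatchInfo& batch : batches) {
    if (have_prev) {
      const uint64_t expected = prev.seqno + prev.count;
      if (batch.seqno != expected) {
        gaps.push_back(SeqnoGap{expected, batch});
      }
    }
    prev = batch;
    have_prev = true;
  }
  return gaps;
}
```

Feeding the batches of several WAL files through the same call, in chronological order, would also surface gaps that fall on a file boundary.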

Test plan:
Manually tested

@ltamasi ltamasi left a comment

Thanks for the improvement, @jowlyzhang! LGTM overall, just some minor questions/comments.

tools/ldb_cmd.cc (outdated)
std::string WALPicker::GetNextWAL() {
assert(Valid());
std::string ret;
if (wal_file_iter_ != log_files_.end()) {

Contributor

This check seems redundant, since we assert(Valid()); above

Contributor Author

I think this check is helpful if that assert line is not compiled in the build.
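A small sketch of the point being made here: `assert` expands to nothing when `NDEBUG` is defined, so a runtime check keeps release builds from reading past the end of the list. `SimpleWalPicker` is a hypothetical stand-in for the class in the diff and uses an index instead of an iterator to stay short; only the general shape of `GetNextWAL()` is taken from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the picker under review.
class SimpleWalPicker {
 public:
  explicit SimpleWalPicker(std::vector<std::string> log_files)
      : log_files_(std::move(log_files)) {}

  bool Valid() const { return next_ < log_files_.size(); }

  std::string GetNextWAL() {
    // Catches misuse in debug builds; compiled out when NDEBUG is defined.
    assert(Valid());
    std::string ret;
    // Still guards release builds, where the assert above is a no-op.
    if (next_ < log_files_.size()) {
      ret = log_files_[next_];
      ++next_;
    }
    return ret;
  }

 private:
  std::vector<std::string> log_files_;
  std::size_t next_ = 0;
};
```

With this shape, calling `GetNextWAL()` on an exhausted picker aborts in a debug build and returns an empty string in a build compiled with `NDEBUG`.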

tools/ldb_cmd.cc (outdated)
Comment on lines +2940 to +2951
DumpWalFile(options, dir_or_file, print_header, print_values,
only_print_seqno_gaps, is_write_committed, ucmps, exec_state,
&prev_batch_seqno, &prev_batch_count);

Contributor

Question: what happens here if dir_or_file is an existing but empty directory?

Contributor Author

Good catch! The error message is not clear in that case. I have added checks to make sure it has a legit log file name.
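For illustration only, a check of this kind could look like the sketch below. It is not the code added in this PR. `ParseWalNumber` and `PickWalFiles` are hypothetical helpers, the input is a plain list of directory entry names rather than a real directory listing, and ordering by log number is used here as an assumed proxy for chronological order.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses names such as "000123.log" and returns false
// for anything that is not digits followed by ".log".
bool ParseWalNumber(const std::string& name, uint64_t* number) {
  const std::string suffix = ".log";
  if (name.size() <= suffix.size() ||
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return false;
  }
  const std::string digits = name.substr(0, name.size() - suffix.size());
  if (digits.find_first_not_of("0123456789") != std::string::npos) {
    return false;
  }
  if (digits.size() > 19) {
    return false;  // Too long to fit in uint64_t safely.
  }
  *number = std::stoull(digits);
  return true;
}

// Hypothetical helper: keeps only WAL-like names, ordered by log number.
// Sets *error to a readable message when no WAL file is found, for example
// when the directory exists but is empty.
std::vector<std::string> PickWalFiles(const std::vector<std::string>& names,
                                      std::string* error) {
  std::vector<std::pair<uint64_t, std::string>> numbered;
  for (const std::string& name : names) {
    uint64_t number = 0;
    if (ParseWalNumber(name, &number)) {
      numbered.emplace_back(number, name);
    }
  }
  std::sort(numbered.begin(), numbered.end());
  std::vector<std::string> result;
  for (const auto& entry : numbered) {
    result.push_back(entry.second);
  }
  if (result.empty()) {
    *error = "no WAL (.log) file found in directory";
  } else {
    error->clear();
  }
  return result;
}
```

Returning an explicit message for the empty case lets the caller report the problem up front instead of failing later with an unrelated error.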

tools/ldb_cmd.cc
Comment on lines +3088 to +3107
row << sequence_number << ",";
row << batch_count << ",";

Contributor

Question: so what the new option does is print the batch after the gap, right? Would it make sense to print some info about the start of the gap (prev_batch_seqno->value() + prev_batch_count->value()) as well?

Contributor Author

Yes, that's a good point. I have updated it to print a line like this:
Prev batch sequence number: 163356, prev batch count: 1, 164172,1,112,78,PUT(6) : 0x00000000000224C9000000000000012B00000000000000EF18E44F7BD5721300
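As a rough sketch of how such a line could be assembled and read, the helpers below build the prefix shown above and compute how many sequence numbers are missing between two batches. `FormatGapPrefix` and `MissingSeqnoCount` are hypothetical names and are not taken from tools/ldb_cmd.cc; the sketch again assumes seq_per_batch is false.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: builds the "previous batch" prefix that precedes the
// regular row for the batch found after the gap.
std::string FormatGapPrefix(uint64_t prev_batch_seqno,
                            uint64_t prev_batch_count) {
  std::ostringstream row;
  row << "Prev batch sequence number: " << prev_batch_seqno
      << ", prev batch count: " << prev_batch_count << ", ";
  return row.str();
}

// Hypothetical helper: number of sequence numbers skipped between the end of
// the previous batch and the start of the next one. Returns 0 when the next
// batch starts at or before the expected sequence number.
uint64_t MissingSeqnoCount(uint64_t prev_batch_seqno,
                           uint64_t prev_batch_count,
                           uint64_t next_batch_seqno) {
  const uint64_t expected = prev_batch_seqno + prev_batch_count;
  return next_batch_seqno > expected ? next_batch_seqno - expected : 0;
}
```

For the sample line above, the gap starts at 163356 + 1 = 163357, and the next batch starts at 164172, so 815 sequence numbers are unaccounted for in the WAL.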

@facebook-github-bot

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

@jowlyzhang merged this pull request in 1238120.
