
Properly set smallest key of subcompaction output #4723

Closed


abhimadan
Contributor

Summary: It is possible to see a situation like the following when
subcompactions are enabled:

  1. A subcompaction boundary is set to [b, e).
  2. The first output file in a subcompaction has c@20 as its smallest key.
  3. The range tombstone [a, d)@30 is encountered.
  4. The tombstone is written to the range-del meta block and the new
    smallest key is set to b@0 (since no keys in this subcompaction's
    output can be smaller than b).
  5. A key b@10 in a lower level will now reappear, since it is not
    covered by the truncated start key b@0.

In general, unless the smallest data key in a file has a seqnum of 0, it
is not safe to truncate a tombstone at the start key to have a seqnum of
0, since it can expose keys with a seqnum greater than 0 but less than
the tombstone's actual seqnum.

To fix this, when the lower bound of a file comes from the subcompaction
boundaries, we now set the seqnum of the artificially extended smallest
key to the tombstone's seqnum. This is safe because subcompactions
operate over disjoint sets of keys, and this problem can only occur in
subcompactions other than the first, since the first subcompaction is
unbounded on the left.

Furthermore, there is now an assertion to detect the described anomalous
case.

Test Plan: run the following command a few times:

```
make db_stress && TEST_TMPDIR=/dev/shm ./db_stress --max_background_compactions=8 --subcompactions=0 --memtablerep=skip_list --acquire_snapshot_one_in=10000 --delpercent=4 --delrangepercent=1 --snapshot_hold_ops=100000 --allow_concurrent_memtable_write=1 --compact_files_one_in=10000 --clear_column_family_one_in=0 --writepercent=35 --readpercent=25 --write_buffer_size=1048576 --max_bytes_for_level_base=4194304 --target_file_size_base=1048576 --column_families=1 --compact_range_one_in=10000 --open_files=-1 --max_key=10000000 --prefixpercent=25 --ops_per_thread=1000000
```

@abhimadan
Contributor Author

Note that this bug is pretty obscure, since it only affects users of both subcompactions and DeleteRange.

facebook-github-bot (Contributor) left a comment

@abhimadan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr (Contributor) left a comment

Thanks for the clear description; it was really helpful for refreshing my memory about the problem. It's interesting that subcompaction boundaries can be chosen such that a range tombstone spans multiple subcompactions. I think this requires the boundary to be chosen based on a file endpoint in the other level (i.e., the one that doesn't have the range tombstone).

```cpp
// lower_bound. We also know that smaller subcompactions exist, because
// otherwise the subcompaction would be unbounded on the left. As a
// result, we know that no other files on the output level will contain
// keys at lower_bound. Therefore, it is safe to use the tombstone's
```
Contributor

It is possible though that another output level file's end key is at lower_bound with kMaxSeqnum, right?

Contributor Author

Yeah good point, that's definitely possible. I'll adjust this to say that "real" keys can't be at lower_bound in other output files.

```cpp
#ifndef NDEBUG
SequenceNumber smallest_ikey_seqnum = kMaxSequenceNumber;
if (meta->smallest.size() > 0) {
  GetInternalKeySeqno(meta->smallest.Encode());
```
Contributor

oops forgot to assign the result to smallest_ikey_seqnum

facebook-github-bot (Contributor)

@abhimadan has updated the pull request.

facebook-github-bot (Contributor) left a comment

@abhimadan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

abhimadan deleted the lower-subcompact-trunc-fix branch on December 10, 2018 20:40
abhimadan mentioned this pull request on Jan 15, 2019