storage: compaction enhancements #23380
Conversation
Replace `internal::get_tombstone_delete_horizon()` with `internal::is_past_tombstone_delete_horizon()`. This change makes the code a little bit cleaner, and potentially reduces the number of operations in `should_keep()` related functions, as the current timestamp only needs to be evaluated once for every tombstone record in a segment with a clean compaction timestamp. Also remove `should_remove_tombstone_record()`, since its logic is now trivial.
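For readers unfamiliar with the tombstone horizon logic, here is a minimal, standalone sketch of what such a predicate could look like. The timestamp type and the `tombstone_retention` parameter are simplified stand-ins, not the actual Redpanda signatures:

```cpp
#include <chrono>
#include <optional>

// Simplified stand-in for the segment's clean-compaction timestamp; the real
// code uses model::timestamp read from the segment index.
using clean_compact_ts = std::optional<std::chrono::system_clock::time_point>;

// Hypothetical predicate: tombstones in a segment are past their delete
// horizon once the clean-compaction timestamp plus the retention period lies
// in the past. A segment that has never been cleanly compacted keeps its
// tombstones. Evaluating "now" once per segment avoids re-deriving a horizon
// for every record in the should_keep() path.
inline bool is_past_tombstone_delete_horizon(
  clean_compact_ts ts, std::chrono::milliseconds tombstone_retention) {
    if (!ts.has_value()) {
        return false;
    }
    return *ts + tombstone_retention < std::chrono::system_clock::now();
}
```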
Force-pushed from e3fc594 to 2b3857e
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54716#0192085b-a525-4008-93ae-ef3c61748c59
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54716#0192085b-a528-4008-9518-61df56f770e7
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54786#01920d19-c098-4a03-9a7d-58fde00f7d0c
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54996#0192226e-4d2f-44ef-a40e-14c2c198216d
src/v/storage/segment_utils.cc
Outdated
if (s->finished_self_compaction() || !s->has_compactible_offsets(cfg)) {
if (
  (s->finished_self_compaction() || !s->has_compactible_offsets(cfg))
  && !may_have_removable_tombstones(s, cfg)) {
Very important for allowing self compaction to remove tombstones, even if the segment has already been through a round of self compaction.
if (seg->finished_self_compaction()) {
    continue;
}
The removed code here was a superfluous check that we already perform in `self_compact_segment()` anyway. The `continue` logic was moved to after the call, by checking for `result.did_compact() == false`.
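A simplified, hypothetical sketch of the reshaped loop (the real code is coroutine-based and operates on Redpanda's segment types); the only point it illustrates is that the skip decision moves to after the call, keyed on `did_compact()`:

```cpp
#include <vector>

// Hypothetical stand-ins for the real result and segment types.
struct compaction_result {
    bool compacted = false;
    bool did_compact() const { return compacted; }
};
struct segment_stub {
    bool finished_self_compaction = false;
};

// The real self_compact_segment() already skips segments that have finished
// self compaction, so the caller no longer needs its own check up front.
compaction_result self_compact_segment(segment_stub& seg) {
    if (seg.finished_self_compaction) {
        return {};
    }
    seg.finished_self_compaction = true;
    compaction_result r;
    r.compacted = true;
    return r;
}

void self_compact_all(std::vector<segment_stub>& segments) {
    for (auto& seg : segments) {
        auto result = self_compact_segment(seg);
        if (!result.did_compact()) {
            // Nothing changed for this segment; skip ahead.
            continue;
        }
        // ... housekeeping for segments that were actually compacted ...
    }
}
```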
@@ -197,6 +197,11 @@ class disk_log_impl final : public log {
        _last_compaction_window_start_offset = o;
    }

    const std::optional<model::offset>&
    get_last_compaction_window_start_offset() const {
For testing.
src/v/storage/disk_log_impl.cc
Outdated
for (const auto& seg : segs) {
    if (internal::may_have_removable_tombstones(seg, cfg)) {
        filtered_buf.emplace_back(seg);
I'm not a fan of how this means the returned range is no longer contiguous segments. Can we put the onus for checking this on callers?
Changed the behaviour of `find_sliding_range()` to once again return a contiguous range of segments. However, this range may include segments already marked as cleanly compacted or as having finished window compaction, and the onus is on the caller to filter them as they see fit.
if (
  _last_compaction_window_start_offset.has_value()
  && (seg->offsets().get_base_offset()
      >= _last_compaction_window_start_offset.value())) {
    // Force clean segment production by compacting down to the
    // start of the log before considering new segments in the
    // compaction window.
    break;
It seems possible that segments could have been removed such that the first segment falls above _last_compaction_window_start_offset. In that case, does this always return empty and we get stuck never compacting?
Maybe we should reset `_last_compaction_window_start_offset` at the top of this method. Then we wouldn't need to worry about resetting it at compaction time.
Good callout. Added a new sanity check to the top of `find_sliding_range()` to reset the `_last_compaction_window_start_offset` if it is `<= _segs.front()->offsets().get_base_offset()`.

However, the code which resets it in `sliding_window_compact()` is still required: we want to compare `idx_start_offset` to the base offset of the first segment in our sliding window range (which is not necessarily the front segment in the log) as an indicator of whether or not we can reset the start offset and allow new segments into the range.
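A hypothetical sketch of the sanity check described above, using a plain integer offset and a free function in place of the real `disk_log_impl` member and `model::offset` type:

```cpp
#include <cstdint>
#include <optional>

// Sketch: if segments have been removed (e.g. by retention) such that the
// saved window start offset now falls at or before the first segment in the
// log, discard it. Otherwise find_sliding_range() could keep returning an
// empty range and compaction would never make progress.
inline void maybe_reset_window_start(
  std::optional<int64_t>& last_window_start_offset,
  int64_t front_segment_base_offset) {
    if (
      last_window_start_offset.has_value()
      && *last_window_start_offset <= front_segment_base_offset) {
        last_window_start_offset.reset();
    }
}
```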
@@ -116,6 +116,25 @@ ss::future<model::offset> build_offset_map(
        cfg.asrc->check();
    }
    auto seg = *iter;
    if (seg->index().has_clean_compact_timestamp()) {
nit: just making sure, is there anywhere in code comments that describes what it means to be clean compacted?
For future changes to sliding window compaction scheduling, it is helpful to have a flag that indicates whether a segment may contain tombstone records. Add a new `bool` field, `may_have_tombstone_records`, to the `index_state`, and mark/unmark its value during segment deduplication and data copying. This field is `true` by default, which can lead to false positives: a segment is only considered to not have tombstone records after that is proven by deduplication/segment data copying in the compaction process.
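As a rough illustration of the flag's semantics, here is a simplified stand-in; the real field lives in Redpanda's `index_state` and is persisted alongside the segment index:

```cpp
// Simplified stand-in for the index state described above.
struct index_state_sketch {
    // Defaults to true so an untouched segment is conservatively assumed to
    // contain tombstones (a possible false positive, never a false negative).
    bool may_have_tombstone_records = true;
};

// Hypothetical hook run after deduplication / data copying: the flag is only
// lowered to false once a full copy pass observed no tombstone (null-value)
// records in the rewritten segment.
inline void
update_tombstone_flag(index_state_sketch& idx, bool observed_tombstone) {
    idx.may_have_tombstone_records = observed_tombstone;
}
```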
Adds a helper function that indicates whether or not a segment may have tombstones eligible for deletion. This can return false-positives, since any segment that has not yet been through the compaction process is assumed to potentially have tombstones until proven otherwise.
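A hypothetical sketch of how such a helper could combine the new flag with the delete-horizon check; `segment_sketch` and the `tombstone_retention` parameter are simplified stand-ins, not the actual signature:

```cpp
#include <chrono>
#include <optional>

// Simplified stand-in for the pieces of segment state the helper would need.
struct segment_sketch {
    bool may_have_tombstone_records = true;
    std::optional<std::chrono::system_clock::time_point> clean_compact_ts;
};

// Hypothetical helper: a segment may have removable tombstones if it may
// still contain tombstone records at all and, when it has been cleanly
// compacted, enough time has elapsed for those tombstones to pass the delete
// horizon. Segments never touched by compaction return true, a deliberate
// false positive.
inline bool may_have_removable_tombstones(
  const segment_sketch& s, std::chrono::milliseconds tombstone_retention) {
    if (!s.may_have_tombstone_records) {
        return false;
    }
    if (!s.clean_compact_ts.has_value()) {
        return true;
    }
    return *s.clean_compact_ts + tombstone_retention
           < std::chrono::system_clock::now();
}
```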
Force-pushed from 2b3857e to 7249b56
//
// If there are new segments that have not been compacted, we can't make
// this claim, and compact everything again.
if (
It may look a bit scary to remove this chunk of code, but it is functionally performing the same "clean-compacted" check we now perform here (since, logically, `is_clean_compacted` is defined this way here).

We want to leave these segments in the compaction range so that they can be self-compacted in the case that they contain tombstones, and then filter them out afterwards before proceeding to sliding window compaction.
Force-pushed from 7249b56 to 3ac5b64
This commit does two things, which seemed easier to combine into one commit:

1. The behavior of sliding window compaction is changed such that newly added/closed segments are ignored until the current "round" of sliding window compaction cleanly compacts all segments down to the start of the range. This allows sliding window compaction to avoid a situation where clean segments are never produced due to a high ingress rate or key cardinality (a situation that would prevent timely tombstone removal). `_last_compaction_window_offset` must reach the base offset of the first segment in the currently active window before new segments will be considered for compaction.

2. Segments that are cleanly compacted or have been through a round of window compaction are still considered in the sliding window range. However, actually performing window compaction over these segments would be a no-op. Instead, self-compaction is performed to remove tombstones on segments that may contain them, and all cleanly compacted segments are removed from the range before sliding window compaction occurs (sketched below). This ensures timely tombstone removal: a partition that is no longer being produced to can still trigger tombstone removal.

Both of these changes improve the rate at which tombstone removal occurs, and help prevent clean segment/tombstone removal "starvation".
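A hypothetical, heavily simplified sketch of the scheduling shape described in point 2; the names and types are stand-ins, not the actual `disk_log_impl` interfaces:

```cpp
#include <vector>

// Simplified stand-in for a segment considered by housekeeping.
struct seg_stub {
    bool cleanly_compacted = false;
    bool may_have_removable_tombstones = false;
};

// Sketch of the pass described in point 2: self-compact segments that may
// still hold removable tombstones, then drop all cleanly compacted segments
// so that sliding window compaction only sees segments where deduplication
// can still make progress.
void housekeeping_pass(std::vector<seg_stub>& range) {
    for (auto& seg : range) {
        if (seg.may_have_removable_tombstones) {
            // self_compact(seg); // would remove expired tombstones here
            seg.may_have_removable_tombstones = false;
        }
    }
    std::erase_if(
      range, [](const seg_stub& s) { return s.cleanly_compacted; });
    // sliding_window_compact(range); // operates only on non-clean segments
}
```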
Cleanly compacted segments do not need to have their keys added to the compaction offset map, since the deduplication process considers unindexed keys to be valid records to keep. By not indexing these segments, the compaction process can use less memory and cleanly compact down to the start of the log faster.
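A minimal sketch of the offset-map skip described above, with simplified stand-ins for the segment and map types; the real `build_offset_map` works on Redpanda's segment and key-offset map types:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for the real segment and offset-map types.
struct segment_stub {
    bool has_clean_compact_timestamp = false;
    std::vector<std::pair<std::string, int64_t>> records; // key -> offset
};

// Sketch: when building the key -> latest-offset map, cleanly compacted
// segments are skipped entirely. Deduplication keeps any record whose key is
// absent from the map, so skipping these segments preserves their records
// while saving map memory and letting the window cover more of the log.
void build_offset_map_sketch(
  const std::vector<segment_stub>& segs,
  std::unordered_map<std::string, int64_t>& latest_offset_by_key) {
    for (const auto& seg : segs) {
        if (seg.has_clean_compact_timestamp) {
            continue; // already deduplicated; no need to index its keys
        }
        for (const auto& [key, offset] : seg.records) {
            auto it = latest_offset_by_key.find(key);
            if (it == latest_offset_by_key.end() || it->second < offset) {
                latest_offset_by_key[key] = offset;
            }
        }
    }
}
```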
Force-pushed from 3ac5b64 to 2626003
PR Summary

Minor tombstone additions/tweaks:

- Replace `internal::get_tombstone_delete_horizon()` with `internal::is_past_tombstone_delete_horizon()` for clean-up of logic and reduced comparisons.
- Add the `segment::may_have_tombstone_records()` bitflag.
- Add the `may_have_removable_tombstones()` function.

A number of optimizations to compaction, both in general and for tombstones:

- `_last_compaction_window_offset` must reach the base offset of the first segment in the log before new segments will be considered in the compaction window.

Backports Required
Release Notes
Improvements