[persist] Bound the number of runs a batch builder generates #30231
Conversation
Force-pushed from fba5f11 to dbfbc77.
First three (erm meant four) commits lgtm, but that's as far as I got today.
One high-level question I have is whether there's any sort of dyncfg rollout we could do here to derisk. I can see how it would be hard with the refactoring, though. Maybe (handwave) a variant in WritingRuns?
I think there are a couple of options for keeping the existing behaviour:
Either option seems fundamentally fine. Option 1 is probably slightly more conservative but also more "surprising". I'll see what ends up feeling less gross.
Force-pushed from 4dc9941 to d814b58.
This is intended to be a noop, but sets up the structure for a second way to track and merge runs.
Instead of maintaining one long run, this maintains a list of runs and compacts them together when they get too long.
This was causing inline writes to be ~disabled for normal user batches, because we generate the compaction config there now too. For the new "compacting" path we want to allow inline writes in general, but when parts are compacted together we still want to flush those out... so it makes the most sense to shift the override to right before it applies.
I've made the number of runs per compaction directly configurable via a dyncfg, falling back to the old implementation when it's set below 2. I've applied all the suggestions (I think!) and tried to make most of the commits behaviour-preserving. Only the last (non-bugfix) commit includes a functional change, and that's behind the disabled flag. Hopefully that makes it a somewhat easier review!
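For intuition, here is a minimal sketch of what such a fallback might look like; the names (`RunStrategy`, `batch_builder_max_runs`) are invented for illustration, not the PR's actual API:

```rust
// Hypothetical sketch only: illustrates "fall back to the old implementation
// when the configured run bound is set below 2". Names are invented.
enum RunStrategy {
    // Previous behaviour: maintain one long run.
    SingleRun,
    // New behaviour: keep a bounded list of runs, compacting when it grows.
    MergedRuns { max_runs: usize },
}

fn strategy_from_cfg(batch_builder_max_runs: isize) -> RunStrategy {
    if batch_builder_max_runs < 2 {
        RunStrategy::SingleRun
    } else {
        RunStrategy::MergedRuns {
            max_runs: batch_builder_max_runs as usize,
        }
    }
}

fn main() {
    // A value below 2 disables the new path entirely.
    assert!(matches!(strategy_from_cfg(0), RunStrategy::SingleRun));
    assert!(matches!(
        strategy_from_cfg(16),
        RunStrategy::MergedRuns { max_runs: 16 }
    ));
}
```

The point of routing the bound through a dyncfg is that it can be adjusted, and the new path switched off, at runtime rather than at compile time.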
Reviewed to the best of my ability and I think it looks good! Would maybe like to talk about this one in our 1:1 tomorrow to make sure I totally grok everything that's going on.
```rust
/// - `finish` will return at most `K` elements.
/// - The "depth" of the merge tree - the number of merges any particular element may undergo -
///   is `O(log N)`.
pub struct MergeTree<T> {
```
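To make those documented guarantees concrete, here is a toy, self-contained sketch of the merge-tree idea. It is not the PR's actual code: the fixed `u64` payload and the sort-and-concatenate `merge` are stand-ins for real run compaction.

```rust
// Toy sketch of a merge tree, illustrative only. Elements live in levels of
// bounded width; a full level is merged into a single element one level up,
// so any element undergoes O(log N) merges and `finish` returns at most
// `max_len` elements.

/// Stand-in for compacting runs together: concatenate and re-sort.
fn merge(parts: Vec<Vec<u64>>) -> Vec<u64> {
    let mut out: Vec<u64> = parts.into_iter().flatten().collect();
    out.sort();
    out
}

struct MergeTree {
    max_len: usize,
    // levels[0] holds raw pushes; higher levels hold already-merged elements.
    levels: Vec<Vec<Vec<u64>>>,
}

impl MergeTree {
    fn new(max_len: usize) -> Self {
        assert!(max_len >= 2);
        MergeTree { max_len, levels: vec![Vec::new()] }
    }

    fn push(&mut self, part: Vec<u64>) {
        self.levels[0].push(part);
        // Whenever a level fills up, merge it into one element of the next.
        let mut level = 0;
        while self.levels[level].len() >= self.max_len {
            let merged = merge(std::mem::take(&mut self.levels[level]));
            if self.levels.len() == level + 1 {
                self.levels.push(Vec::new());
            }
            self.levels[level + 1].push(merged);
            level += 1;
        }
    }

    /// Returns at most `max_len` elements.
    fn finish(self) -> Vec<Vec<u64>> {
        // Most-merged elements first, then fold the tail until it fits.
        let mut parts: Vec<Vec<u64>> =
            self.levels.into_iter().rev().flatten().collect();
        while parts.len() > self.max_len {
            let tail = parts.split_off(parts.len() - self.max_len);
            parts.push(merge(tail));
        }
        parts
    }
}

fn main() {
    let mut tree = MergeTree::new(3);
    for i in 0..100u64 {
        tree.push(vec![i]);
    }
    let parts = tree.finish();
    assert!(parts.len() <= 3);
    println!("{} parts", parts.len());
}
```

The real `MergeTree<T>` in the PR is generic over the element type and the merge operation; the constants above are only there to make the sketch runnable.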
neat! Love having this logic contained in a single struct and well documented
In #30094, we added the ability to spill very long runs out to external state, to limit the amount of data we needed to keep in memory at once. However, there is another way that state can grow large: by appending individual batches with many runs. (Those runs will ~eventually get compacted together, but the state will be large in the meantime.)
This PR adds a new limit on the number of runs that a batch builder will produce. It does this by triggering background compactions whenever the number of runs grows large: a source that is accumulating a large snapshot, for example, will start consolidating its data in the background as the snapshot is built up.
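For intuition, here is a deliberately naive sketch of the trigger; all names are invented, and the actual PR runs the compaction in the background and bounds merge depth with the merge tree shown earlier:

```rust
// Naive illustration only: when the run count exceeds the bound, compact
// every accumulated run down to one. Names are invented, not the PR's API.
struct BoundedRuns {
    max_runs: usize,
    runs: Vec<Vec<u64>>,
}

impl BoundedRuns {
    fn push(&mut self, run: Vec<u64>) {
        self.runs.push(run);
        if self.runs.len() > self.max_runs {
            // Stand-in for consolidation: concatenate and re-sort.
            let mut merged: Vec<u64> = self.runs.drain(..).flatten().collect();
            merged.sort();
            self.runs.push(merged);
        }
    }
}

fn main() {
    let mut b = BoundedRuns { max_runs: 2, runs: Vec::new() };
    for i in 0..5u64 {
        b.push(vec![i]); // e.g. a snapshot arriving batch by batch
    }
    assert!(b.runs.len() <= 2);
}
```

Note that this naive version re-compacts everything each time the bound is hit; keeping the number of re-merges per element at O(log N) is exactly what the `MergeTree` is for.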
Motivation
https://github.com/MaterializeInc/database-issues/issues/8401
Tips for reviewer
Should review fine commit-by-commit.
Checklist
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.