
[persist] Bound the number of runs a batch builder generates #30231

Merged · 8 commits · Nov 21, 2024

Conversation

@bkirwi (Contributor) commented Oct 28, 2024

In #30094, we added the ability to spill very long runs out to external state, to limit the amount of data we needed to keep in memory at once. However, there is another way that state can grow large: by appending individual batches with many runs. (Those runs will ~eventually get compacted together, but the state will be large in the meantime.)

This PR adds a new limit to the number of runs that a batch builder will produce. It does this by triggering background compactions whenever the number of runs grows large: so a source that is accumulating a large snapshot, for example, will start consolidating the data in the background as it's being built up.
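As a rough illustration of the idea, a batch builder can keep a list of sorted runs and, once the number of runs exceeds a configured cap, compact them together (synchronously here for clarity; the PR does this in the background). This is a hypothetical sketch: `BoundedRunBuilder`, `merge_runs`, and the `u64` element type are invented for illustration and are not the persist-client API.

```rust
// Hypothetical sketch of bounding the number of runs a batch builder
// generates; none of these names come from the actual persist-client code.

/// Merge already-sorted runs into a single sorted run.
fn merge_runs(runs: Vec<Vec<u64>>) -> Vec<u64> {
    let mut merged: Vec<u64> = runs.into_iter().flatten().collect();
    merged.sort_unstable();
    merged
}

struct BoundedRunBuilder {
    max_runs: usize,
    runs: Vec<Vec<u64>>,
}

impl BoundedRunBuilder {
    fn new(max_runs: usize) -> Self {
        Self { max_runs, runs: Vec::new() }
    }

    /// Add one sorted run, compacting if the run count grows too large.
    fn push_run(&mut self, run: Vec<u64>) {
        self.runs.push(run);
        if self.runs.len() > self.max_runs {
            // Compact everything into one run. The real implementation
            // merges more selectively, via a merge tree.
            let compacted = merge_runs(std::mem::take(&mut self.runs));
            self.runs = vec![compacted];
        }
    }

    /// Finish the batch, returning at most `max_runs` runs.
    fn finish(self) -> Vec<Vec<u64>> {
        self.runs
    }
}
```

The point of the sketch is the invariant: however many runs are appended, `finish` never hands back more than the configured cap.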

Motivation

https://github.com/MaterializeInc/database-issues/issues/8401

Tips for reviewer

This should review fine commit-by-commit.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi bkirwi force-pushed the merge-tree branch 6 times, most recently from fba5f11 to dbfbc77 on October 31, 2024
@danhhz (Contributor) left a comment:

First three (erm meant four) commits lgtm, but that's as far as I got today.

One high-level question I have is whether there's any sort of dyncfg rollout we could do here to derisk. I can see how it would be hard with the refactoring, though. Maybe (handwave) a variant in WritingRuns?

(Four review threads on src/persist-client/src/batch.rs, since outdated and resolved.)
@bkirwi (Contributor, Author) commented Nov 11, 2024

One high-level question I have is whether there's any sort of dyncfg rollout we could do here to derisk. I can see how it would be hard with the refactoring, though.

I think there are a couple of options for keeping the existing behaviour:

  1. On main we have a bit of a hack where we generate a single unordered "run" for user batches, then split it up at the end. We could preserve this in the new version too.
  2. Configuring the number-of-runs-before-merging gives behaviour that's not super dissimilar to how things work today.

Either option seems fundamentally fine. 1 is probably slightly more conservative but also more "surprising". I'll see what ends up feeling less gross.

@bkirwi bkirwi force-pushed the merge-tree branch 7 times, most recently from 4dc9941 to d814b58 on November 14, 2024
Commit messages:

  • This is intended to be a noop, but sets up the structure for a second way to track and merge runs.
  • Instead of maintaining one long run, this maintains a list of runs and compacts them together when they get too long.
  • This was causing inline writes to be ~disabled for normal user batches, because we generate the compaction config there now too. For the new "compacting" path we want to allow inline writes in general, but when parts are compacted together we still want to flush those out... so it makes the most sense to shift the override to right before it applies.
@bkirwi bkirwi marked this pull request as ready for review November 14, 2024 22:59
@bkirwi bkirwi requested a review from a team as a code owner November 14, 2024 22:59
@bkirwi (Contributor, Author) commented Nov 14, 2024

I've made the number of runs per compaction directly configurable via a dyncfg, falling back to the old implementation when it's set to a value below 2.

I've applied all the suggestions (I think!) and tried to make most of the commits behaviour-preserving. Only the last (non-bugfix) commit includes a functional change, and that's behind the disabled flag. Hopefully that makes it a somewhat easier review!
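The fallback described above could be expressed as a single configuration-driven branch. This is a hypothetical illustration: `WritingRuns` borrows its name from the enum mentioned earlier in the review, but the variants, the `writing_runs_from_cfg` function, and the parameter name are invented here.

```rust
// Hypothetical sketch: one dyncfg-style setting selects the run-writing
// strategy, with values below 2 falling back to the old behaviour.

enum WritingRuns {
    /// Old behaviour: one long unordered "run", split up at the end.
    Ordered,
    /// New behaviour: merge runs whenever more than `max_runs` accumulate.
    Compacting { max_runs: usize },
}

fn writing_runs_from_cfg(batch_builder_max_runs: usize) -> WritingRuns {
    if batch_builder_max_runs < 2 {
        // A cap of 0 or 1 is treated as "disabled": keep today's behaviour.
        WritingRuns::Ordered
    } else {
        WritingRuns::Compacting { max_runs: batch_builder_max_runs }
    }
}
```

Encoding the rollout as an enum variant keeps the old code path intact and reachable, so the new behaviour can be turned on (or rolled back) purely by flipping the config value.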

@ParkMyCar (Member) left a comment:

Reviewed to the best of my ability and I think it looks good! Would maybe like to talk about this one in our 1:1 tomorrow to make sure I totally grok everything that's going on.

/// - `finish` will return at most `K` elements.
/// - The "depth" of the merge tree - the number of merges any particular element may undergo -
/// is `O(log N)`.
pub struct MergeTree<T> {
A reviewer (Member) commented:
neat! Love having this logic contained in a single struct and well documented

(A review thread on src/persist-client/src/batch.rs, resolved.)
@bkirwi bkirwi merged commit 077ce5c into MaterializeInc:main Nov 21, 2024
80 checks passed
@bkirwi bkirwi deleted the merge-tree branch November 21, 2024 20:45
3 participants