gc-compaction: split compaction jobs across keyspace #8921

Closed
Tracked by #9114 ...
skyzh opened this issue Sep 4, 2024 · 0 comments · Fixed by #9611

skyzh commented Sep 4, 2024

I would feel more confident running gc-compaction on prod tenants if we could split the keyspace horizontally and compact one partition at a time. That way:

  • We don't need to download all the layers to disk before compaction.
  • We can schedule compaction jobs in parallel.
  • It's fine to run the compaction partially, interleaving with L0 compaction.
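A minimal sketch of the horizontal split proposed above, assuming keys can be treated as plain integers in a half-open range `[start, end)`. The name `split_keyspace` and the integer key representation are illustrative assumptions, not the pageserver's API or key format.

```python
from typing import Iterator, Tuple

# Half-open key range [start, end). Integer keys are an assumption made
# for illustration only.
KeyRange = Tuple[int, int]


def split_keyspace(start: int, end: int, parts: int) -> Iterator[KeyRange]:
    """Yield contiguous, non-overlapping ranges that together cover [start, end)."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    if end <= start:
        raise ValueError("keyspace must be non-empty")
    span = end - start
    parts = min(parts, span)  # never emit an empty range
    for i in range(parts):
        lo = start + span * i // parts
        hi = start + span * (i + 1) // parts
        yield (lo, hi)


if __name__ == "__main__":
    # Each range could be handed to one compaction job, one at a time
    # or in parallel.
    for key_range in split_keyspace(0, 10, 3):
        print(key_range)
```

Because the ranges are disjoint and cover the whole keyspace, each job only needs the layers that overlap its own range, and a run can stop after any partition without leaving a range half-processed.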
@skyzh skyzh self-assigned this Sep 4, 2024
@skyzh skyzh added the c/storage/pageserver Component: storage: pageserver label Sep 4, 2024
skyzh added a commit that referenced this issue Oct 29, 2024
…#9134)

part of #8921, #9114

## Summary of changes

We start the partial compaction implementation with partial image layer
generation. The partial compaction API now takes a key range. For now, we
only generate images for that key range, and after compaction we remove
the layers that are fully included in the key range.
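The "fully included" rule can be sketched as a simple containment check. This is a hedged illustration only: the `Layer` class, its fields, and the integer keys are hypothetical and do not reflect the pageserver's real types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer descriptor covering the key range [key_start, key_end)."""
    name: str
    key_start: int  # inclusive
    key_end: int    # exclusive


def fully_covered(layers: List[Layer], job_start: int, job_end: int) -> List[Layer]:
    """Return the layers whose key range lies entirely inside [job_start, job_end)."""
    return [
        layer
        for layer in layers
        if job_start <= layer.key_start and layer.key_end <= job_end
    ]


if __name__ == "__main__":
    layers = [Layer("a", 0, 10), Layer("b", 5, 15), Layer("c", 10, 20)]
    # Layer "c" extends past the job range, so it is not a removal candidate.
    print([layer.name for layer in fully_covered(layers, 0, 15)])
```

A layer that only partially overlaps the job range still holds data for keys outside it, which is why only fully contained layers are candidates for removal in this sketch.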

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
skyzh added a commit that referenced this issue Nov 11, 2024
The final patch for partial compaction, part of #9114, closes #8921.
Note that we didn't implement parallel compaction or a compaction
scheduler for partial compaction. Currently it has to be scheduled by a
Python script that splits the keyspace; in the future, the pageserver
should split automatically based on the key partitioning when it wants
to trigger a gc-compaction.

## Summary of changes

* Update the layer selection algorithm to use the same selection as full
compaction (everything intersecting or below the GC horizon)
* Update the layer selection algorithm to also generate a list of delta
layers that need to be rewritten
* Add the logic to rewrite delta layers and add them back to the layer
map
* Update test case to do partial compaction on deltas
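
One way to picture the selection and rewrite list described in this summary is the sketch below. It is an interpretation under stated assumptions, not the merged implementation: the `Layer` fields, the integer keys and LSNs, the `lsn_start <= gc_horizon` boundary, and the rule that a selected delta layer needs rewriting when it is not fully inside the job's key range are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer descriptor; keys and LSNs are plain integers here."""
    name: str
    is_delta: bool
    key_start: int  # inclusive
    key_end: int    # exclusive
    lsn_start: int  # inclusive
    lsn_end: int    # exclusive


def select_layers(
    layers: List[Layer], job_start: int, job_end: int, gc_horizon: int
) -> Tuple[List[Layer], List[Layer]]:
    """Return (selected, rewrite) for a job over the key range [job_start, job_end).

    selected: layers that overlap the job's key range and start at or below
              the GC horizon (assumed reading of "intersecting or below").
    rewrite:  selected delta layers that stick out of the job's key range
              (assumed criterion for "need to be rewritten").
    """
    selected: List[Layer] = []
    rewrite: List[Layer] = []
    for layer in layers:
        at_or_below_horizon = layer.lsn_start <= gc_horizon
        overlaps_keys = layer.key_start < job_end and job_start < layer.key_end
        if not (at_or_below_horizon and overlaps_keys):
            continue
        selected.append(layer)
        fully_inside = job_start <= layer.key_start and layer.key_end <= job_end
        if layer.is_delta and not fully_inside:
            rewrite.append(layer)
    return selected, rewrite


if __name__ == "__main__":
    layers = [
        Layer("d1", True, 0, 10, 10, 20),
        Layer("d2", True, 5, 25, 10, 20),   # sticks out of the job key range
        Layer("i1", False, 0, 20, 5, 6),
        Layer("d3", True, 0, 10, 50, 60),   # entirely above the horizon
        Layer("d4", True, 30, 40, 10, 20),  # outside the job key range
    ]
    selected, rewrite = select_layers(layers, 0, 20, gc_horizon=30)
    print([l.name for l in selected], [l.name for l in rewrite])
```

In this sketch, a delta layer on the rewrite list is one whose contents outside the job range would have to be preserved and put back into the layer map, matching the "rewrite delta layers and add them back" step above.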

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>