compaction: accurately prioritize shared sstables for compaction #2893

Closed
itsbilal opened this issue Sep 11, 2023 · 1 comment

Comments

@itsbilal
Member

Currently, virtual sstables aren't prioritized for compaction at all (see #2892 for the simpler, non-shared case). However, there's an additional dimension to consider when compacting shared virtual sstables: the proportion of a backing sstable that's referenced by other Pebble instances (possibly on other nodes). This proportion can be lazily updated in the marker files placed on shared storage, and Pebble can occasionally sweep these marker files to update its own estimate of how much of an sstable is referenced by other nodes. A file whose reference proportion is low, even after summing reference percentage points across all nodes, should be prioritized for compaction.

Some examples of how this could be implemented:

  1. If two Pebbles each reference the entirety of a file, 200% of the file is referenced, and we can deprioritize it for compaction picking within that level, preferring other files instead.
  2. If two Pebbles reference 10% of a file each, 20% of the file is referenced in total, and both Pebbles can prioritize compacting it away by boosting its overlapping size in pickCompactionSeedFile by 5x (i.e. Size/ReferencedSum), or by 2.5x (i.e. the previous factor divided by 2), or so.
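
A rough sketch of the arithmetic in the two examples above, in Go. This is not Pebble code: `referencedSum`, `compactionBoost`, `boostedOverlapSize` and the `damping` parameter are hypothetical names made up for illustration, and how the result would actually be wired into pickCompactionSeedFile is left open. It assumes each instance reports the fraction of the backing sstable it references as a value in [0, 1].

```go
package main

// referencedSum adds up the fraction of a backing sstable that each Pebble
// instance references. Each entry is assumed to be a fraction in [0, 1]; the
// sum can exceed 1 (e.g. 2.0 when two instances reference the whole file).
func referencedSum(perInstance []float64) float64 {
	var sum float64
	for _, f := range perInstance {
		sum += f
	}
	return sum
}

// compactionBoost is a hypothetical helper, not an existing Pebble function.
// It returns the multiplier to apply to a file's overlapping size:
// 1/refSum (i.e. Size/ReferencedSum), divided by damping. A damping of 1
// gives the 5x factor for a 20%-referenced file; a damping of 2 gives 2.5x.
// Files referenced 100% or more in total get no boost (a factor of 1), which
// deprioritizes them relative to boosted files in the same level. A
// non-positive refSum or damping is treated as "no estimate" and also
// returns 1.
func compactionBoost(refSum, damping float64) float64 {
	if refSum <= 0 || damping <= 0 {
		return 1
	}
	boost := 1 / (refSum * damping)
	if boost < 1 {
		return 1
	}
	return boost
}

// boostedOverlapSize scales an overlapping size in bytes by the boost factor,
// truncating any fractional byte.
func boostedOverlapSize(size uint64, boost float64) uint64 {
	return uint64(float64(size) * boost)
}
```

With two instances referencing 10% each, `compactionBoost(referencedSum([]float64{0.1, 0.1}), 1)` works out to roughly 5, and a damping of 2 brings it to roughly 2.5. With two instances referencing the whole file, the factor stays at 1.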

See the comment at #2538 (review) for more context on this issue.

@itsbilal
Member Author

itsbilal commented Oct 3, 2023

Closing as this is tracked in #2598 instead.

@itsbilal itsbilal closed this as not planned Oct 3, 2023