perf: don't prioritize compaction of pinned range tombstones #872
Comments
So the compensated size would only apply if the range tombstone can actually have an effect on the records in lower-level sstables? This sounds similar to the logic needed for deletion-only compactions. I think this would be a worthwhile enhancement.
I wonder in practice how frequently range tombstones fall into the same snapshot stripe as keys they drop, but not into the last snapshot stripe. It would be a minimal change to only compensate files whose sequence numbers are less than the oldest snapshot. If we wanted to get more granular, we could add finer-grained accounting, similar to deletion-only hints, to the file metadata's table stats.
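That minimal change might look something like the following sketch (the type and field names are illustrative, not Pebble's actual metadata):

```go
package main

import "fmt"

// Hypothetical, simplified file metadata; field names are
// illustrative and do not match Pebble's actual types.
type fileMetadata struct {
	Size             uint64 // physical size on disk
	RangeDelEstimate uint64 // estimated bytes covered by its range tombstones
	LargestSeqNum    uint64
}

// compensatedSize inflates a file's size by its range-tombstone
// estimate only when every key in the file is older than the oldest
// open snapshot, i.e. when its tombstones can actually drop data.
func compensatedSize(f fileMetadata, oldestSnapshotSeqNum uint64) uint64 {
	if f.LargestSeqNum < oldestSnapshotSeqNum {
		return f.Size + f.RangeDelEstimate
	}
	return f.Size
}

func main() {
	f := fileMetadata{Size: 1 << 20, RangeDelEstimate: 4 << 20, LargestSeqNum: 100}
	fmt.Println(compensatedSize(f, 200)) // tombstones unblocked: size is inflated
	fmt.Println(compensatedSize(f, 50))  // pinned by a snapshot: raw size
}
```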
Yeah. It would be interesting to experiment with that and see if it has any negative effect on the
When replicating a range, are all the keys that Cockroach reads under a range prefix? Have we ever considered Pebble-level 'range snapshots' to provide snapshot semantics for only a key range? When initializing a compaction's snapshot list, we could skip any 'range snapshots' that have ranges disjoint from the compaction's key space. The goal would be to drop range tombstones and the data they cover faster during rebalancing during IMPORT, in hopes of being able to ingest more tables directly into L6.
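As a sketch of that filtering step, initializing a compaction's snapshot list from hypothetical 'range snapshots' could look like this (Pebble has no such API today; every name here is invented for illustration):

```go
package main

import "fmt"

// rangeSnapshot is a hypothetical snapshot whose sequence number is
// pinned only for a key range; this is not a real Pebble type.
type rangeSnapshot struct {
	seqNum     uint64
	start, end string // pinned key range [start, end)
}

// overlaps reports whether [aStart, aEnd) intersects [bStart, bEnd).
func overlaps(aStart, aEnd, bStart, bEnd string) bool {
	return aStart < bEnd && bStart < aEnd
}

// compactionSnapshots keeps only the snapshots whose ranges intersect
// the compaction's key space; a disjoint range snapshot cannot pin
// any key the compaction will rewrite, so it can be skipped.
func compactionSnapshots(snaps []rangeSnapshot, start, end string) []uint64 {
	var seqNums []uint64
	for _, s := range snaps {
		if overlaps(s.start, s.end, start, end) {
			seqNums = append(seqNums, s.seqNum)
		}
	}
	return seqNums
}

func main() {
	snaps := []rangeSnapshot{
		{seqNum: 10, start: "a", end: "c"},
		{seqNum: 20, start: "m", end: "p"},
	}
	// Only the first snapshot overlaps a compaction over [b, d).
	fmt.Println(compactionSnapshots(snaps, "b", "d"))
}
```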
Unfortunately no. Each CRDB range comes with 3 distinct key ranges: the range-id local key range, the range local key range, and the user key range. See
I've had this idea, though never explored it in detail. I'm not even sure if I've ever written it down. The part I worried about was what to do for iteration on a "range" snapshot: what semantics would we use if you iterate outside of the range the snapshot was defined on? One possibility is that the "range" snapshot provides an implicit Lower and Upper bound on iteration. This idea might be a non-starter due to the 3 key ranges CRDB needs to read from to replicate a Range.
When you create a pebble iterator from a snapshot, you could be required to provide one of the ranges provided when the snapshot was created. You create one pebble iterator per snapshotted range. In CockroachDB, we could hide that detail by writing an iterator implementation that steps between the individual pebble iterators. It seems like most of the benefit of snapshotting just a key range is in the user key range, where the bulk of the data lives. If the extra pebble iterators pose a problem, we could snapshot all of the local key ranges and just the necessary part of the user key range?
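That stepping iterator could be sketched roughly as below. The `iterator` interface and `sliceIter` stand in for `*pebble.Iterator` values opened from the same snapshot, one per snapshotted key range; everything here is illustrative, not CockroachDB or Pebble code:

```go
package main

import "fmt"

// iterator is a stand-in for the forward-iteration subset of a
// pebble iterator's interface.
type iterator interface {
	First() bool
	Next() bool
	Key() string
}

// sliceIter iterates over a sorted slice of keys; it plays the role
// of a single per-range pebble iterator in this sketch.
type sliceIter struct {
	keys []string
	i    int
}

func (s *sliceIter) First() bool { s.i = 0; return s.i < len(s.keys) }
func (s *sliceIter) Next() bool  { s.i++; return s.i < len(s.keys) }
func (s *sliceIter) Key() string { return s.keys[s.i] }

// multiIter steps through several disjoint, ordered range iterators
// as if they were one iterator, hiding the per-range detail.
type multiIter struct {
	iters []iterator
	cur   int
}

func (m *multiIter) First() bool {
	for m.cur = 0; m.cur < len(m.iters); m.cur++ {
		if m.iters[m.cur].First() {
			return true
		}
	}
	return false
}

func (m *multiIter) Next() bool {
	if m.cur >= len(m.iters) {
		return false
	}
	if m.iters[m.cur].Next() {
		return true
	}
	// Current range exhausted: advance to the next non-empty range.
	for m.cur++; m.cur < len(m.iters); m.cur++ {
		if m.iters[m.cur].First() {
			return true
		}
	}
	return false
}

func (m *multiIter) Key() string { return m.iters[m.cur].Key() }

func main() {
	// Three disjoint key spans, mirroring CRDB's range-id local,
	// range local, and user key ranges.
	m := &multiIter{iters: []iterator{
		&sliceIter{keys: []string{"a", "b"}},
		&sliceIter{},
		&sliceIter{keys: []string{"x"}},
	}}
	for ok := m.First(); ok; ok = m.Next() {
		fmt.Println(m.Key())
	}
}
```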
With the introduction of the min-overlapping-ratio heuristic (#707), Pebble started prioritizing compaction of range tombstones by inflating start-level file sizes by an estimate of the data covered by range tombstones.
Prioritizing the compaction of range tombstones has a few benefits:
But if an open snapshot prevents a range tombstone from dropping keys, the first two of these benefits do not apply. Additionally, if the output level is L6, these compactions may have the negative effect of cementing tombstones into the bottommost level (#517 (comment)), where they're only cleared by low-priority elision-only compactions.
We might want to improve the prioritization of elision-only compactions. However, seeing as these compactions add write amplification that otherwise could've been avoided, maybe we should try to avoid prioritizing compactions that are unlikely to reclaim disk space. This could be done by using uncompensated file sizes during compaction picking under some conditions, such as when a start-level file's largest sequence number does not fall in the last snapshot stripe.
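A rough sketch of that condition applied during compaction picking follows. The types, names, and the simplified overlap ratio are illustrative, not Pebble's actual code: the point is that the size inflation is applied only when the file's largest sequence number falls in the last snapshot stripe.

```go
package main

import "fmt"

// tableFile is a hypothetical, simplified candidate file; names do
// not match Pebble's actual types.
type tableFile struct {
	name             string
	size             uint64 // physical size on disk
	rangeDelEstimate uint64 // estimated bytes covered by its range tombstones
	largestSeqNum    uint64
	overlappingBytes uint64 // bytes overlapped in the output level
}

// sizeForPicking returns the compensated size only when the file is
// older than every open snapshot (last snapshot stripe), so its
// range tombstones can actually drop data; otherwise the raw size.
func sizeForPicking(f tableFile, oldestSnapshotSeqNum uint64) uint64 {
	if f.largestSeqNum < oldestSnapshotSeqNum {
		return f.size + f.rangeDelEstimate
	}
	return f.size
}

// pickFile chooses the candidate with the smallest overlapping
// ratio, in the spirit of the min-overlapping-ratio heuristic:
// inflating the denominator makes a file look cheaper to compact,
// prioritizing it.
func pickFile(cands []tableFile, oldestSnapshotSeqNum uint64) string {
	best, bestRatio := "", 0.0
	for i, c := range cands {
		ratio := float64(c.overlappingBytes) / float64(sizeForPicking(c, oldestSnapshotSeqNum))
		if i == 0 || ratio < bestRatio {
			best, bestRatio = c.name, ratio
		}
	}
	return best
}

func main() {
	cands := []tableFile{
		{name: "pinned-tombstones", size: 10, rangeDelEstimate: 40, largestSeqNum: 300, overlappingBytes: 20},
		{name: "droppable-tombstones", size: 10, rangeDelEstimate: 40, largestSeqNum: 100, overlappingBytes: 20},
	}
	// With an open snapshot at seqnum 200, only the second file's
	// tombstones can drop data, so only it gets the size boost.
	fmt.Println(pickFile(cands, 200))
}
```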