perf: compaction improvements meta issue #552
There are various proposals for improvements to compaction heuristics, policies, and mechanics. This issue is an entry point to all of those sub-issues.
Generalized "Move" Compactions
A compaction normally takes all of its input tables and constructs a set of new output tables. There is an exception to this behavior: move compactions take the input tables and "move" them (via metadata operations) directly to the output level. Move compactions can only be performed if there are no overlapping tables in the output level. The two issues below both explore the idea of generalizing move compactions so that we can reuse input sstables and thus significantly lower the IO for certain compactions.
#518: perf: generalized "move" compactions
#389: perf: skip output level sstables with no data overlap with input level
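The gating condition for a move compaction is just an overlap check against the output level. A minimal sketch of that check (the `sstable` type and function names here are illustrative, not Pebble's actual internals):

```go
package main

import "fmt"

// sstable is a simplified table descriptor carrying a user-key range.
// This is a stand-in for Pebble's file metadata, not its real type.
type sstable struct {
	smallest, largest string
}

// overlaps reports whether two key ranges intersect.
func overlaps(a, b sstable) bool {
	return a.smallest <= b.largest && b.smallest <= a.largest
}

// canMove reports whether an input table can be "moved" to the output
// level via a metadata-only operation: true only when it overlaps no
// table already present in the output level.
func canMove(input sstable, outputLevel []sstable) bool {
	for _, t := range outputLevel {
		if overlaps(input, t) {
			return false
		}
	}
	return true
}

func main() {
	out := []sstable{{"a", "c"}, {"m", "p"}}
	fmt.Println(canMove(sstable{"d", "f"}, out)) // no overlap: movable
	fmt.Println(canMove(sstable{"b", "e"}, out)) // overlaps [a,c]: must rewrite
}
```

Generalizing move compactions amounts to applying this per-input-table test so that non-overlapping inputs are relinked while only the overlapping ones are rewritten.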
L0 Partitioning / User-defined Partitions
When a set of memtables is flushed to L0, only a single sstable is created. This leads to a problem: an L0 sstable often overlaps a significant portion (sometimes all!) of the sstables in Lbase, which in turn causes L0->Lbase compactions to starve Lbase->Lbase+1 compactions. One solution to this problem is to partition L0 sstables. As described in #517, the intention is to provide a callback so that users can request a split point between two user keys. In addition to the in-progress PRs below, we'd need to change the L0 invariant checks and enhance the L0 compaction-picking heuristics.
#517: perf: support user-defined "guards" for sstable boundaries
#546: internal/manifest: add Version.L0Sublevels
#536: db: add support for user-defined sstable partitioning
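To make the callback idea concrete, here is a hedged sketch of how a user-defined split function might partition a flush into multiple L0 sstables. `SplitFunc` and `partitionFlush` are hypothetical names, not the API proposed in #517:

```go
package main

import "fmt"

// SplitFunc is a hypothetical user callback: given two adjacent user keys
// in a flush, it reports whether a new sstable should be started between
// them (e.g. at an application-level partition boundary).
type SplitFunc func(prevKey, nextKey string) bool

// partitionFlush groups a sorted run of keys into per-sstable slices,
// starting a new sstable at every user-requested split point.
func partitionFlush(keys []string, split SplitFunc) [][]string {
	var out [][]string
	var cur []string
	for i, k := range keys {
		if i > 0 && split(keys[i-1], k) {
			out = append(out, cur)
			cur = nil
		}
		cur = append(cur, k)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	// Split whenever the first byte (a stand-in for a partition prefix) changes.
	byPrefix := func(prev, next string) bool { return prev[0] != next[0] }
	fmt.Println(partitionFlush([]string{"a1", "a2", "b1", "c1", "c2"}, byPrefix))
}
```

With splits like these, an L0->Lbase compaction need only pull in the Lbase tables overlapping one partition rather than nearly all of Lbase.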
Multi-level Compactions
Compactions currently involve two levels: a start level and an output level. The slight nuance to this statement is that each sstable from L0 is considered a distinct level. The idea behind multi-level compactions is to avoid adding more compaction work in the future. If we're compacting to level N, we can estimate the resulting increase in the size of N; if it would exceed its size threshold, we should actually compact to level N+1. The balanced-rent-or-buy compaction strategy described in #48 takes this to an extreme and demonstrates much better write amplification than other compaction strategies in simulation.
#136: perf: investigate N-level compactions (N > 2)
#48: perf: better compaction heuristics/policy
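The size-estimate loop described above can be sketched as follows. The per-level byte counts and thresholds are illustrative inputs, not Pebble's real accounting:

```go
package main

import "fmt"

// pickOutputLevel sketches the multi-level idea: start compacting into
// level n, but if the estimated post-compaction size of n would exceed
// its threshold, fold level n into the inputs and write the output one
// level deeper, repeating until a level can absorb the data.
func pickOutputLevel(start int, inputBytes int64, levelSize, maxSize []int64) int {
	n := start
	for n+1 < len(levelSize) && levelSize[n]+inputBytes > maxSize[n] {
		// Level n would overflow; treat it as an input level and
		// direct the output to n+1 instead.
		inputBytes += levelSize[n]
		n++
	}
	return n
}

func main() {
	levelSize := []int64{0, 90, 400, 1000}
	maxSize := []int64{4, 100, 1000, 10000}
	// Adding 20 bytes to L1 would push it past 100, so compact through to L2.
	fmt.Println(pickOutputLevel(1, 20, levelSize, maxSize))
}
```

The payoff is that data that would soon be compacted again anyway is carried one level deeper in a single pass, saving a future rewrite.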
"Notched" Multi-output-level Compactions
In addition to allowing inputs from multiple levels, a compaction can write outputs to multiple levels. This idea was suggested in the context of addressing the L0->Lbase compaction problem.
#136: perf: investigate N-level compactions (N > 2)
Read-triggered Compactions
In Pebble, compactions are currently only triggered due to writes. The deficiency with this trigger is that the LSM will settle into a shape which is optimized for handling future writes, but not for reads. In a read-mostly workload, we would like to perform additional compactions to reduce the number of levels in order to speed up reads. LevelDB contained a read-based compaction trigger that did exactly this.
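LevelDB's mechanism gave each sstable an "allowed seeks" budget proportional to its size; reads that consulted the table without being satisfied by it decremented the budget, and exhausting it made the table a compaction candidate. A sketch of that accounting (the type and method names are illustrative):

```go
package main

import "fmt"

// fileMeta mimics LevelDB's read-triggered heuristic: each sstable gets
// an allowed-seeks budget proportional to its size; every read that
// consults the table but finds the key elsewhere decrements it. When the
// budget hits zero, the table becomes a compaction candidate.
type fileMeta struct {
	sizeBytes    int64
	allowedSeeks int
}

func newFileMeta(size int64) *fileMeta {
	// LevelDB used roughly one allowed seek per 16KB of file size,
	// with a floor of 100.
	seeks := int(size / 16384)
	if seeks < 100 {
		seeks = 100
	}
	return &fileMeta{sizeBytes: size, allowedSeeks: seeks}
}

// recordSeek notes one wasted seek and reports whether the file should
// now be scheduled for compaction.
func (f *fileMeta) recordSeek() bool {
	f.allowedSeeks--
	return f.allowedSeeks <= 0
}

func main() {
	f := newFileMeta(1 << 20) // 1MB / 16KB = 64, raised to the floor of 100
	trigger := false
	for i := 0; i < 100; i++ {
		trigger = f.recordSeek()
	}
	fmt.Println(trigger) // true once the budget is exhausted
}
```

The intuition is that once a table has wasted more seek time than a compaction of it would cost, compacting it pays for itself.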
#29: perf: read-based compaction heuristic
Range-tombstone Compaction Heuristics
The heuristics for deciding which file within a level to compact do not perform any special consideration for range tombstones. A range tombstone can delete a large swath of data, making it desirable to quickly compact sstables containing range tombstones in order to reclaim space and to avoid rewriting data lying beneath the range tombstone which will only be deleted by a later compaction.
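One simple way to fold this into file selection is to boost a table's compaction score by the bytes its range tombstones are estimated to reclaim. The function and the `rangeDelEstimate` statistic below are hypothetical, shown only to make the heuristic concrete:

```go
package main

import "fmt"

// compactionScore sketches boosting a table's compaction priority by the
// bytes its range tombstones are estimated to reclaim. rangeDelEstimate
// is a hypothetical per-table statistic, not a real Pebble property.
func compactionScore(fileBytes, rangeDelEstimate int64) float64 {
	// Weight reclaimable bytes heavily so tombstone-dense tables are
	// compacted sooner, freeing space and avoiding rewriting data that
	// a later compaction would delete anyway.
	return float64(fileBytes) + 2*float64(rangeDelEstimate)
}

func main() {
	// A smaller table dense with range tombstones outranks a larger
	// table with none.
	fmt.Println(compactionScore(1000, 0) < compactionScore(500, 400))
}
```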
#319: db: incorporate range tombstones into compaction heuristics
Deletion-only Compactions
If an sstable is entirely covered by a range tombstone in a higher level and the range of seqnums in the sstable and the range tombstone are in the same snapshot stripe, the sstable can be deleted. This is better than a normal compaction as the sstable does not have to be read.
#720: perf: investigate deletion-only compactions
Pipeline Decompression and Compression During Compaction
Block decompression and compression show up prominently in CPU profiles taken during compaction. We could speed up the wall time of compactions by parallelizing this work.
#599: perf: pipeline compression during compaction
#1722: perf: pipeline decompression during compaction
Jira issue: PEBBLE-176