
perf: compaction improvements meta issue #552

Open
petermattis opened this issue Feb 25, 2020 · 3 comments
Comments

@petermattis
Collaborator

petermattis commented Feb 25, 2020

There are various proposals for improvements to compaction heuristics, policies, and mechanics. This issue is an entry point to all of those sub-issues.

Generalized "Move" Compactions

A compaction normally takes all of its input tables and constructs a set of new output tables. There is an exception to this behavior: move compactions take the input tables and "move" them (via metadata operations) directly to the output level. Move compactions can only be performed if there are no overlapping tables in the output level. The two issues below both play with the idea of generalizing move compactions so that we can reuse input sstables and thus significantly lower the IO for certain compactions.

#518: perf: generalized "move" compactions
#389: perf: skip output level sstables with no data overlap with input level
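As a concrete illustration, the overlap check that gates a move compaction might look like the following sketch. The `tableMeta` type and `canMove` function are hypothetical stand-ins for illustration, not Pebble's actual metadata API:

```go
package main

// tableMeta is a stand-in for Pebble's real per-file metadata.
type tableMeta struct {
	smallest, largest string // inclusive user-key bounds
}

// canMove reports whether the input table can be moved to the output level
// via a metadata-only operation: its key range must not overlap any sstable
// already present in that level.
func canMove(input tableMeta, outputLevel []tableMeta) bool {
	for _, t := range outputLevel {
		if input.smallest <= t.largest && t.smallest <= input.largest {
			return false // overlap: a normal (rewriting) compaction is required
		}
	}
	return true
}
```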

L0 Partitioning / User-defined Partitions

When a set of memtables is flushed to L0, only a single sstable is created. This leads to a problem: an L0 sstable often overlaps a significant portion (all!) of the sstables in Lbase. This in turn leads to problems with L0->Lbase compactions starving Lbase->Lbase+1 compactions. One solution to this problem is to partition L0 sstables. As described in #517, the intention is to provide a callback so that users can request a split point between two user keys. In addition to the in-progress PRs below, we'd need to change the L0 invariant checks and enhance the L0 compaction picking heuristics.

#517: perf: support user-defined "guards" for sstable boundaries
#546: internal/manifest: add Version.L0Sublevels
#536: db: add support for user-defined sstable partitioning
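A rough sketch of how a user-supplied split callback could partition a flush into multiple L0 sstables. The `splitFunc` signature and `flushPartitioned` helper are invented for illustration; #517 defines the real "guards" interface:

```go
package main

// splitFunc is a hypothetical user-provided callback: given two adjacent
// user keys in the flush, it reports whether a new sstable should start
// between them.
type splitFunc func(prevKey, nextKey string) bool

// flushPartitioned writes the sorted keys of a flush into multiple L0
// "sstables" (modeled as key slices), cutting a new table whenever the
// callback requests a split.
func flushPartitioned(keys []string, split splitFunc) [][]string {
	var tables [][]string
	var cur []string
	for i, k := range keys {
		if i > 0 && len(cur) > 0 && split(keys[i-1], k) {
			tables = append(tables, cur)
			cur = nil
		}
		cur = append(cur, k)
	}
	if len(cur) > 0 {
		tables = append(tables, cur)
	}
	return tables
}
```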

Multi-level Compactions

Compactions currently involve two levels: a start level and an output level. The slight nuance to this statement is that each sstable from L0 is considered a distinct level. The idea behind multi-level compactions is to avoid adding more compaction work in the future. If we're compacting to level N, we can estimate the increase in size of level N and, if it would exceed its size threshold, we should actually compact to level N+1. The balanced-rent-or-buy compaction strategy described in #48 takes this to an extreme and demonstrates much better write amplification than other compaction strategies in simulation.

#136: perf: investigate N-level compactions (N > 2)
#48: perf: better compaction heuristics/policy
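The size-based extension decision could be sketched as follows. Names and thresholds here are hypothetical; Pebble's actual compaction-picking heuristics are richer:

```go
package main

// outputLevel picks the output level for a compaction whose output would
// normally land on level n. If the estimated post-compaction size of level
// n exceeds its threshold, the compaction is extended to include level n+1
// (and so on), folding that level's data into the estimate as it goes.
func outputLevel(n int, levelSize, maxSize []int64, inputSize int64) int {
	for n+1 < len(levelSize) && levelSize[n]+inputSize > maxSize[n] {
		// Level n would overflow; carry its data onward in the estimate.
		inputSize += levelSize[n]
		n++
	}
	return n
}
```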

"Notched" Multi-output-level Compactions

In addition to allowing inputs from multiple levels, a compaction can write outputs to multiple levels. This idea was suggested in the context of addressing the L0->Lbase compaction problem.

#136: perf: investigate N-level compactions (N > 2)

Read-triggered Compactions

In Pebble, compactions are currently only triggered due to writes. The deficiency with this trigger is that the LSM will settle into a shape which is optimized for handling future writes, but not for reads. In a read-mostly workload, we would like to perform additional compactions to reduce the number of levels in order to speed up reads. LevelDB contained a read-based compaction trigger that did exactly this.

#29: perf: read-based compaction heuristic
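LevelDB's trigger works by giving each file a seek budget proportional to its size; reads that consult a file without finding the key spend budget, and an exhausted budget nominates the file for compaction. A sketch of that bookkeeping (the constants follow LevelDB's rule of thumb; the types are illustrative, not Pebble code):

```go
package main

// fileMeta tracks the per-file seek budget used by LevelDB's read-based
// compaction trigger.
type fileMeta struct {
	sizeBytes    int64
	allowedSeeks int
}

// newFileMeta grants one allowed seek per 16KB of file, with a floor of
// 100, mirroring LevelDB's heuristic.
func newFileMeta(size int64) *fileMeta {
	seeks := int(size / (16 * 1024))
	if seeks < 100 {
		seeks = 100
	}
	return &fileMeta{sizeBytes: size, allowedSeeks: seeks}
}

// recordSeekMiss charges one seek against the budget and reports whether
// the file should now be scheduled for compaction.
func (f *fileMeta) recordSeekMiss() bool {
	f.allowedSeeks--
	return f.allowedSeeks <= 0
}
```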

Range-tombstone Compaction Heuristics

The heuristics for deciding which file within a level to compact do not perform any special consideration for range tombstones. A range tombstone can delete a large swath of data, making it desirable to quickly compact sstables containing range tombstones in order to reclaim space and to avoid rewriting data lying beneath the range tombstone which will only be deleted by a later compaction.

#319: db: incorporate range tombstones into compaction heuristics
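One way to fold range tombstones into file picking is to "compensate" a file's size by an estimate of the data its tombstones will reclaim, so tombstone-heavy files score higher. A hypothetical sketch (neither function is Pebble's actual API):

```go
package main

// compensatedSize inflates an sstable's size by an estimate of the bytes
// its range tombstones will delete in lower levels.
func compensatedSize(fileSize, estimatedBytesDeleted int64) int64 {
	return fileSize + estimatedBytesDeleted
}

// pickFile returns the index of the file with the largest compensated
// size, preferring files whose compaction reclaims the most space.
func pickFile(sizes, deleted []int64) int {
	best, bestScore := 0, int64(-1)
	for i := range sizes {
		if s := compensatedSize(sizes[i], deleted[i]); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}
```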

Deletion-only Compactions

If an sstable is entirely covered by a range tombstone in a higher level and the range of seqnums in the sstable and the range tombstone are in the same snapshot stripe, the sstable can be deleted. This is better than a normal compaction as the sstable does not have to be read.

#720: perf: investigate deletion-only compactions
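The eligibility check could be sketched roughly as below. This is a simplified illustration with invented names; Pebble's real check involves more state than a single tombstone and seqnum:

```go
package main

// canDeleteWholeFile reports whether an sstable can be dropped outright:
// a higher-level range tombstone must cover the file's entire key range,
// and no open snapshot may separate the file's seqnums from the
// tombstone's (i.e. they must share a snapshot stripe).
func canDeleteWholeFile(
	fileSmallest, fileLargest string, fileMaxSeqNum uint64,
	tombStart, tombEnd string, tombSeqNum uint64, // tombEnd is exclusive
	snapshots []uint64, // seqnums of open snapshots
) bool {
	// The tombstone must cover the file's keys and postdate its contents.
	if fileSmallest < tombStart || fileLargest >= tombEnd || tombSeqNum <= fileMaxSeqNum {
		return false
	}
	// A snapshot between the file's seqnums and the tombstone's would
	// still be able to observe the deleted keys.
	for _, s := range snapshots {
		if fileMaxSeqNum < s && s <= tombSeqNum {
			return false
		}
	}
	return true
}
```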

Pipeline Decompression and Compression During Compaction

Block decompression and compression show up prominently in CPU profiles taken during compaction. We could speed up the wall time of compactions by parallelizing this work.

#599: perf: pipeline compression during compaction
#1722: perf: pipeline decompression during compaction
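A minimal sketch of order-preserving parallel block compression using a worker pool. It uses gzip as a stand-in codec (Pebble's actual codecs are Snappy and Zstd) and ignores the memory bounding and I/O overlap a real pipeline would need:

```go
package main

import (
	"bytes"
	"compress/gzip" // stand-in codec for illustration
	"sync"
)

// compressBlocks compresses blocks concurrently across the given number of
// workers while preserving block order, so the compaction's writer can
// consume results sequentially.
func compressBlocks(blocks [][]byte, workers int) [][]byte {
	out := make([][]byte, len(blocks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				var buf bytes.Buffer
				zw := gzip.NewWriter(&buf)
				zw.Write(blocks[i])
				zw.Close()
				out[i] = buf.Bytes() // index i keeps output in order
			}
		}()
	}
	for i := range blocks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}
```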

Jira issue: PEBBLE-176

@github-actions

github-actions bot commented Mar 2, 2022

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it
in 10 days to keep the issue queue tidy. Thank you for your
contribution to Pebble!


@petermattis
Collaborator Author

Worth keeping this meta issue around as well as the sub-issues it links to.
