Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: investigate alignment of key bounds in compactions #2156

Closed
sumeerbhola opened this issue Dec 1, 2022 · 4 comments · Fixed by #2316
Closed

db: investigate alignment of key bounds in compactions #2156

sumeerbhola opened this issue Dec 1, 2022 · 4 comments · Fixed by #2316

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Dec 1, 2022

Compactions from one non-overlapping level to another have a shape constraint in that the higher level's key span cannot be wider than the lower level. In compactions out of L0, there can be multiple sub-levels involved and the same key span constraint has to apply to all the successive (sub)levels from low to high, e.g.

    L0.2     e--------o
    L0.1    d-----------q
    L0.0  c----------------r
    L2   b------------------t

We are also planning to implement multi-level compactions across non-overlapping levels #136

There is an efficiency loss with this constraint in that if we are rewriting (r,t] and [b,c) in L2, we have wasted that rewrite cost by leaving keys in those key spans at the higher levels.

We should first investigate how much data we are leaving behind, that overlaps with the lowest level's key span, for common workloads (ycsb/kv0/tpcc). It is possible that the current grandparent based splitting logic is resulting in fairly aligned files.

@jbowens
Copy link
Collaborator

jbowens commented Dec 1, 2022

It is possible that the grandparent based splitting logic is resulting in aligned files.

Relatedly, I think there might be gains to be had by tuning the grandparent splitting. The grandparent splitting uses 10 times the output level's target file size as the max overlap:

// maxGrandparentOverlapBytes is the maximum bytes of overlap with level+1
// before we stop building a single file in a level-1 to level compaction.
func maxGrandparentOverlapBytes(opts *Options, level int) uint64 {
	return uint64(10 * opts.Level(level).TargetFileSize)
}

This seems like this would only result in a Li file being aligned with a Li+1 ~every five Li+1 files, given the doubling of the target file size. It's not clear to me what negative consequence there is to splitting more aggressively to align with grandparents? We don't want too small of files, but maybe a heuristic that takes into account the current output file's size could prevent that.

@sumeerbhola
Copy link
Collaborator Author

Li+1 files being narrower in the key span by 5x seems desirable. When Li is full, and we pick one file from that level to compact down, it is good that the next lower level has narrower files since then we don't have to expand the compaction bounds by much. I may be missing something.

@jbowens
Copy link
Collaborator

jbowens commented Dec 5, 2022

That makes sense. I didn't have a well articulated idea, just a sense that only considering the grandparent bounds in this way is leaving an opportunity to avoid unnecessary overlap with the grandparent. I haven't given the RocksDB blog post that @irfansharif posted a thorough read, but it seems like a similar concept.

@sumeerbhola sumeerbhola changed the title db: investigate virtual sstables to align key bounds in compactions db: investigate alignment of key bounds in compactions Dec 6, 2022
@jbowens jbowens self-assigned this Jan 18, 2023
@jbowens
Copy link
Collaborator

jbowens commented Jan 20, 2023

A prototype shows a ~12% w-amp improvement on a small kv0 workload:

                         │ old-kvshort.txt │         new-kvshort-3.txt         │
                         │      wamp       │    wamp     vs base               │
Replay/kv0short/WriteAmp       5.587 ± 12%   4.891 ± 4%  -12.44% (p=0.002 n=6)

and fewer compactions:

                                 │ old-kvshort.txt │         new-kvshort-3.txt          │
                                 │   compactions   │ compactions  vs base               │
Replay/kv0short/CompactionCounts       452.0 ± 28%    393.5 ± 9%  -12.94% (p=0.002 n=6)

                                 │ old-kvshort.txt │         new-kvshort-3.txt          │
                                 │     default     │   default    vs base               │
Replay/kv0short/CompactionCounts       451.5 ± 28%   393.5 ± 10%  -12.85% (p=0.002 n=6)

jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Jan 31, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 3, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 3, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit to jbowens/pebble that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by cockroachdb#2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots cockroachdb#1810).
jbowens added a commit that referenced this issue Feb 8, 2023
Introduce a new type `frontiers`, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
`frontiers` uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.

This commit refactors the `limitFuncSplitter` type to make use of `frontiers`.
The `limitFuncSplitter` type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by #2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the `frontiers` type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots #1810).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants