-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sstable: new property collector API #3784
base: master
Are you sure you want to change the base?
sstable: new property collector API #3784
Conversation
We simplify the property collector API: an instance now only needs to maintain a single state, not three (data block, index, table). The sstable writer uses multiple separate instances in parallel. The instance for index blocks uses a new `AddCollected` API which adds computed properties from the data blocks. Similarly, the table properties are calculated in a separate instance using `AddCollected` with the properties from the index blocks and the range key block. This new interface will allow re-deriving index and table properties for a partial file download which copies whole data blocks. In the future, this would also allow copying entire data blocks during compactions.
ada8da1
to
b38f48b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 8 files at r1.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)
sstable/block_property.go
line 48 at r1 (raw file):
// derivable just from the properties of the data blocks it contains. Similarly, // the table properties must be derivable just from the properties of the index // blocks and range block.
I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.
sstable/block_property.go
line 125 at r1 (raw file):
// error). SupportsSuffixReplacement() can be used to check if this method is // implemented. AddCollectedWithSuffixReplacement(oldProp []byte, newSuffix []byte) error
It seems we could further simplify the suffix replacement bit by making it a stateless function like
type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte
It feels a bit off to force BlockPropertyCollector
s to know about suffix replacement through SupportsSuffixReplacement
and AddCollectedWithSuffixReplacement
. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.
sstable/writer.go
line 2047 at r1 (raw file):
// Add range key properties. rangeKeyProp := w.blockPropCollectors[i].rangeKeyBlock.Finish(scratch) if err := tableCollector.AddCollected(rangeKeyProp); err != nil {
Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason IterOptions
takes separate block-property filters for point keys and range keys:
Lines 133 to 145 in 368636f
// PointKeyFilters can be used to avoid scanning tables and blocks in tables | |
// when iterating over point keys. This slice represents an intersection | |
// across all filters, i.e., all filters must indicate that the block is | |
// relevant. | |
// | |
// Performance note: When len(PointKeyFilters) > 0, the caller should ensure | |
// that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps | |
// avoid allocations in Pebble internal code that mutates the slice. | |
PointKeyFilters []BlockPropertyFilter | |
// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables | |
// when iterating over range keys. The same requirements that apply to | |
// PointKeyFilters apply here too. | |
RangeKeyFilters []BlockPropertyFilter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)
sstable/writer.go
line 2047 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason
IterOptions
takes separate block-property filters for point keys and range keys:Lines 133 to 145 in 368636f
// PointKeyFilters can be used to avoid scanning tables and blocks in tables // when iterating over point keys. This slice represents an intersection // across all filters, i.e., all filters must indicate that the block is // relevant. // // Performance note: When len(PointKeyFilters) > 0, the caller should ensure // that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps // avoid allocations in Pebble internal code that mutates the slice. PointKeyFilters []BlockPropertyFilter // RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables // when iterating over range keys. The same requirements that apply to // PointKeyFilters apply here too. RangeKeyFilters []BlockPropertyFilter
Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)
sstable/block_property.go
line 125 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
It seems we could further simplify the suffix replacement bit by making it a stateless function like
type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte
It feels a bit off to force
BlockPropertyCollector
s to know about suffix replacement throughSupportsSuffixReplacement
andAddCollectedWithSuffixReplacement
. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.
This mapping would have to be provided through the Options, yes? Feels more clunky than requiring it innthe implementation.. but I'll experiment with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)
sstable/writer.go
line 2047 at r1 (raw file):
Previously, RaduBerinde wrote…
Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..
Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)
sstable/block_property.go
line 48 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.
Yes.. I could imagine a case where the property is too large to store at the data block level but is reasonable for the table level. But if we want to be able to derive properties for partially downloaded sst files, we will need this stricter framework.
sstable/writer.go
line 2047 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach
The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).
It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)
sstable/writer.go
line 2047 at r1 (raw file):
Previously, RaduBerinde wrote…
The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).
It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.
The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.
In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)
sstable/writer.go
line 2047 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.
In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.
This is great, thanks. So in the short term, we can just not compute range key properties. And in a follow-up I will add the ability to compute them and store them in the range key block handle.
sstable: new property collector API
We simplify the property collector API: an instance now only needs to
maintain a single state, not three (data block, index, table). The
sstable writer uses multiple separate instances in parallel. The
instance for index blocks uses a new
AddCollected
API which addscomputed properties from the data blocks. Similarly, the table
properties are calculated in a separate instance using
AddCollected
with the properties from the index blocks and the range key block.
This new interface will allow re-deriving index and table properties
for a partial file download which copies whole data blocks. In the
future, this would also allow copying entire data blocks during
compactions.
sstable: use sync.Pool for interval block collectors