sstable: new property collector API #3784

RaduBerinde · 2024-07-22T19:30:32Z

sstable: new property collector API

We simplify the property collector API: an instance now only needs to
maintain a single state, not three (data block, index, table). The
sstable writer uses multiple separate instances in parallel. The
instance for index blocks uses a new AddCollected API which adds
computed properties from the data blocks. Similarly, the table
properties are calculated in a separate instance using AddCollected
with the properties from the index blocks and the range key block.

This new interface will allow re-deriving index and table properties
for a partial file download which copies whole data blocks. In the
future, this would also allow copying entire data blocks during
compactions.

sstable: use sync.Pool for interval block collectors

cockroach-teamcity · 2024-07-22T19:30:40Z

This change is

We simplify the property collector API: an instance now only needs to maintain a single state, not three (data block, index, table). The sstable writer uses multiple separate instances in parallel. The instance for index blocks uses a new `AddCollected` API which adds computed properties from the data blocks. Similarly, the table properties are calculated in a separate instance using `AddCollected` with the properties from the index blocks and the range key block. This new interface will allow re-deriving index and table properties for a partial file download which copies whole data blocks. In the future, this would also allow copying entire data blocks during compactions.

jbowens

Reviewed 5 of 8 files at r1.
Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)

sstable/block_property.go line 48 at r1 (raw file):

// derivable just from the properties of the data blocks it contains. Similarly,
// the table properties must be derivable just from the properties of the index
// blocks and range block.

I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.

sstable/block_property.go line 125 at r1 (raw file):

	// error). SupportsSuffixReplacement() can be used to check if this method is
	// implemented.
	AddCollectedWithSuffixReplacement(oldProp []byte, newSuffix []byte) error

It seems we could further simplify the suffix replacement bit by making it a stateless function like

type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte

It feels a bit off to force BlockPropertyCollectors to know about suffix replacement through SupportsSuffixReplacement and AddCollectedWithSuffixReplacement. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.

sstable/writer.go line 2047 at r1 (raw file):

				// Add range key properties.
				rangeKeyProp := w.blockPropCollectors[i].rangeKeyBlock.Finish(scratch)
				if err := tableCollector.AddCollected(rangeKeyProp); err != nil {

Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason IterOptions takes separate block-property filters for point keys and range keys:

pebble/options.go

Lines 133 to 145 in 368636f

    
           // PointKeyFilters can be used to avoid scanning tables and blocks in tables 
        
           // when iterating over point keys. This slice represents an intersection 
        
           // across all filters, i.e., all filters must indicate that the block is 
        
           // relevant. 
        
           // 
        
           // Performance note: When len(PointKeyFilters) > 0, the caller should ensure 
        
           // that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps 
        
           // avoid allocations in Pebble internal code that mutates the slice. 
        
           PointKeyFilters []BlockPropertyFilter 
        
           // RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables 
        
           // when iterating over range keys. The same requirements that apply to 
        
           // PointKeyFilters apply here too. 
        
           RangeKeyFilters []BlockPropertyFilter

RaduBerinde

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)

sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is this right? My recollection is that range key properties and point key properties are separate. Eg, a table-level block property should not reflect the state of the contained range keys. The reason is because the point and range key iterators are separate, so the decision of whether or not to examine a particular file should be too. For this reason IterOptions takes separate block-property filters for point keys and range keys:

pebble/options.go

Lines 133 to 145 in 368636f

// PointKeyFilters can be used to avoid scanning tables and blocks in tables

// when iterating over point keys. This slice represents an intersection

// across all filters, i.e., all filters must indicate that the block is

// relevant.

//

// Performance note: When len(PointKeyFilters) > 0, the caller should ensure

// that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps

// avoid allocations in Pebble internal code that mutates the slice.

PointKeyFilters []BlockPropertyFilter

// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables

// when iterating over range keys. The same requirements that apply to

// PointKeyFilters apply here too.

RangeKeyFilters []BlockPropertyFilter

Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..

RaduBerinde

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)

sstable/block_property.go line 125 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

It seems we could further simplify the suffix replacement bit by making it a stateless function like
type BlockPropertySuffixTransform func(buf []byte, oldProp, newSuffix []byte) []byte
It feels a bit off to force BlockPropertyCollectors to know about suffix replacement through SupportsSuffixReplacement and AddCollectedWithSuffixReplacement. Instead, the caller initiating the suffix replacement could supply a map of collector names to transforms. We error out if a block property exists for which there is no supplied transform.

This mapping would have to be provided through the Options, yes? Feels more clunky than requiring it innthe implementation.. but I'll experiment with it.

jbowens

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)

sstable/writer.go line 2047 at r1 (raw file):

Previously, RaduBerinde wrote…

Hm.. but there is no separate place to store the range-key derived properties, is there? We don't write out props for the range key block currently..

Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach

RaduBerinde

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)

sstable/block_property.go line 48 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm trying to think through whether we might regret this loss of power and expressibility. Even if we did eventually come to want to use some property that can't be hierarchical, I suppose it wouldn't be much effort to work it back in.

Yes.. I could imagine a case where the property is too large to store at the data block level but is reasonable for the table level. But if we want to be able to derive properties for partially downloaded sst files, we will need this stricter framework.

sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Hmmm... maybe range key block property filters have just never worked. I know we don't use them in Cockroach

The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).

It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.

jbowens

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @RaduBerinde)

sstable/writer.go line 2047 at r1 (raw file):

Previously, RaduBerinde wrote…

The RangeKeyFilters code use the table properties (newRangeKeyIter -> checkAndIntersectFilters -> sstable.IntersectsTable), so they should work (but of course the properties will include point keys).

It looks like we stopped using them in cockroachdb/cockroach#86260 but I'm not sure I quite understand the problem.

The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.

In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.

RaduBerinde

Reviewable status: 1 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens)

sstable/writer.go line 2047 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

The MVCCIncrementalIterator relies on very subtle semantics to keep the time-bound iterator and non-bounded iterator in sync. I believe the issue is that the user-observed fragmentation varies (because we're hiding some range keys), which breaks the assumptions that keep the two iterators in lock step.

In any case, I don't think we want to combine point and range key properties since we'll only be reducing the effectiveness of each.

This is great, thanks. So in the short term, we can just not compute range key properties. And in a follow-up I will add the ability to compute them and store them in the range key block handle.

RaduBerinde requested a review from jbowens July 22, 2024 19:30

RaduBerinde requested a review from a team as a code owner July 22, 2024 19:30

RaduBerinde added 2 commits July 22, 2024 12:46

sstable: use sync.Pool for interval block collectors

b38f48b

RaduBerinde force-pushed the interval-collector-new-api-and-pool branch from ada8da1 to b38f48b Compare July 22, 2024 19:47

jbowens reviewed Jul 29, 2024

View reviewed changes

RaduBerinde commented Jul 29, 2024

View reviewed changes

jbowens reviewed Jul 29, 2024

View reviewed changes

RaduBerinde commented Jul 30, 2024

View reviewed changes

jbowens reviewed Jul 30, 2024

View reviewed changes

RaduBerinde commented Jul 30, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sstable: new property collector API #3784

sstable: new property collector API #3784

RaduBerinde commented Jul 22, 2024

cockroach-teamcity commented Jul 22, 2024

jbowens left a comment

RaduBerinde left a comment

RaduBerinde left a comment

jbowens left a comment

RaduBerinde left a comment

jbowens left a comment

RaduBerinde left a comment

	// PointKeyFilters can be used to avoid scanning tables and blocks in tables
	// when iterating over point keys. This slice represents an intersection
	// across all filters, i.e., all filters must indicate that the block is
	// relevant.
	//
	// Performance note: When len(PointKeyFilters) > 0, the caller should ensure
	// that cap(PointKeyFilters) is at least len(PointKeyFilters)+1. This helps
	// avoid allocations in Pebble internal code that mutates the slice.
	PointKeyFilters []BlockPropertyFilter
	// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables
	// when iterating over range keys. The same requirements that apply to
	// PointKeyFilters apply here too.
	RangeKeyFilters []BlockPropertyFilter

sstable: new property collector API #3784

Are you sure you want to change the base?

sstable: new property collector API #3784

Conversation

RaduBerinde commented Jul 22, 2024

sstable: new property collector API

sstable: use sync.Pool for interval block collectors

cockroach-teamcity commented Jul 22, 2024

jbowens left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment