sstable: prefix bloom filter #5

petermattis · 2018-08-03T12:00:32Z

Add support for constructing the bloom filter on a prefix of the user key. Add a Comparator.PrefixExtractor function for extracting the prefix.

One question here is whether this should be considered a prefix extractor or restricted to user keys which have a version. An alternate api would be to provide Comparator.Split to split a user key into a prefix and a version suffix. This might be useful in the future for segregating historic versions of user keys into a separate storage area.

The text was updated successfully, but these errors were encountered:

petermattis · 2019-02-20T18:35:26Z

@tbg This is done, right?

tbg · 2019-02-20T19:12:41Z

No, iterator support is missing. Was planning to continue with that.

…

On Wed, Feb 20, 2019, 19:35 Peter Mattis ***@***.*** wrote: @tbg <https://github.com/tbg> This is done, right? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/petermattis/pebble/issues/5#issuecomment-465701141>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AE135MwwEqYH9R7EWunms7roeL8UJmhhks5vPZVugaJpZM4Vt3IK> .

petermattis · 2019-05-01T11:53:32Z

I think this would make a reasonable starter project. What remains to be done is to implement IterOptions.Prefix. The comment above that field says:

// If Prefix is true, the iterator will only be used to iterate over keys
// matching that of the key it is first positioned at. If the Comparer was
// supplied with a user-defined Split function and bloom filters are
// enabled, this allows for improved performance by skipping SSTables known
// not to contain the given prefix. The iterator will not properly observe
// keys not matching the prefix.
//
// TODO(tbg): should an assertion trip if the first key's prefix is unstable?
// TODO(tbg): should Prefix override (or sharpen) {Lower,Upper}Bound? When
// we see the first key, we get the prefix and a separator which should be
// a good {Lower,Upper}Bound.
// TODO(tbg): unimplemented.
// Prefix bool

Ryanfsdf · 2019-05-13T14:46:07Z

@petermattis @tbg

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

In terms of First() and Last(), I’m in favor of leaving them unimplemented. One use case of First() and Last() for prefix iterators is that we may support these methods only if Seek() has already been called, in which case we can let First() return the first key with the prefix and Last() return the last key with the prefix. However, this would change the function contract to require a call to Seek() before calling First() or Last(), which I don’t particularly like. Supporting this would further increase the divide between regular Iterators and Prefix Iterators, in which case we should create a new class PrefixIter entirely, instead of making it an option.

petermattis · 2019-05-13T20:28:12Z

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Most of the time, the Reader is the DB, so "creating a new Reader" is not even an option.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

Agreed.

In terms of First() and Last(), I’m in favor of leaving them unimplemented.

Agreed.

ajkr · 2019-05-18T02:16:56Z

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

I don't recall discussions around the total_order_seek / prefix_same_as_start API. It must've been decided a very long time ago. I do not know the reasons for it either, but forgetting to set total_order_seek=true is incredibly common, indicating the API needs improvement. Fortunately it'll be getting the attention it deserves soon via making use of iterate_{lower,upper}_bound together with the Seek / SeekForPrev targets to automatically infer whether the user is doing a prefix scan or larger range scan.

petermattis · 2019-05-18T02:20:19Z

@ajkr How do you feel about the proposed API in this PR: Iterator.SeekPrefixGE? This makes it clear that prefix iteration is being requested, and allows to switch between prefix and non-prefix iteration using the same iterator.

ajkr · 2019-05-18T02:22:15Z

Taking a look now. Sorry I got addicted to the release problem for too long this week.

ajkr · 2019-05-18T02:57:26Z

It's a great API. I can barely think of how to misuse it, but let's see :).

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

ajkr · 2019-05-18T03:01:32Z

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

Ryanfsdf · 2019-05-18T04:13:22Z

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

Hi Andrew, I initially stated in my PR that {Lower,Upper}Bound would be ignored but I later revised it so that {Lower,Upper}Bound are both respected. So your assumption that it would both use bloom filters and not return keys at or above the upper bound is correct.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

petermattis · 2019-05-18T14:24:38Z

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

I had forgotten that libroach was adjusting the upper-bound for an iterator on the fly. Let's hold off on supporting that in Pebble for now until I figure out why that is being done and if there is a better approach.

Ryanfsdf · 2019-05-21T15:32:25Z

This was fixed by #139.

petermattis changed the title ~~Prefix bloom filter~~ sstable: prefix bloom filter Aug 11, 2018

petermattis assigned tbg Feb 20, 2019

petermattis assigned Ryanfsdf and unassigned tbg May 7, 2019

Ryanfsdf closed this as completed May 21, 2019

sumeerbhola mentioned this issue Oct 22, 2021

docs: draft design sketch for ranged keys #1341

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sstable: prefix bloom filter #5

sstable: prefix bloom filter #5

petermattis commented Aug 3, 2018

petermattis commented Feb 20, 2019

tbg commented Feb 20, 2019 via email

petermattis commented May 1, 2019

Ryanfsdf commented May 13, 2019 •

edited

Loading

petermattis commented May 13, 2019

ajkr commented May 18, 2019 •

edited

Loading

petermattis commented May 18, 2019

ajkr commented May 18, 2019

ajkr commented May 18, 2019 •

edited

Loading

ajkr commented May 18, 2019

Ryanfsdf commented May 18, 2019

petermattis commented May 18, 2019

Ryanfsdf commented May 21, 2019

sstable: prefix bloom filter #5

sstable: prefix bloom filter #5

Comments

petermattis commented Aug 3, 2018

petermattis commented Feb 20, 2019

tbg commented Feb 20, 2019 via email

petermattis commented May 1, 2019

Ryanfsdf commented May 13, 2019 • edited Loading

petermattis commented May 13, 2019

ajkr commented May 18, 2019 • edited Loading

petermattis commented May 18, 2019

ajkr commented May 18, 2019

ajkr commented May 18, 2019 • edited Loading

ajkr commented May 18, 2019

Ryanfsdf commented May 18, 2019

petermattis commented May 18, 2019

Ryanfsdf commented May 21, 2019

Ryanfsdf commented May 13, 2019 •

edited

Loading

ajkr commented May 18, 2019 •

edited

Loading

ajkr commented May 18, 2019 •

edited

Loading