Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: prefix bloom filter #5

Closed
petermattis opened this issue Aug 3, 2018 · 13 comments
Closed

sstable: prefix bloom filter #5

petermattis opened this issue Aug 3, 2018 · 13 comments
Assignees

Comments

@petermattis
Copy link
Collaborator

Add support for constructing the bloom filter on a prefix of the user key. Add a Comparator.PrefixExtractor function for extracting the prefix.

One question here is whether this should be considered a prefix extractor or restricted to user keys which have a version. An alternate api would be to provide Comparator.Split to split a user key into a prefix and a version suffix. This might be useful in the future for segregating historic versions of user keys into a separate storage area.

@petermattis petermattis changed the title Prefix bloom filter sstable: prefix bloom filter Aug 11, 2018
@petermattis
Copy link
Collaborator Author

@tbg This is done, right?

@tbg
Copy link
Member

tbg commented Feb 20, 2019 via email

@petermattis
Copy link
Collaborator Author

I think this would make a reasonable starter project. What remains to be done is to implement IterOptions.Prefix. The comment above that field says:

// If Prefix is true, the iterator will only be used to iterate over keys
// matching that of the key it is first positioned at. If the Comparer was
// supplied with a user-defined Split function and bloom filters are
// enabled, this allows for improved performance by skipping SSTables known
// not to contain the given prefix. The iterator will not properly observe
// keys not matching the prefix.
//
// TODO(tbg): should an assertion trip if the first key's prefix is unstable?
// TODO(tbg): should Prefix override (or sharpen) {Lower,Upper}Bound? When
// we see the first key, we get the prefix and a separator which should be
// a good {Lower,Upper}Bound.
// TODO(tbg): unimplemented.
// Prefix bool

@petermattis petermattis assigned Ryanfsdf and unassigned tbg May 7, 2019
@Ryanfsdf
Copy link
Contributor

Ryanfsdf commented May 13, 2019

@petermattis @tbg

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

In terms of First() and Last(), I’m in favor of leaving them unimplemented. One use case of First() and Last() for prefix iterators is that we may support these methods only if Seek() has already been called, in which case we can let First() return the first key with the prefix and Last() return the last key with the prefix. However, this would change the function contract to require a call to Seek() before calling First() or Last(), which I don’t particularly like. Supporting this would further increase the divide between regular Iterators and Prefix Iterators, in which case we should create a new class PrefixIter entirely, instead of making it an option.

@petermattis
Copy link
Collaborator Author

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Most of the time, the Reader is the DB, so "creating a new Reader" is not even an option.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

Agreed.

In terms of First() and Last(), I’m in favor of leaving them unimplemented.

Agreed.

@ajkr
Copy link
Contributor

ajkr commented May 18, 2019

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

I don't recall discussions around the total_order_seek / prefix_same_as_start API. It must've been decided a very long time ago. I do not know the reasons for it either, but forgetting to set total_order_seek=true is incredibly common, indicating the API needs improvement. Fortunately it'll be getting the attention it deserves soon via making use of iterate_{lower,upper}_bound together with the Seek / SeekForPrev targets to automatically infer whether the user is doing a prefix scan or larger range scan.

@petermattis
Copy link
Collaborator Author

@ajkr How do you feel about the proposed API in this PR: Iterator.SeekPrefixGE? This makes it clear that prefix iteration is being requested, and allows to switch between prefix and non-prefix iteration using the same iterator.

@ajkr
Copy link
Contributor

ajkr commented May 18, 2019

Taking a look now. Sorry I got addicted to the release problem for too long this week.

@ajkr
Copy link
Contributor

ajkr commented May 18, 2019

It's a great API. I can barely think of how to misuse it, but let's see :).

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

@ajkr
Copy link
Contributor

ajkr commented May 18, 2019

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

@Ryanfsdf
Copy link
Contributor

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

Hi Andrew, I initially stated in my PR that {Lower,Upper}Bound would be ignored but I later revised it so that {Lower,Upper}Bound are both respected. So your assumption that it would both use bloom filters and not return keys at or above the upper bound is correct.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

@petermattis
Copy link
Collaborator Author

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

I had forgotten that libroach was adjusting the upper-bound for an iterator on the fly. Let's hold off on supporting that in Pebble for now until I figure out why that is being done and if there is a better approach.

@Ryanfsdf
Copy link
Contributor

This was fixed by #139.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants