ZeroTrie: Add cursor type for manual iteration and use it in BlobSchemaV2 #4383
11% improvement on BlobDataProvider read performance:
@younies This provides the core functionality we need to implement
provider/blob/src/blob_schema.rs
Outdated
    trie: ZeroTrieSimpleAscii<&'a [u8]>,
}

impl<'a> fmt::Write for ZeroTrieStepWrite<'a> {
If you make this a general `fmt::Write` instead of something more locale-specific, it should handle UTF-8 input correctly. `write_str` currently forwards non-ASCII bytes, and `write_char` panics (debug) or forwards (release).
You can use the `Err` case to signal that non-ASCII is encountered. Then you can allow the unwrap on the grounds that `locale.write_to` produces ASCII only.
You should probably call this `ZeroTrieSimpleAsciiStepWrite`.
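A minimal sketch of the `Err`-signalling idea from the comment above. `AsciiOnlyStepWrite` and `step_all` are hypothetical names, and the writer only records the bytes it would step with instead of driving a real trie, so this illustrates the suggestion rather than the PR's code:

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for the PR's step writer: it records the bytes it
/// would feed to the trie instead of stepping a real trie.
struct AsciiOnlyStepWrite {
    stepped: Vec<u8>,
}

impl Write for AsciiOnlyStepWrite {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if !s.is_ascii() {
            // Signal non-ASCII input through the Err case instead of forwarding it.
            return Err(fmt::Error);
        }
        self.stepped.extend_from_slice(s.as_bytes());
        Ok(())
    }

    fn write_char(&mut self, c: char) -> fmt::Result {
        if !c.is_ascii() {
            // Same behavior in debug and release builds: no panic, no forwarding.
            return Err(fmt::Error);
        }
        self.stepped.push(c as u8);
        Ok(())
    }
}

/// Feeds `s` to a fresh writer; returns the stepped bytes, or `None` if the
/// input contained non-ASCII text.
fn step_all(s: &str) -> Option<Vec<u8>> {
    let mut w = AsciiOnlyStepWrite { stepped: Vec::new() };
    w.write_str(s).ok()?;
    Some(w.stepped)
}
```

With this shape, a caller that knows its input is ASCII-only can justify unwrapping the result, while any other caller gets an error instead of a corrupted lookup.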
    /// assert_eq!(it.head_value(), None); // "abcdxy"
    /// ```
    #[inline]
    pub fn step(&mut self, byte: u8) {
Why are you keeping the lookup state in the trie itself? The API I'd expect would be:
- call `lookup_write(&self)` to obtain a `ZeroTrieSimpleAsciiLookup<'a>` type that has a non-exclusive ref to the trie
- that type can implement `fmt::Write`, or expose `step`
- that type has a `current()` getter
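The API shape listed above could look roughly like the following sketch. `ToyTrie` and `ToyCursor` are hypothetical, and the backing store is a sorted list of key/value pairs (an assumption made to keep the example short), not the ZeroTrie byte format. The point is only that the cursor borrows the trie non-exclusively, so the trie itself carries no lookup state:

```rust
/// Toy stand-in for a trie: key/value pairs sorted by key, keys unique.
/// This is NOT the ZeroTrie byte format; it only illustrates the API shape.
struct ToyTrie<'a> {
    entries: &'a [(&'a str, usize)],
}

/// Cursor holding a shared (non-exclusive) view into the trie.
#[derive(Clone, Copy)]
struct ToyCursor<'a> {
    entries: &'a [(&'a str, usize)],
    depth: usize,
}

impl<'a> ToyTrie<'a> {
    /// Takes `&self`, so any number of cursors can coexist.
    fn cursor(&self) -> ToyCursor<'a> {
        ToyCursor { entries: self.entries, depth: 0 }
    }
}

impl<'a> ToyCursor<'a> {
    /// Narrows the cursor to the keys whose next byte is `byte`.
    fn step(&mut self, byte: u8) {
        let d = self.depth;
        // Skip the key that ends here (if any) and keys with a smaller next byte.
        let start = self
            .entries
            .partition_point(|(k, _)| k.len() <= d || k.as_bytes()[d] < byte);
        let rest = &self.entries[start..];
        // Keep the contiguous run of keys whose next byte matches.
        let len = rest.partition_point(|(k, _)| k.as_bytes()[d] == byte);
        self.entries = &rest[..len];
        self.depth = d + 1;
    }

    /// Value stored for exactly the bytes stepped so far, if any.
    fn current(&self) -> Option<usize> {
        match self.entries.first() {
            Some((k, v)) if k.len() == self.depth => Some(*v),
            _ => None,
        }
    }
}
```

Because `cursor` only needs `&self`, two lookups can be in flight over the same trie at once, which is not possible when the lookup state lives in the trie itself.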
`ZeroTrieSimpleAscii` is capable of stepping itself without the need to introduce any new types. This is an API that we can add regardless of whether we expose a more general `ZeroTrieLookup` type that has an API similar to the one you describe.
I agree that your proposed API is more intuitive and self-explanatory. If you're okay with that then I can push to this PR to implement it.
please do
PTAL
No appreciable difference in performance. Ready for re-review with the new API. FYI @younies I added an example for how to query for the longest prefix:
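The example referred to here is not preserved in this thread. As an illustration only, a longest-prefix query over a cursor can be sketched as below. `ToyCursor` and `longest_prefix` are hypothetical, the encoding is made up (branch-free, ASCII bytes spell the key, a byte with the high bit set is a value node), and the method is named `take_value` after the name settled on later in this review:

```rust
/// Toy cursor over a made-up, branch-free encoding: ASCII bytes spell the
/// key, and a byte with the high bit set is a value node holding `byte & 0x7F`.
/// Hypothetical; not the real ZeroTrie format or cursor type.
#[derive(Clone, Copy)]
struct ToyCursor<'a> {
    store: &'a [u8],
}

impl<'a> ToyCursor<'a> {
    fn new(store: &'a [u8]) -> Self {
        Self { store }
    }

    /// True once no key can match the bytes stepped so far.
    fn is_empty(&self) -> bool {
        self.store.is_empty()
    }

    /// Returns the value at the current position, stepping over its node.
    fn take_value(&mut self) -> Option<usize> {
        match self.store.split_first() {
            Some((&b, rest)) if b >= 0x80 => {
                self.store = rest;
                Some((b & 0x7F) as usize)
            }
            _ => None,
        }
    }

    /// Steps one byte into the trie; a mismatch empties the cursor.
    fn step(&mut self, byte: u8) {
        // Skip a value node that the caller did not take.
        let _ = self.take_value();
        self.store = match self.store.split_first() {
            Some((&b, rest)) if b == byte => rest,
            _ => &[],
        };
    }
}

/// Returns `(prefix_len, value)` for the longest key that is a prefix of `input`.
fn longest_prefix(store: &[u8], input: &[u8]) -> Option<(usize, usize)> {
    let mut cursor = ToyCursor::new(store);
    let mut best = None;
    for (i, &b) in input.iter().enumerate() {
        // Check for a value, remember it, and keep going.
        if let Some(v) = cursor.take_value() {
            best = Some((i, v));
        }
        if cursor.is_empty() {
            break;
        }
        cursor.step(b);
    }
    // A key may end exactly at the end of the input.
    if let Some(v) = cursor.take_value() {
        best = Some((input.len(), v));
    }
    best
}
```

With the store `b"abc\x80def\x81"` (keys "abc" and "abcdef"), the input "abcdefg" matches "abcdef", while "abcxyz" falls back to "abc". The loop depends on being able to read a value and then continue stepping with the same cursor.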
experimental/zerotrie/src/cursor.rs
Outdated
        step_bsearch_only(&mut self.trie.store, byte)
    }

    /// Takes the value at the current position and moves the cursor.
It's unclear to me where it moves the cursor; I don't see a default place where it could be moved.
I deleted `peek_value`, which means I no longer need that clause to distinguish the behavior of `value` from `peek_value`.
Ok, but why does this "take" and use a `&mut`? This should not mutate.
Values are stored in trie nodes. When we read a value, we can step over that trie node so that we don't need to do that again next time we call `step` with the next character.
Is `value()` idempotent?
`pub fn value(self) -> Result<usize, Self>`?
I could make a moving `pub fn value(self)` along with a borrowing `pub fn has_value(&self)`, which is mostly as efficient as `pub fn take_value(&mut self)`, but I still prefer the mutating value function.
> `pub fn value(self) -> Result<usize, Self>`?

That signature doesn't allow checking the presence of a value and then continuing, as we need to do in the longest-prefix example.
Actually, my `has_value` suggestion does not work well in the longest-prefix example if you care about retaining the value, which we probably do most of the time.
I changed the function name to `take_value`.

impl<'a> ZeroTrieSimpleAsciiCursor<'a> {
    /// Steps the cursor one byte into the trie.
    ///
how does this handle non-ASCII?
Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Also can someone review the diffbase PR #4382?
🎉 All dependencies have been resolved!
@@ -504,6 +504,80 @@ pub fn get_phf_extended(mut trie: &[u8], mut ascii: &[u8]) -> Option<usize> {
    }
}

pub(crate) fn step_bsearch_only(trie: &mut &[u8], c: u8) {
nit: docs
changelog pls
Fixes #4249
Fixes #4379
Depends on #4381
Depends on #4382