-
-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use rustc_layout_scalar_valid_range_end(usize::MAX - 1) for the index #72
Comments
I guess I'm not opposed to documenting this, although it kind of seems self evident to me? |
This commit primarily adds vectorized substring search routines in a new memmem sub-module. They were originally taken from bstr, but heavily modified to incorporate a variant of the "generic SIMD" algorithm[1]. The main highlights: * We guarantee `O(m + n)` time complexity and constant space complexity. * Two-Way is the primary implementation that can handle all cases. * Vectorized variants handle a number of common cases. * Vectorized code uses a heuristic informed by a frequency background distribution of bytes, originally devised inside the regex crate. This makes it more likely that searching will spend more time in the fast vector loops. While adding memmem to this crate is perhaps a bit of a scope increase, I think it fits well. It also puts a core primitive, substring search, very low in the dependency DAG and therefore making it widely available. For example, it is intended to use these new routines in the regex, aho-corasick and bstr crates. This commit does a number of other things, mainly as a result of convenience. It drastically improves test coverage for substring search (as compared to what bstr had), completely overhauls the benchmark suite to make it more comprehensive and adds `cargo fuzz` support for all API items in the crate. Closes #58, Closes #72 [1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd
This commit primarily adds vectorized substring search routines in a new memmem sub-module. They were originally taken from bstr, but heavily modified to incorporate a variant of the "generic SIMD" algorithm[1]. The main highlights: * We guarantee `O(m + n)` time complexity and constant space complexity. * Two-Way is the primary implementation that can handle all cases. * Vectorized variants handle a number of common cases. * Vectorized code uses a heuristic informed by a frequency background distribution of bytes, originally devised inside the regex crate. This makes it more likely that searching will spend more time in the fast vector loops. While adding memmem to this crate is perhaps a bit of a scope increase, I think it fits well. It also puts a core primitive, substring search, very low in the dependency DAG and therefore making it widely available. For example, it is intended to use these new routines in the regex, aho-corasick and bstr crates. This commit does a number of other things, mainly as a result of convenience. It drastically improves test coverage for substring search (as compared to what bstr had), completely overhauls the benchmark suite to make it more comprehensive and adds `cargo fuzz` support for all API items in the crate. Closes #58, Closes #72 [1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd
This commit primarily adds vectorized substring search routines in a new memmem sub-module. They were originally taken from bstr, but heavily modified to incorporate a variant of the "generic SIMD" algorithm[1]. The main highlights: * We guarantee `O(m + n)` time complexity and constant space complexity. * Two-Way is the primary implementation that can handle all cases. * Vectorized variants handle a number of common cases. * Vectorized code uses a heuristic informed by a frequency background distribution of bytes, originally devised inside the regex crate. This makes it more likely that searching will spend more time in the fast vector loops. While adding memmem to this crate is perhaps a bit of a scope increase, I think it fits well. It also puts a core primitive, substring search, very low in the dependency DAG and therefore making it widely available. For example, it is intended to use these new routines in the regex, aho-corasick and bstr crates. This commit does a number of other things, mainly as a result of convenience. It drastically improves test coverage for substring search (as compared to what bstr had), completely overhauls the benchmark suite to make it more comprehensive and adds `cargo fuzz` support for all API items in the crate. Closes #58, Closes #72 [1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd
This commit primarily adds vectorized substring search routines in a new memmem sub-module. They were originally taken from bstr, but heavily modified to incorporate a variant of the "generic SIMD" algorithm[1]. The main highlights: * We guarantee `O(m + n)` time complexity and constant space complexity. * Two-Way is the primary implementation that can handle all cases. * Vectorized variants handle a number of common cases. * Vectorized code uses a heuristic informed by a frequency background distribution of bytes, originally devised inside the regex crate. This makes it more likely that searching will spend more time in the fast vector loops. While adding memmem to this crate is perhaps a bit of a scope increase, I think it fits well. It also puts a core primitive, substring search, very low in the dependency DAG and therefore making it widely available. For example, it is intended to use these new routines in the regex, aho-corasick and bstr crates. This commit does a number of other things, mainly as a result of convenience. It drastically improves test coverage for substring search (as compared to what bstr had), completely overhauls the benchmark suite to make it more comprehensive and adds `cargo fuzz` support for all API items in the crate. Closes #58, Closes #72 [1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd
From rust-lang/rust#73139 (comment):
It would be great to tell rustc this is the case by using
rustc_layout_scalar_valid_range_end(usize::MAX - 1)
. That would allow storingOption<index>
inusize
instead of needing an extra bit, which in some cases could double the size of the struct due to alignment requirements.Failing that (since
rustc_layout_scalar_valid_range_end
is unstable and likely will never be stabilized), would it be possible to document that the index is always less thanusize::MAX
?The text was updated successfully, but these errors were encountered: