-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the complete reverse case for the two way algorithm #27474
Conversation
Add tests for .rfind(&str), using the reverse searcher case for substring search.
Fix quadratic behavior in StrSearcher in reverse search with periodic needles. This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab". Two way uses a "critical factorization" of the needle: x = u v. Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward. To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix. The short period case requires that |u| < period(x). For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general. The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x). This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search. The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller. Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle. Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior. needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100 ``` test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s ```
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
The battery of benchmarks is hosted on a workspace externally (exact commit) Benchmarking with LTO and the before case (rfind) and after case (twoway_rfind) are almost unchanged, except for the periodic pathology I found. Testcases named periodic, periodic2.
|
CC @nham @shepmaster |
I'd love to hear from you @gereeter who helped me with the last PR (#26327). This time I've sweated some to understand the algorithm a lot better. It turns out it's not as symmetrical as we had supposed, and the reverse case needs its own critical factorization (for the short period case, for periodic needles). Fortunately computing the critical factorization for the reverse is really quick, because we can just stop when we find a period that matches. Maybe there's an even quicker way to compute it? Either way, the already present overhead in |
Just for fun, not really related to the PR, here's a comparison to a fast string search: glibc::memmem, which also uses two way search but with a full 256 entry skip table. It is non-simd except for using memchr once when starting. My conclusion is that rust is reasonably competitive, and we win on several benchmarks, for example the test case "short_word_long", which is a common kind of natural language search, and the cases short_2let_long, short_3let_long where the needle is just 2 or 3 bytes.
|
I found an improvement to the innerest parts of the loop in TwoWaySearcher::next (the forward search). Added that as an add-on commit, see its commit log. Edit: Both next and next_back. Relies on slice length bounds being inside
The |
The innermost loop of TwoWaySearcher checks the boundary of the haystack vs position + needle.len(), and it checks the last byte of the needle against the byteset. If these two steps are combined by using the indexing of the last needle byte's position as bounds check, the algorithm improves its throughput. We improve the innermost loop by reducing the number of instructions used, and elminating the panic case for the checked indexing that was previously used. Selected benchmarks from the external/workspace testsuite. Benchmarks improve across the board. ``` before: test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s with this commit: test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s ``` - `bb_in_aa`: fast skip due to byteset filter loop improves. - 1/2/3let: Searches for 1, 2, or 3 ascii bytes improves.
if is_long { | ||
searcher.next_back::<MatchOnly>(self.haystack.as_bytes(), | ||
self.needle.as_bytes(), | ||
true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not is_long
here instead of the branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to encourage the compiler to compile two different versions of next_back
(next
does the same)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if this is actually working to compile to two versions. Wrapping Long vs Short period cases into the strategy trait could force it instead.
This was an idea from the original code, commit 39cb5b1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I investigated this, it looks like this is what happens: it seems to work when looking at the code generated in benchmarks. The compiler transforms branches into either long period or short period outside of the main 'search loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked with @bluss on IRC and this seems plausible. Can you put a comment that briefly describes the optimization so future edits don't "fix" the code?
I don't understand this algorithm at all, and it'll take me a while to get up to speed, so I'm not sure I'm the best person to review. |
The bulk of my testing is not in the rust source or in this PR, it's in my separate repo, a lot of quickcheck tests for interesting properties (finding correct matches, match-reject ranges add up, match-reject ranges on char boundaries etc.) Main issues preventing tests to be included are productivity issues:
|
OK, so I spent some time reading the relevant paper and hashing this out with @bluss, and I'm reasonably convinced that this is a good approach. Having to compute a new factorization makes sense to me, and using the previously computed period to stop the reverse period calculation early is pretty clever. :-) I had one little nit about adding a comment, but otherwise, I'm pretty happy with this. |
@BurntSushi Thank you for reading! Should we find some one more to review it, or is it good? Since you were so leninent on comments, I'll underline that I don't mind detailed critique. |
@bluss I think I'm not the right person to answer that---I've never r+'d anything before. I'd feel better if someone else chimed in. |
fn maximal_suffix(arr: &[u8], order_greater: bool) -> (usize, usize) { | ||
let mut left = 0; // Corresponds to i in the paper | ||
let mut right = 1; // Corresponds to j in the paper | ||
let mut offset = 0; // Corresponds to k in the paper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this have a note saying that the paper starts k
at 1 because the array indexing in the paper starts at one, and that using 0 instead fits 0-based indexing correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. You're pointing out that the new version is in a way closer to the paper than the old, I didn't notice that, it's nice.
It was rewritten to insert unchecked indexing while the code being obviously correct. Once written in this shape, unchecked indexing didn't make a difference..
Sorry for not responding earlier - I was away. I had a few comments on making things clearer and less repetitive, but otherwise this looks very good. Thanks for doing this, @bluss! |
@gereeter Away is what you should be when it's summer! I've added more comments, clarified some things. Thank you for reviewing! I want to skip merging the two maximal suffix functions. Any way to do it nicely ends up with just as much code or more, but arranged differently. |
Break out a separate static method to create the "byteset".
@bors r+ |
1 similar comment
@bors r+ |
📌 Commit 01e8812 has been approved by |
@BurntSushi @bors doesn't think you are a reviewer. |
StrSearcher: Implement the complete reverse case for the two way algorithm Fix quadratic behavior in StrSearcher in reverse search with periodic needles. This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab". Two way uses a "critical factorization" of the needle: x = u v. Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward. To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix. The short period case requires that |u| < period(x). For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general. The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x). This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search. The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller. Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle. Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior. needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100 ``` test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s ```
StrSearcher: Implement the complete reverse case for the two way algorithm
Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.
This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".
Two way uses a "critical factorization" of the needle: x = u v.
Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.
To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.
The short period case requires that |u| < period(x).
For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.
The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).
This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.
The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.
Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.
Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.
needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100