-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize string searching using two way search (WIP) #14135
Conversation
impl<'a> MatchIndices<'a> { | ||
// This is split out into a separate function so that it will be duplicated, | ||
// allowing there to be fewer branches in the loop. | ||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a large function, does inlining actually make it faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also strongly discourage #[inline(always)]
because it is easy to get wrong and make code much worse. This should be changed to #[inline]
at the least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But also it should not be inline without evidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the force-inlining attribute is for making two copies of next_inner
specialized for longPeriod
(otherwise a hot loop will continuously test longPeriod
). How about making an explicit macro to produce two copies and removing the attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are approaches are essentially equivalent, with equal problems (i.e. the problem with inline(always)
is the code bloat it causes (a problem with a macro too), not something specific to the act of inlining).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lifthrasiir is correct - I was intending to specialize next_inner
on longPeriod
. I haven't gotten around to benchmarking the difference yet, but I assumed it was worthwhile given that glibc manually inlines and specializes both this and maximal_suffix
(for which I'm using a similar trick). Regardless, I downgraded these to inline
from inline(always)
while refactoring the code, and it didn't seem to affect performance much - I think that it is inlining anyway, as it can easily see that both functions are only called twice.
This is cool! How fast is it on the Pride & Prejudice benchmark I used in #14107? |
C:
Rust:
This code is far better than before, but it still needs work. |
test str::bench::bench_contains_bad_naive ... bench: 1309 ns/iter (+/- 36) test str::bench::bench_contains_equal ... bench: 137 ns/iter (+/- 2) test str::bench::bench_contains_short_long ... bench: 5473 ns/iter (+/- 14) test str::bench::bench_contains_short_short ... bench: 57 ns/iter (+/- 6)
test str::bench::bench_contains_bad_naive ... bench: 300 ns/iter (+/- 12) from 1309 ns/iter (+/- 36) test str::bench::bench_contains_equal ... bench: 154 ns/iter (+/- 7) from 137 ns/iter (+/- 2) test str::bench::bench_contains_short_long ... bench: 2998 ns/iter (+/- 74) from 5473 ns/iter (+/- 14) test str::bench::bench_contains_short_short ... bench: 65 ns/iter (+/- 2) from 57 ns/iter (+/- 6)
This changes the previously naive string searching algorithm to a two-way search like glibc, which should be faster on average while still maintaining worst case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search I've tried - it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far: test str::bench::bench_contains_bad_naive ... bench: 290 ns/iter (+/- 12) from 1309 ns/iter (+/- 36) test str::bench::bench_contains_equal ... bench: 479 ns/iter (+/- 10) from 137 ns/iter (+/- 2) test str::bench::bench_contains_short_long ... bench: 2844 ns/iter (+/- 105) from 5473 ns/iter (+/- 14) test str::bench::bench_contains_short_short ... bench: 55 ns/iter (+/- 4) from 57 ns/iter (+/- 6) Except for the case specifically designed to be optimal for the naive case (`bench_contains_equal`), this gets as good or better performance as the previous code.
feat: Add Lapce section to the manual
This changes the previously naive string searching algorithm to a two-way search like glibc, which should be faster on average while still maintaining worst case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search I've tried - it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far:
Except for the case specifically designed to be optimal for the naive case (
bench_contains_equal
), this gets as good or better performance as the previous code.