
Optimize string searching using two way search (WIP) #14135

Merged

merged 2 commits into rust-lang:master on May 16, 2014

Conversation

gereeter (Contributor)

This changes the previously naive string searching algorithm to a two-way search, like glibc's, which should be faster on average while still maintaining worst-case linear time complexity. This fixes #14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search that I've tried - it's worth considering options like Boyer-Moore or adding a bad-character shift table to this. However, the benchmarks look quite good so far:

    test str::bench::bench_contains_bad_naive                   ... bench:       290 ns/iter (+/- 12)     from 1309 ns/iter (+/- 36)
    test str::bench::bench_contains_equal                       ... bench:       479 ns/iter (+/- 10)     from  137 ns/iter (+/- 2)
    test str::bench::bench_contains_short_long                  ... bench:      2844 ns/iter (+/- 105)    from 5473 ns/iter (+/- 14)
    test str::bench::bench_contains_short_short                 ... bench:        55 ns/iter (+/- 4)      from   57 ns/iter (+/- 6)

Except for the case specifically designed to be optimal for the naive algorithm (`bench_contains_equal`), this performs as well as or better than the previous code.
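For readers unfamiliar with the algorithm: the heart of two-way (Crochemore-Perrin) search is a "critical factorization" of the needle, computed from its maximal suffixes under the normal and inverted byte orders; the matching loop then uses the associated period to shift linearly past mismatches. A minimal sketch of that preprocessing step (modelled on the textbook algorithm, not this PR's exact code):

```rust
use std::cmp::Ordering;

// Maximal suffix of `needle` under the normal (`order_greater == false`) or
// inverted (`order_greater == true`) byte order; returns (start, period).
// Two-way search takes the larger of the two starts as its critical position
// and uses the matching period to guarantee linear-time shifts.
fn maximal_suffix(needle: &[u8], order_greater: bool) -> (usize, usize) {
    let mut left = 0; // start of the current maximal suffix
    let mut right = 1; // start of the candidate suffix being compared
    let mut offset = 0; // how far the candidate matches the current suffix
    let mut period = 1; // period of the current maximal suffix
    while let Some(&a) = needle.get(right + offset) {
        let b = needle[left + offset];
        match (a.cmp(&b), order_greater) {
            (Ordering::Less, false) | (Ordering::Greater, true) => {
                // Candidate is smaller: skip past it and extend the period.
                right += offset + 1;
                offset = 0;
                period = right - left;
            }
            (Ordering::Equal, _) => {
                // Still matching; advance within the current period.
                if offset + 1 == period {
                    right += period;
                    offset = 0;
                } else {
                    offset += 1;
                }
            }
            _ => {
                // Candidate is larger: it becomes the new maximal suffix.
                left = right;
                right = left + 1;
                offset = 0;
                period = 1;
            }
        }
    }
    (left, period)
}

fn main() {
    // "banana": maximal suffix "nana" starts at index 2 with period 2.
    assert_eq!(maximal_suffix(b"banana", false), (2, 2));
    // Under the inverted order, "anana" starts at index 1 with period 2.
    assert_eq!(maximal_suffix(b"banana", true), (1, 2));
    println!("ok");
}
```

The `order_greater` flag is exactly the kind of parameter the comments below discuss specializing on: it is tested on every comparison, so duplicating the function per flag value removes a branch from the hot loop.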

    impl<'a> MatchIndices<'a> {
        // This is split out into a separate function so that it will be duplicated,
        // allowing there to be fewer branches in the loop.
        #[inline(always)]
Member
This is quite a large function, does inlining actually make it faster?

Contributor

We also strongly discourage `#[inline(always)]` because it is easy to get wrong and make code much worse. This should be changed to `#[inline]` at the least.
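For reference, the two attributes are semantically identical and differ only in codegen; a minimal sketch:

```rust
// Minimal demo of the attributes under discussion. #[inline] is a hint the
// optimizer is free to ignore; #[inline(always)] forces inlining at every
// call site, which can badly bloat code when the function is large.
#[inline]
fn add(a: u32, b: u32) -> u32 {
    a + b
}

#[inline(always)]
fn add_forced(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    // Behaviour is the same either way; only the generated code differs.
    assert_eq!(add(2, 3), 5);
    assert_eq!(add_forced(2, 3), 5);
    println!("ok");
}
```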

Contributor

But also it should not be inline without evidence.

Contributor

I guess the force-inlining attribute is there to make two copies of `next_inner` specialized on `longPeriod` (otherwise a hot loop will continuously test `longPeriod`). How about producing the two copies with an explicit macro and removing the attribute?
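A toy illustration of this suggestion (hypothetical names, not the PR's actual code): instead of relying on forced inlining to constant-fold a runtime flag, a macro can stamp out one copy of the loop per flag value.

```rust
// Generate one monomorphic copy of a summing loop per value of `$double`,
// so each generated function never tests the flag inside the loop.
macro_rules! make_sum {
    ($name:ident, $double:expr) => {
        fn $name(xs: &[u64]) -> u64 {
            let mut total = 0;
            for &x in xs {
                // `$double` expands to a literal here, so this branch is
                // resolved at compile time in each copy, not per iteration.
                if $double { total += 2 * x } else { total += x }
            }
            total
        }
    };
}

make_sum!(sum_plain, false);
make_sum!(sum_double, true);

fn main() {
    assert_eq!(sum_plain(&[1, 2, 3]), 6);
    assert_eq!(sum_double(&[1, 2, 3]), 12);
    println!("ok");
}
```

The trade-off, as noted in the reply below this comment, is that either approach duplicates the loop body; the macro just makes the duplication explicit in the source rather than leaving it to the inliner.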

Member

Both approaches are essentially equivalent and share the same problem: the issue with `inline(always)` is the code bloat it causes (a problem with a macro too), not something specific to the act of inlining.

Contributor (Author)

@lifthrasiir is correct - I was intending to specialize `next_inner` on `longPeriod`. I haven't gotten around to benchmarking the difference yet, but I assumed it was worthwhile given that glibc manually inlines and specializes both this and `maximal_suffix` (for which I'm using a similar trick). Regardless, I downgraded these from `#[inline(always)]` to `#[inline]` while refactoring the code, and it didn't seem to affect performance much - I think they are being inlined anyway, since the compiler can easily see that both functions are only called twice.

@huonw (Member)

huonw commented May 12, 2014

This is cool!

How fast is it on the Pride & Prejudice benchmark I used in #14107?

@gereeter (Contributor, Author)

C:

    real    0m0.093s
    user    0m0.081s
    sys     0m0.012s

Rust:

    real    0m0.241s
    user    0m0.241s
    sys     0m0.000s

This code is far better than before, but it still needs work.

Before this PR:

    test str::bench::bench_contains_bad_naive                   ... bench:      1309 ns/iter (+/- 36)
    test str::bench::bench_contains_equal                       ... bench:       137 ns/iter (+/- 2)
    test str::bench::bench_contains_short_long                  ... bench:      5473 ns/iter (+/- 14)
    test str::bench::bench_contains_short_short                 ... bench:        57 ns/iter (+/- 6)

After:

    test str::bench::bench_contains_bad_naive                   ... bench:       300 ns/iter (+/- 12)     from 1309 ns/iter (+/- 36)
    test str::bench::bench_contains_equal                       ... bench:       154 ns/iter (+/- 7)      from  137 ns/iter (+/- 2)
    test str::bench::bench_contains_short_long                  ... bench:      2998 ns/iter (+/- 74)     from 5473 ns/iter (+/- 14)
    test str::bench::bench_contains_short_short                 ... bench:        65 ns/iter (+/- 2)      from   57 ns/iter (+/- 6)
bors added a commit that referenced this pull request May 16, 2014
@bors bors closed this May 16, 2014
@bors bors merged commit 39cb5b1 into rust-lang:master May 16, 2014
@gereeter gereeter deleted the two-way-search branch December 17, 2015 01:29
Successfully merging this pull request may close these issues: String searching is slow (#14107)

5 participants