Improve case insensitive search to avoid allocations. #4394

Merged: 5 commits, Nov 23, 2021

cyriltovena
Contributor

```
❯ benchcmp  before.txt after.txt
benchmark                                            old ns/op     new ns/op     delta
Benchmark_LineFilter/default_true_(?i)foo-16         2400          2233          -6.96%
Benchmark_LineFilter/simplified_true_(?i)foo-16      201           228           +13.13%
Benchmark_LineFilter/default_false_(?i)foo-16        2443          2376          -2.74%
Benchmark_LineFilter/simplified_false_(?i)foo-16     185           231           +24.96%

benchmark                                            old allocs     new allocs     delta
Benchmark_LineFilter/default_true_(?i)foo-16         0              0              +0.00%
Benchmark_LineFilter/simplified_true_(?i)foo-16      1              0              -100.00%
Benchmark_LineFilter/default_false_(?i)foo-16        0              0              +0.00%
Benchmark_LineFilter/simplified_false_(?i)foo-16     1              0              -100.00%

benchmark                                            old bytes     new bytes     delta
Benchmark_LineFilter/default_true_(?i)foo-16         0             0             +0.00%
Benchmark_LineFilter/simplified_true_(?i)foo-16      128           0             -100.00%
Benchmark_LineFilter/default_false_(?i)foo-16        0             0             +0.00%
Benchmark_LineFilter/simplified_false_(?i)foo-16     128           0             -100.00%
```

It's not much, but over billions of lines it makes a big difference.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
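
For readers who want to reproduce the "1 alloc per op" figure outside the benchmark suite, here is a minimal sketch using `testing.AllocsPerRun`. It assumes the old simplified filter lower-cased a copy of every line before searching; `containsLowerCopy` and `allocsPerCall` are hypothetical names, not functions from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the result of the measured call.
var sink bool

// containsLowerCopy is a hypothetical stand-in for the allocating approach:
// it lower-cases a fresh copy of the line, then searches that copy.
// lowerSubstr is assumed to be lower-cased already.
func containsLowerCopy(line, lowerSubstr []byte) bool {
	return bytes.Contains(bytes.ToLower(line), lowerSubstr)
}

// allocsPerCall reports the average number of heap allocations made by a
// single containsLowerCopy call on a sample log line.
func allocsPerCall() float64 {
	line := []byte(`2021-09-29 level=INFO msg="FOO happened"`)
	substr := []byte("foo")
	return testing.AllocsPerRun(1000, func() {
		sink = containsLowerCopy(line, substr)
	})
}

func main() {
	fmt.Printf("allocations per line: %.0f\n", allocsPerCall())
}
```

The copy made by `bytes.ToLower` is the per-line cost that the allocation tables above report for the simplified filter.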
@cyriltovena cyriltovena requested a review from a team as a code owner September 29, 2021 07:38
Member

@owen-d owen-d left a comment

Left a comment, but looking good. Thanks :)

}
l.buf = BytesBufferPool.Get(len(line)).([]byte)[:len(line)]
}
for i := 0; i < len(line); i++ {
Member

If the buffer is being reused, don't we need to set l.buf = l.buf[:len(line)]? Otherwise, it looks like it could include the end of a previous line that was longer. Alternatively, we could return l.buf[:len(line)] at the end.

Contributor Author

We do that on line 210, I think.

Member

Yes, but that's only run when len(line) > cap(l.buf) (line 206). If buf is already cap=6, len=6 and line is len=5, we'll end up returning the whole len=6 buf at the end, despite only writing the first 5 indices. That seems like a bug to me; is there something I'm missing?
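
To make the concern in this thread concrete, here is a minimal sketch of the stale-tail scenario. The `lowerer` type and both methods are hypothetical stand-ins, not the code under review, and a plain `make` replaces the `BytesBufferPool` call so the example runs on its own.

```go
package main

import "fmt"

// lowerer is a hypothetical filter that keeps a scratch buffer between calls.
type lowerer struct {
	buf []byte
}

// asciiLower lower-cases a single ASCII byte and leaves other bytes alone.
func asciiLower(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// toLowerBuggy replaces the buffer only when the line does not fit and then
// returns the whole buffer, so a shorter line keeps the previous line's tail.
func (l *lowerer) toLowerBuggy(line []byte) []byte {
	if len(line) > cap(l.buf) {
		l.buf = make([]byte, len(line))
	}
	for i := 0; i < len(line); i++ {
		l.buf[i] = asciiLower(line[i])
	}
	return l.buf
}

// toLowerFixed reslices the buffer to the current line length on every call,
// which is the change suggested in the review comment.
func (l *lowerer) toLowerFixed(line []byte) []byte {
	if len(line) > cap(l.buf) {
		l.buf = make([]byte, len(line))
	}
	l.buf = l.buf[:len(line)]
	for i := 0; i < len(line); i++ {
		l.buf[i] = asciiLower(line[i])
	}
	return l.buf
}

func main() {
	buggy, fixed := &lowerer{}, &lowerer{}
	// A 6-byte line followed by a 5-byte line, as in the comment above.
	buggy.toLowerBuggy([]byte("FOOBAR"))
	fixed.toLowerFixed([]byte("FOOBAR"))
	fmt.Printf("buggy: %q\n", buggy.toLowerBuggy([]byte("HELLO")))
	fmt.Printf("fixed: %q\n", fixed.toLowerFixed([]byte("HELLO")))
}
```

With the reslice in place the returned slice always has the length of the current line, whether or not the buffer was replaced on that call.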

@cyriltovena
Contributor Author

Btw, the code is taken from the strings library, but made alloc-free.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@pull-request-size pull-request-size bot added size/M and removed size/L labels Oct 27, 2021
@cyriltovena
Contributor Author

I decided to go another way and do a one-pass comparison, handling UTF-8 along the way.

We're losing roughly 10-15% speed on the simplified filter because this no longer uses assembly, but it all runs without allocations, which should prove better at scale:

❯ benchcmp  before.txt after.txt
benchmark                                            old ns/op      new ns/op      delta
Benchmark_LineFilter/default_true_(?i)foo-16         2420824098     2484584633     +2.63%
Benchmark_LineFilter/simplified_true_(?i)foo-16      185915427      213538073      +14.86%
Benchmark_LineFilter/default_false_(?i)foo-16        2282755169     2199247964     -3.66%
Benchmark_LineFilter/simplified_false_(?i)foo-16     183386899      206861735      +12.80%

benchmark                                            old allocs     new allocs     delta
Benchmark_LineFilter/default_true_(?i)foo-16         8              7              -12.50%
Benchmark_LineFilter/simplified_true_(?i)foo-16      1000008        0              -100.00%
Benchmark_LineFilter/default_false_(?i)foo-16        5              5              +0.00%
Benchmark_LineFilter/simplified_false_(?i)foo-16     1000004        0              -100.00%

benchmark                                            old bytes     new bytes     delta
Benchmark_LineFilter/default_true_(?i)foo-16         39120         39112         -0.02%
Benchmark_LineFilter/simplified_true_(?i)foo-16      128000565     0             -100.00%
Benchmark_LineFilter/default_false_(?i)foo-16        39096         39096         +0.00%
Benchmark_LineFilter/simplified_false_(?i)foo-16     128000312     0             -100.00%
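
As a rough illustration of the idea described above (compare while scanning instead of lower-casing the line first), here is a minimal sketch. `containsFold` and `hasPrefixFold` are illustrative names, not the functions merged in this PR. The sketch assumes the needle is lower-cased once up front, uses `unicode.ToLower` rather than full Unicode case folding, and retries the comparison at every rune boundary, so it makes no claim about matching the merged code's speed.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// containsFold reports whether line contains lowerSubstr, ignoring case.
// lowerSubstr is assumed to be lower-cased already (hypothetical contract).
// Nothing is copied, so the search itself does not allocate.
func containsFold(line, lowerSubstr []byte) bool {
	if len(lowerSubstr) == 0 {
		return true
	}
	for i := 0; i < len(line); {
		if hasPrefixFold(line[i:], lowerSubstr) {
			return true
		}
		// Advance by one full rune so we never start mid-character.
		_, size := utf8.DecodeRune(line[i:])
		i += size
	}
	return false
}

// hasPrefixFold reports whether s starts with lowerPrefix once each rune of s
// is lower-cased. Invalid bytes decode to utf8.RuneError, so two different
// invalid bytes compare equal in this sketch.
func hasPrefixFold(s, lowerPrefix []byte) bool {
	for len(lowerPrefix) > 0 {
		if len(s) == 0 {
			return false
		}
		pr, pn := utf8.DecodeRune(lowerPrefix)
		sr, sn := rune(s[0]), 1
		if sr < utf8.RuneSelf {
			// ASCII fast path: no decoding needed.
			if 'A' <= sr && sr <= 'Z' {
				sr += 'a' - 'A'
			}
		} else {
			sr, sn = utf8.DecodeRune(s)
			sr = unicode.ToLower(sr)
		}
		if sr != pr {
			return false
		}
		s, lowerPrefix = s[sn:], lowerPrefix[pn:]
	}
	return true
}

func main() {
	fmt.Println(containsFold([]byte("level=INFO msg=FOO"), []byte("foo")))
	fmt.Println(containsFold([]byte("ÉCOLE primaire"), []byte("école")))
	fmt.Println(containsFold([]byte("nothing here"), []byte("foo")))
}
```

Decoding rune by rune is what makes a scratch buffer unnecessary: each character of the line is lower-cased in a register and compared immediately.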

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@cyriltovena
Contributor Author

@owen-d PTAL, this should be easier to maintain.

Member

@owen-d owen-d left a comment

LGTM, nice work

@owen-d owen-d merged commit dc222dc into grafana:main Nov 23, 2021