optimize str::from_utf8() validation when slice contains multibyte chars and str.chars().count() in all cases #88834

the8472 · 2021-09-10T20:16:39Z

The change shows small but consistent improvements across several x86 target feature levels. I also tried to optimize counting with `slice.as_chunks`, but that yielded less consistent results: bigger improvements at some optimization levels, smaller ones at others.
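The chunked counting approach mentioned above can be sketched as follows. This is not the code that was benchmarked. `as_chunks` was an unstable API at the time, so the sketch uses the stable `slice::chunks_exact` as a stand-in. The function name `char_count_chunked` and the chunk size of 32 are hypothetical choices for illustration. The sketch relies on one fact: a `str`'s char count equals its byte length minus the number of UTF-8 continuation bytes (`0x80..=0xBF`).

```rust
/// Hypothetical chunked char counter: counts UTF-8 continuation bytes
/// (0x80..=0xBF) in fixed-size chunks and subtracts them from the byte length.
fn char_count_chunked(s: &str) -> usize {
    // Chunk size is an assumption for illustration, not a tuned value.
    const CHUNK: usize = 32;
    let bytes = s.as_bytes();
    let chunks = bytes.chunks_exact(CHUNK);
    // Bytes that do not fill a whole chunk. `remainder` borrows from `bytes`,
    // not from `chunks`, so `chunks` can still be consumed below.
    let tail = chunks.remainder();
    let mut cont = 0usize;
    for chunk in chunks {
        // Fixed-length inner loop with a narrow accumulator (at most 32,
        // which fits in a u8), giving the optimizer a chance to vectorize it.
        let mut n: u8 = 0;
        for &b in chunk {
            // Reinterpreted as i8, continuation bytes are exactly -128..=-65.
            n += ((b as i8) < -64) as u8;
        }
        cont += n as usize;
    }
    for &b in tail {
        cont += ((b as i8) < -64) as usize;
    }
    bytes.len() - cont
}

fn main() {
    let s = "lorem ipsum 🦀 ".repeat(10);
    assert_eq!(char_count_chunked(&s), s.chars().count());
    println!("{}", char_count_chunked(&s)); // prints 140
}
```

Whether the per-chunk loop actually vectorizes depends on the target features and optimization level. That dependence is consistent with the mixed results reported for this variant.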

old, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,924 ns/iter (+/- 26)
test str::str_char_count_lorem                                  ... bench:         879 ns/iter (+/- 12)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,878 ns/iter (+/- 21)
test str::str_char_count_lorem                                  ... bench:         851 ns/iter (+/- 11)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,477 ns/iter (+/- 46)
test str::str_char_count_lorem                                  ... bench:         675 ns/iter (+/- 15)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,323 ns/iter (+/- 39)
test str::str_char_count_lorem                                  ... bench:         593 ns/iter (+/- 18)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         748 ns/iter (+/- 7)
test str::str_char_count_lorem                                  ... bench:         348 ns/iter (+/- 2)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         650 ns/iter (+/- 4)
test str::str_char_count_lorem                                  ... bench:         301 ns/iter (+/- 1)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

And for validation of a string containing multibyte chars:

old, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       4,606 ns/iter (+/- 64)

new, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       3,837 ns/iter (+/- 60)
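Text made of 4-byte characters, such as most emoji, has three continuation bytes for every four bytes. A cheaper continuation-byte check therefore plausibly matters most there. The following structural checker illustrates where that check sits in validation. It is not the std implementation, and the names `seq_width`, `is_cont_byte` and `structurally_valid` are hypothetical. It deliberately skips the overlong, surrogate and range checks that `std::str::from_utf8` performs.

```rust
/// Expected sequence width from a lead byte, or None if the byte cannot
/// start a sequence. Simplified on purpose: real validation must also reject
/// 0xC0, 0xC1, 0xF5..=0xF7, overlong forms and surrogates.
fn seq_width(lead: u8) -> Option<usize> {
    match lead {
        0x00..=0x7F => Some(1),
        0xC0..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF7 => Some(4),
        _ => None, // 0x80..=0xBF (continuation) or 0xF8..=0xFF
    }
}

/// True for continuation bytes 0x80..=0xBF (bit pattern 10xxxxxx).
fn is_cont_byte(b: u8) -> bool {
    (b as i8) < -64
}

/// Structural check only: each lead byte must be followed by the right
/// number of continuation bytes. Not a substitute for `std::str::from_utf8`.
fn structurally_valid(bytes: &[u8]) -> bool {
    let mut i = 0;
    while i < bytes.len() {
        let w = match seq_width(bytes[i]) {
            Some(w) => w,
            None => return false,
        };
        // Truncated sequence at the end of the input.
        if i + w > bytes.len() {
            return false;
        }
        // Every byte after the lead byte must be a continuation byte.
        if !bytes[i + 1..i + w].iter().all(|&b| is_cont_byte(b)) {
            return false;
        }
        i += w;
    }
    true
}

fn main() {
    let crab = "🦀".as_bytes();
    assert!(structurally_valid("héllo 🦀".as_bytes()));
    assert!(!structurally_valid(&crab[..3])); // truncated 4-byte sequence
    assert!(!structurally_valid(&[0x80])); // lone continuation byte
}
```

The simplification is visible on the overlong encoding `[0xC0, 0x80]`. This checker accepts it, but `std::str::from_utf8` rejects it.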

rust-highfive · 2021-09-10T20:16:42Z

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

the8472 · 2021-09-10T20:19:11Z

Requesting a perf run since `utf8_is_cont_byte` is also used in other places and I only benchmarked `str.chars().count()`.
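`utf8_is_cont_byte` tests whether a byte is a UTF-8 continuation byte (bit pattern `10xxxxxx`). This page does not reproduce the diff. The following is only a sketch under that assumption. It compares the familiar mask-based test with a signed-comparison formulation, which is equivalent because bytes `0x80..=0xBF` reinterpreted as `i8` are exactly `-128..=-65`. It then counts chars as byte length minus continuation bytes, which avoids decoding each `char`. The function names are hypothetical.

```rust
/// Mask-based test: a UTF-8 continuation byte has the bit pattern 10xxxxxx.
fn is_cont_byte_mask(byte: u8) -> bool {
    (byte & 0xC0) == 0x80
}

/// Signed-comparison test: as i8, the bytes 0x80..=0xBF map to -128..=-65,
/// i.e. exactly the values below -64.
fn is_cont_byte_signed(byte: u8) -> bool {
    (byte as i8) < -64
}

/// Counts chars as the number of bytes that are not continuation bytes.
fn char_count(s: &str) -> usize {
    let bytes = s.as_bytes();
    let cont = bytes.iter().filter(|&&b| is_cont_byte_signed(b)).count();
    bytes.len() - cont
}

fn main() {
    // Both predicates agree on every possible byte value.
    for b in 0..=u8::MAX {
        assert_eq!(is_cont_byte_mask(b), is_cont_byte_signed(b));
    }
    let s = "héllo 🦀";
    assert_eq!(char_count(s), s.chars().count());
    println!("{}", char_count(s)); // prints 7
}
```

Because the same predicate also appears on other code paths, a change to it can affect more than `chars().count()`. That is the reason for a compiler-wide perf run rather than relying on the microbenchmark alone.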

@bors try @rust-timer queue

rust-timer · 2021-09-10T20:19:12Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-09-10T20:19:19Z

⌛ Trying commit ce90315d5ffbf771cea57bf5060eec2e8a5455bb with merge fa3e0445c3bd996c5883d0948e712f6eb91e4b38...


the8472 · 2021-09-10T22:51:26Z

@bors try-
@bors try

bors · 2021-09-10T22:51:33Z

⌛ Trying commit 66195d8 with merge a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f...

bors · 2021-09-11T00:12:12Z

☀️ Try build successful - checks-actions
Build commit: a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f (a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f)

rust-timer · 2021-09-11T00:12:13Z

Queued a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f with parent b69fe57, future comparison URL.

rust-timer · 2021-09-11T01:59:16Z

Finished benchmarking commit (a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

Very small improvement in instruction counts (up to -0.3% on full builds of deeply-nested)
Small regression in instruction counts (up to 0.5% on full builds of ctfe-stress-4)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the8472 · 2021-09-11T10:30:47Z

Let's see whether the regression came from the extra function calls, even though they should be inlined.

@bors try @rust-timer queue

rust-timer · 2021-09-11T10:30:48Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-09-11T10:30:56Z

⌛ Trying commit 5e1428e with merge ddb82bd66bb17f60beb471f9c8b345a5e1130e56...

bors · 2021-09-11T12:02:50Z

☀️ Try build successful - checks-actions
Build commit: ddb82bd66bb17f60beb471f9c8b345a5e1130e56 (ddb82bd66bb17f60beb471f9c8b345a5e1130e56)

rust-timer · 2021-09-11T12:02:51Z

Queued ddb82bd66bb17f60beb471f9c8b345a5e1130e56 with parent 4e880f8, future comparison URL.

rust-timer · 2021-09-11T13:42:23Z

Finished benchmarking commit (ddb82bd66bb17f60beb471f9c8b345a5e1130e56): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

Small improvement in instruction counts (up to -0.4% on full builds of cargo)
Small regression in instruction counts (up to 0.4% on incr-unchanged builds of ripgrep)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

the8472 · 2021-09-11T13:58:38Z

Several of the improved and regressed benchmarks spend less or more time in `LLVM_passes`, `finish_ongoing_codegen` and `run_linker`, so my guess is that this is mostly LLVM noise rather than an effect on the Rust code itself.

joshtriplett · 2021-10-04T05:03:15Z

I think the microbenchmark results seem clear here, and the rest seems likely to be noise. On balance, this seems likely to be a win.

@bors r+

bors · 2021-10-04T05:03:17Z

📌 Commit 5e1428e has been approved by joshtriplett

bors · 2021-10-04T12:50:01Z

⌛ Testing commit 5e1428e with merge 175b8db...

bors · 2021-10-04T15:30:34Z

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing 175b8db to master...

rust-timer · 2021-10-04T20:25:47Z

Finished benchmarking commit (175b8db): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

rust-highfive assigned yaahc Sep 10, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 10, 2021

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 10, 2021


the8472 force-pushed the char-count branch from ce90315 to 02200c3 September 10, 2021 21:00

falk-hueffner reviewed Sep 10, 2021

library/core/src/str/validations.rs (outdated)

the8472 force-pushed the char-count branch from 02200c3 to 05c6060 September 10, 2021 21:32

the8472 added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Sep 10, 2021

the8472 force-pushed the char-count branch from 05c6060 to 6df1a63 September 10, 2021 22:11

the8472 changed the title from "Improve str.chars().count() performance" to "optimize str::from_utf8() validation when slice contains multibyte chars and str.chars().count() in all cases" Sep 10, 2021


the8472 added 3 commits September 11, 2021 00:25

benchmark for str.chars().count()

4c44f06

optimization continuation byte validation of strings containing multibyte chars

66195d8

the8472 force-pushed the char-count branch from 6df1a63 to 66195d8 September 10, 2021 22:25

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 11, 2021

manually inline function

5e1428e

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 11, 2021

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 11, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 28, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 4, 2021

bors merged commit 175b8db into rust-lang:master Oct 4, 2021

rustbot added this to the 1.57.0 milestone Oct 4, 2021

bors mentioned this pull request Oct 4, 2021

Add {floor,ceil}_char_boundary methods to str #86497

Merged

rustbot removed the perf-regression Performance regression. label Oct 4, 2021
