optimize str::from_utf8() validation when slice contains multibyte chars and str.chars().count() in all cases #88834

Merged
merged 4 commits into from
Oct 4, 2021

Conversation

the8472
Member

@the8472 the8472 commented Sep 10, 2021

The change shows small but consistent improvements across several x86 target feature levels. I also tried to optimize counting with slice.as_chunks, but that yielded less consistent results: bigger improvements at some optimization levels, smaller ones at others.

old, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,924 ns/iter (+/- 26)
test str::str_char_count_lorem                                  ... bench:         879 ns/iter (+/- 12)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64
test str::str_char_count_emoji                                  ... bench:       1,878 ns/iter (+/- 21)
test str::str_char_count_lorem                                  ... bench:         851 ns/iter (+/- 11)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,477 ns/iter (+/- 46)
test str::str_char_count_lorem                                  ... bench:         675 ns/iter (+/- 15)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v2
test str::str_char_count_emoji                                  ... bench:       1,323 ns/iter (+/- 39)
test str::str_char_count_lorem                                  ... bench:         593 ns/iter (+/- 18)
test str::str_char_count_lorem_short                            ... bench:           4 ns/iter (+/- 0)

old, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         748 ns/iter (+/- 7)
test str::str_char_count_lorem                                  ... bench:         348 ns/iter (+/- 2)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

new, -O2, x86-64-v3
test str::str_char_count_emoji                                  ... bench:         650 ns/iter (+/- 4)
test str::str_char_count_lorem                                  ... bench:         301 ns/iter (+/- 1)
test str::str_char_count_lorem_short                            ... bench:           5 ns/iter (+/- 0)

and for the multibyte-char string validation:

old, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       4,606 ns/iter (+/- 64)

new, -O2, x86-64
test str::str_validate_emoji                                    ... bench:       3,837 ns/iter (+/- 60)
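
To make the benchmarked operation concrete: counting the chars of a valid UTF-8 string amounts to counting the bytes that are not continuation bytes. The following is a minimal sketch of that idea, not the code from this PR. The function names and the chunk size of 32 are hypothetical, and `chunks_exact` is used as a stable stand-in for the `slice.as_chunks` experiment mentioned above.

```rust
/// Returns true if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
fn is_continuation(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Counts chars by counting the bytes that start a code point.
fn count_chars_bytewise(s: &str) -> usize {
    s.bytes().filter(|&b| !is_continuation(b)).count()
}

/// Same idea, processed in fixed-size chunks so the inner loop has a
/// known trip count, which may help the optimizer vectorize it.
fn count_chars_chunked(s: &str) -> usize {
    const CHUNK: usize = 32; // arbitrary illustrative size (assumption)
    let bytes = s.as_bytes();
    let chunks = bytes.chunks_exact(CHUNK);
    let tail = chunks.remainder();
    let mut total = 0usize;
    for chunk in chunks {
        // At most CHUNK (32) increments, so a u8 accumulator cannot overflow.
        let mut n = 0u8;
        for &b in chunk {
            n += !is_continuation(b) as u8;
        }
        total += n as usize;
    }
    total + tail.iter().filter(|&&b| !is_continuation(b)).count()
}

fn main() {
    let long = "añ🦀".repeat(40);
    let samples = ["", "lorem ipsum", "héllo wörld", long.as_str()];
    for s in samples {
        // Both sketches must agree with the standard library.
        let expected = s.chars().count();
        assert_eq!(count_chars_bytewise(s), expected);
        assert_eq!(count_chars_chunked(s), expected);
    }
}
```

Both helpers rely on the input already being valid UTF-8, which `&str` guarantees; on arbitrary bytes the count would not correspond to chars.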

@rust-highfive
Collaborator

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 10, 2021
@the8472
Member Author

the8472 commented Sep 10, 2021

Requesting a perf run, since utf8_is_cont_byte is also used in other places and I only benchmarked str.chars().count().
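
For readers unfamiliar with the helper named above: a UTF-8 continuation byte is any byte in the range 0x80..=0xBF. The sketch below shows two equivalent ways to test for it and checks them against each other for all 256 byte values. The function names are hypothetical, and whether either form matches the exact implementation of `utf8_is_cont_byte` before or after this PR is an assumption, not something the thread confirms.

```rust
/// Mask-and-compare form: the top two bits must be `10`.
fn is_cont_mask(byte: u8) -> bool {
    (byte & 0b1100_0000) == 0b1000_0000
}

/// Signed-compare form: 0x80..=0xBF reinterpreted as i8 is -128..=-65,
/// so a single signed comparison is enough.
fn is_cont_signed(byte: u8) -> bool {
    (byte as i8) < -64
}

fn main() {
    // Exhaustively confirm that the two predicates agree.
    for b in 0..=u8::MAX {
        assert_eq!(is_cont_mask(b), is_cont_signed(b), "mismatch at {:#04x}", b);
    }

    // 'é' is encoded as the two bytes [0xC3, 0xA9]: a lead byte, then a continuation byte.
    let bytes = "é".as_bytes();
    assert!(!is_cont_signed(bytes[0]));
    assert!(is_cont_signed(bytes[1]));
}
```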

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 10, 2021
@bors
Contributor

bors commented Sep 10, 2021

⌛ Trying commit ce90315d5ffbf771cea57bf5060eec2e8a5455bb with merge fa3e0445c3bd996c5883d0948e712f6eb91e4b38...

@the8472 the8472 added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Sep 10, 2021
@the8472 the8472 changed the title Improve str.chars().count() performance optimize str::from_utf8() validation when slice contains multibyte chars and str.chars().count() in all cases Sep 10, 2021
@the8472
Member Author

the8472 commented Sep 10, 2021

@bors try-
@bors try

@bors
Contributor

bors commented Sep 10, 2021

⌛ Trying commit 66195d8 with merge a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f...

@bors
Contributor

bors commented Sep 11, 2021

☀️ Try build successful - checks-actions
Build commit: a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f (a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f)

@rust-timer
Collaborator

Queued a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f with parent b69fe57, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (a80e5872cb4aecf1c759ad6e8ae0b9a3297fdb2f): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

  • Very small improvement in instruction counts (up to -0.3% on full builds of deeply-nested)
  • Small regression in instruction counts (up to 0.5% on full builds of ctfe-stress-4)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 11, 2021
@the8472
Member Author

the8472 commented Sep 11, 2021

Let's see whether the regression was due to the extra function calls, even though they should be inlined.

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 11, 2021
@bors
Contributor

bors commented Sep 11, 2021

⌛ Trying commit 5e1428e with merge ddb82bd66bb17f60beb471f9c8b345a5e1130e56...

@bors
Contributor

bors commented Sep 11, 2021

☀️ Try build successful - checks-actions
Build commit: ddb82bd66bb17f60beb471f9c8b345a5e1130e56 (ddb82bd66bb17f60beb471f9c8b345a5e1130e56)

@rust-timer
Collaborator

Queued ddb82bd66bb17f60beb471f9c8b345a5e1130e56 with parent 4e880f8, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (ddb82bd66bb17f60beb471f9c8b345a5e1130e56): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

  • Small improvement in instruction counts (up to -0.4% on full builds of cargo)
  • Small regression in instruction counts (up to 0.4% on incr-unchanged builds of ripgrep)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 11, 2021
@the8472
Member Author

the8472 commented Sep 11, 2021

Several of the improved and regressed benchmarks spend less or more time, respectively, in LLVM_passes, finish_ongoing_codegen and run_linker, so my guess is that this is mostly LLVM noise rather than Rust code being affected.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 28, 2021
@joshtriplett
Member

I think the microbenchmark results seem clear here, and the rest seems likely to be noise. On balance, this seems likely to be a win.

@bors r+
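
As a quick way to quantify the microbenchmark results referred to above, the relative improvement can be computed from the ns/iter figures in the PR description. This is a small sketch; the helper name and the row labels are made up for illustration, and only the timings are taken from the listings.

```rust
/// Relative improvement in percent when a timing drops from `old_ns` to `new_ns`.
fn improvement_pct(old_ns: f64, new_ns: f64) -> f64 {
    (old_ns - new_ns) / old_ns * 100.0
}

fn main() {
    // (label, old ns/iter, new ns/iter), copied from the benchmark listings.
    let rows = [
        ("char_count_emoji x86-64", 1924.0, 1878.0),
        ("char_count_lorem x86-64", 879.0, 851.0),
        ("char_count_emoji x86-64-v2", 1477.0, 1323.0),
        ("char_count_lorem x86-64-v2", 675.0, 593.0),
        ("char_count_emoji x86-64-v3", 748.0, 650.0),
        ("char_count_lorem x86-64-v3", 348.0, 301.0),
        ("validate_emoji x86-64", 4606.0, 3837.0),
    ];
    for (label, old, new) in rows {
        println!("{:<28} {:>5.1}%", label, improvement_pct(old, new));
    }
}
```

The `lorem_short` rows are left out because timings of 4 to 5 ns/iter are too coarse for a meaningful percentage.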

@bors
Contributor

bors commented Oct 4, 2021

📌 Commit 5e1428e has been approved by joshtriplett

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2021
@bors
Contributor

bors commented Oct 4, 2021

⌛ Testing commit 5e1428e with merge 175b8db...

@bors
Contributor

bors commented Oct 4, 2021

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing 175b8db to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 4, 2021
@bors bors merged commit 175b8db into rust-lang:master Oct 4, 2021
@rustbot rustbot added this to the 1.57.0 milestone Oct 4, 2021
@rust-timer
Collaborator

Finished benchmarking commit (175b8db): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@rustbot rustbot removed the perf-regression Performance regression. label Oct 4, 2021