-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ascii::escape_default #94776
Conversation
r? @scottmcm (rust-highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 8761424 with merge cf6cd92ed676344f27c15baa799fa0d5092a60c5... |
☀️ Try build successful - checks-actions |
Queued cf6cd92ed676344f27c15baa799fa0d5092a60c5 with parent 10dccdc, future comparison URL. |
Nice work! Do you have measurements from rustc-perf? You can copy and paste the tables from the "compare" page into a GitHub comment and it should auto-format the markdown for you (example). I think there may be scope for further improvements here. I ran Cachegrind, the profile has this:
for Beyond that, the analysis identified that memory allocation rates are quite high for this benchmark. I ran DHAT to get more info. (I had to increase the
Frame 26 is interesting in both PPs, the relevant code in
So it's building up a very large (4MB) string, and it does it twice. That's quite the iterator chain, I wonder if it can be improved. Alternatively, I wonder if looking at the stack frames beneath could be useful -- are both of these calls to |
library/core/src/ascii.rs
Outdated
unsafe { | ||
( | ||
*hex_digits.get_unchecked((b >> 4) as usize), | ||
*hex_digits.get_unchecked((b & 0xf) as usize), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive by review: this might do without use of unsafe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks, I didn't even try that, I've yet to get a good feeling for what gets optimized and not. I'll do a local perf run with that and see if there's any changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were right of course, addressed in 7f4f4fc
Based on @paolobarbolini's tip that the unsafe block was unnecessary in this case. Not much left of `hexify()` after this, so seemed clearer to just inline it.
Finished benchmarking commit (cf6cd92ed676344f27c15baa799fa0d5092a60c5): comparison url. Summary: This benchmark run shows 1 relevant improvement 🎉 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never |
@martingms Your results are so much better than the CI results that I suspect they are too good to be true. I've never seen Perhaps something went wrong with the measurements. Are you sure you are comparing two compilers that are identical other than this change? I typically have two |
I do the same, but I might've changed config on one without doing a full rebuild. I'll do full rebuilds and try again 👍 |
Looks like rebuilding llvm did it, this looks more realistic perhaps :)
|
You were right @nnethercote, inlining
|
Let's do this for peace of mind again, even if perf.rlo doesn't track @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit c62ab42 with merge 41e02b6896c4e3953ae231dfc8553bab2b2b8e25... |
☀️ Try build successful - checks-actions |
Queued 41e02b6896c4e3953ae231dfc8553bab2b2b8e25 with parent ba14a83, future comparison URL. |
Finished benchmarking commit (41e02b6896c4e3953ae231dfc8553bab2b2b8e25): comparison url. Summary: This benchmark run did not return any relevant results. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never |
This looks good. If there are additional improvements (in or beneath @bors r+ rollup |
📌 Commit c62ab42 has been approved by |
I'll take a look at that next week if no-one beats me to it :) |
Rollup of 5 pull requests Successful merges: - rust-lang#93283 (Fix for localized windows editions in testcase fn read_link() Issue#93211) - rust-lang#94592 (Fallback to top-level config.toml if not present in current directory, and remove fallback for env vars and CLI flags) - rust-lang#94776 (Optimize ascii::escape_default) - rust-lang#94840 (update `replace_bound_vars_with_placeholders` doc comment) - rust-lang#94842 (Remove unnecessary try_opt for operations that cannot fail) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
ascii::escape_default
showed up as a hot function when compilingdeunicode-1.3.1
in @nnethercote's analysis of @lqd's rustc-benchmarking-data.After taking a look at the generated assembly it looked like a LUT-based approach could be faster for
hexify()
-ing ascii characters, so that's what this PR implementsThe patch looks like it provides about a 1-2% improvement in instructions for that particular crate. This should definitely be verified with a perf run as I'm still getting used to the
rustc-perf
tooling and might easily have made an error!