-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change output normalization logic to be linear against size of output #128200
Conversation
Seeing if this is the culprit of the perf regression requiring the #128179 revert. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf] Change output normalization logic to be linear against size of output I believe the previous code was accidentally quadratic. Let's perf it.
compiler/rustc_errors/src/emitter.rs
Outdated
s = s.replace(*c, replacement); | ||
} | ||
s | ||
// We replace some characters so the CLI output is always consistent and underlines aligned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this in a LazyCell
or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorting it and using binary search would be another option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, did this just to smoke test if I was right that this is the actual problem.
Does anyone have thoughts about how to avoid the String
allocation? Doubt it actually affects performance that much... but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can return a borrowed Cow
if no substitutions happened and an owned one if one does happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a big match
might be even simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the8472 what do you think of the following?
str.chars()
.flat_map(|c| match output_replacements.binary_search_by_key(&c, |(k, _)| *k) {
Ok(i) => Either::Left(output_replacements[i].1.chars()),
_ => Either::Right([c].into_iter()),
})
.collect()
There would be a single heap allocation by collect
and nothing else, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it doesn't know the length so it'd still do the amortized growth thing, so O(log(l))
allocations.
And it wouldn't avoid an allocation if nothing changed.
But it's better than what's currently in the PR.
compiler/rustc_errors/src/emitter.rs
Outdated
str.chars() | ||
.map(|c| output_replacements.get(&c).map_or(c.to_string(), |s| s.to_string())) | ||
.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those to_string
are going to cause a lot of tiny allocations. Try using fold
and appending to a String
instead.
compiler/rustc_errors/src/emitter.rs
Outdated
s = s.replace(*c, replacement); | ||
} | ||
s | ||
// We replace some characters so the CLI output is always consistent and underlines aligned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorting it and using binary search would be another option
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (ee7d1a7): comparison URL. Overall result: ❌✅ regressions and improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 769.861s -> 770.138s (0.04%) |
cd8ef40
to
e35d147
Compare
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
Given that there's no improvement, I'm guessing that the previous version of this PR didn't do what I wanted it to. Trying this other version, but we might have to go ahead with the revert :-/ @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Change output normalization logic to be linear against size of output I believe the previous code was accidentally quadratic. Let's perf it.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c7620ca): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 772.066s -> 771.597s (-0.06%) |
Scan strings to be normalized for printing in a linear scan and collect the resulting `String` only once. Use a binary search when looking for chars to be replaced, instead of a `HashMap::get`.
e35d147
to
4790f32
Compare
@@ -2559,22 +2559,13 @@ fn num_decimal_digits(num: usize) -> usize { | |||
|
|||
// We replace some characters so the CLI output is always consistent and underlines aligned. | |||
// Keep the following list in sync with `rustc_span::char_width`. | |||
// ATTENTION: keep lexicografically sorted so that the binary search will work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ATTENTION: keep lexicografically sorted so that the binary search will work | |
// ATTENTION: keep lexicographically sorted so that the binary search will work |
// Scan the input string for a character in the ordered table above. If it's present, replace | ||
// it with it's alternative string (it can be more than 1 char!). Otherwise, retain the input | ||
// char. At the end, allocate all chars into a string in one operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems to be a bit inaccurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it was accurate, before I changed it to the current algo. Given this is already r+d, I'll do a follow up (unless you want to do that as part of your draft PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already included it in the draft, yes, so I guess we can do it as a part of it then.
There are two drafts actually. One adding const asserts (#128465) and one using perfect hashing (#128463). They both need this one to land first, after which I guess the former can be reviewed independently of the latter, which needs some additional benchmarking.
…felix Change output normalization logic to be linear against size of output Modify the rendered output normalization routine to scan each character *once* and construct a `String` to be printed out to the terminal *once*, instead of using `String::replace` in a loop multiple times. The output doesn't change, but the time spent to prepare a diagnostic is now faster (or rather, closer to what it was before rust-lang#127528).
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
Looks like a spurious failure. (I think I do not have enough rights to @bors retry) |
@bors retry |
☀️ Test successful - checks-actions |
Finished benchmarking commit (8c7e0e1): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -2.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 762.407s -> 760.794s (-0.21%) |
More improvements than regressions, and this fixed an earlier regression. @rustbot label: +perf-regression-triaged |
rustc_errors: use perfect hashing for character replacements The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced. A const assert or a test can be added checking that (implemented in rust-lang#128465). But this PR tries to use [perfect hashing](https://en.wikipedia.org/wiki/Perfect_hash_function) instead. The performance implications are unclear. Asymptotically it's faster, but in reality we should just benchmark. Plus if there are no significant performance wins, this entire things is probably not even worse the additional dependencies it brings. UPD: funnily enough, there's a PR optimizing the binary search implementation (rust-lang#128254) in the queue right now. So I guess we have to wait until that is merged too before benchmarking this.
Some `const { }` asserts for rust-lang#128200 The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced. This PR changes it to using a `const{ }` assertion (and also checks for duplicate entries). Sadly, we cannot use the recently-stabilized `is_sorted_by_key` here, because it is not const (but it would not allow us to check for uniqueness anyways). Instead, let's write a manual loop. Alternative approach (perfect hash function): rust-lang#128463 r? `@ghost`
…iaskrgr Rollup of 2 pull requests Successful merges: - rust-lang#128465 (Some `const { }` asserts for rust-lang#128200 ) - rust-lang#128795 (Update E0517 message to reflect RFC 2195.) r? `@ghost` `@rustbot` modify labels: rollup
Some `const { }` asserts for rust-lang#128200 The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced. This PR changes it to using a `const{ }` assertion (and also checks for duplicate entries). Sadly, we cannot use the recently-stabilized `is_sorted_by_key` here, because it is not const (but it would not allow us to check for uniqueness anyways). Instead, let's write a manual loop. Alternative approach (perfect hash function): rust-lang#128463 r? `@ghost`
Modify the rendered output normalization routine to scan each character once and construct a
String
to be printed out to the terminal once, instead of usingString::replace
in a loop multiple times. The output doesn't change, but the time spent to prepare a diagnostic is now faster (or rather, closer to what it was before #127528).