Restructure rustc_lexer::unescape #136538
rustbot has assigned @petrochenkov. Use |
Every year someone restructures this code to match their taste.
Could not assign reviewer from: |
Haha :) that is ominous. I am hoping to see some small perf wins as evidence of good taste, and I see a chance to perhaps add some structure to the errors.
Separate the functions for unescaping each kind of string and unit:
- this duplicates some code, but also gets rid of code that is only there for genericity
- each function is now simpler by inlining booleans, which might lead to faster code

Use a Peekable<CharIndices<'_>> instead of going back and forth between string slice and chars iterator.
- this gets rid of most position computations
- allows removal of double traversal for correct backslash newline escapes in skip_ascii_whitespace

Improves documentation
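The Peekable<CharIndices<'_>> idea from the description can be illustrated with a minimal sketch. This is not the PR's code: `unescape_simple` and its reduced escape set (`\n`, `\\`, and backslash-newline) are assumptions made for illustration, and `skip_ascii_whitespace` here only borrows the name mentioned in the description. The point it tries to show is that byte positions come for free from the iterator, and that peeking lets whitespace after a backslash-newline be skipped in a single pass.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical helper: consume ASCII whitespace from the iterator,
/// stopping (without consuming) at the first non-whitespace char.
fn skip_ascii_whitespace(chars: &mut Peekable<CharIndices<'_>>) {
    while let Some(&(_, c)) = chars.peek() {
        if !matches!(c, ' ' | '\t' | '\n' | '\r') {
            break;
        }
        chars.next();
    }
}

/// Simplified, illustrative unescaper. It understands only `\n`, `\\`
/// and backslash-newline; any other escape is reported as an error
/// carrying the byte offset of the offending backslash.
fn unescape_simple(src: &str) -> Result<String, usize> {
    let mut out = String::new();
    let mut chars = src.char_indices().peekable();
    while let Some((pos, c)) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some((_, 'n')) => out.push('\n'),
            Some((_, '\\')) => out.push('\\'),
            // Backslash-newline: drop the newline and any following
            // whitespace, traversing the input only once.
            Some((_, '\n')) => skip_ascii_whitespace(&mut chars),
            // Unknown escape or lone trailing backslash.
            _ => return Err(pos),
        }
    }
    Ok(out)
}
```

Calling `unescape_simple` on a string containing a backslash-newline followed by indentation should yield the text with that whole run removed, while an unsupported escape yields the offset of its backslash. Whether this shape is actually faster than slicing is exactly what the perf run on this PR was meant to find out.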
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer
@hkBst: is this ready for a perf run and then review?
@nnethercote yes, sorry for not stating that explicitly.
@bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (75f0e3d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)
Results (primary 3.5%, secondary 2.6%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results (primary -2.7%, secondary 0.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
This benchmark run did not return any relevant results for this metric.

Bootstrap: 781.62s -> 780.755s (-0.11%)
Thanks for the PR! I appreciate that this code is very finicky, and I understand the desire to improve it. But there are some issues with the PR as it stands.
Also, you've made a bunch of changes in a single commit. Because this code is so intricate, it's really hard to read the diff. I suspect it could be broken into several atomic commits, each one doing a single distinct change. That would be much easier to review. I won't suggest doing that for the whole PR, because I don't think it's acceptable for the reasons I gave above, sorry! However, I wonder if a subset of the changes might still be worthwhile, such as the |
@nnethercote thanks for reviewing and interpreting the perf results (I wasn't quite sure how to interpret them). I realize the diff is quite useless as this is almost a complete code replacement, so thanks again for your efforts. I really appreciate them. I think this happened mostly because I started out duplicating some code to make it more concrete, so I could slowly simplify parts until I could understand it better, but I am happy to rework history if/once I find a useful improvement.

In that respect it is funny that you should mention being interested in the Peekable changes, as I had started to suspect that part of negatively impacting performance. Maybe if I can isolate that we can try and see...

I have some more ideas for this code. In particular I think it would be much better if the functions taking a callback turned into iterators, such that it is easier to consume them. This may have further unpredictable perf results, but iterators are supposed to be fast, right? Or to put it another way: I don't have a good sense of how well callback-based code optimizes, but I can imagine that it could be opaque to the compiler, resulting in missed opportunities. I'd be happy to hear from you if you have any more thoughts on this.
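To make the callback-versus-iterator idea concrete, here is a rough sketch of what an iterator-shaped unescaper might look like, with a thin callback wrapper on top for comparison. Everything in it is hypothetical: the `Unescape` struct, the two-variant `EscapeError`, and the tiny escape set are stand-ins chosen for illustration and are not the types or functions in rustc_lexer.

```rust
use std::ops::Range;
use std::str::CharIndices;

/// Hypothetical, reduced error type for this sketch only.
#[derive(Debug, PartialEq)]
enum EscapeError {
    LoneSlash,
    InvalidEscape,
}

/// Hypothetical iterator yielding one unescaped unit at a time,
/// together with the byte range it occupied in the source.
struct Unescape<'a> {
    chars: CharIndices<'a>,
    len: usize,
}

impl<'a> Unescape<'a> {
    fn new(src: &'a str) -> Self {
        Unescape { chars: src.char_indices(), len: src.len() }
    }
}

impl Iterator for Unescape<'_> {
    type Item = (Range<usize>, Result<char, EscapeError>);

    fn next(&mut self) -> Option<Self::Item> {
        let (start, c) = self.chars.next()?;
        let res = if c == '\\' {
            match self.chars.next() {
                Some((_, 'n')) => Ok('\n'),
                Some((_, 't')) => Ok('\t'),
                Some((_, '\\')) => Ok('\\'),
                Some(_) => Err(EscapeError::InvalidEscape),
                None => Err(EscapeError::LoneSlash),
            }
        } else {
            Ok(c)
        };
        // The unit ends where the not-yet-consumed remainder begins.
        let end = self.len - self.chars.as_str().len();
        Some((start..end, res))
    }
}

/// Callback-style entry point expressed on top of the iterator, so
/// callers that prefer a callback are still served.
fn unescape_with_callback(
    src: &str,
    callback: &mut impl FnMut(Range<usize>, Result<char, EscapeError>),
) {
    for (range, res) in Unescape::new(src) {
        callback(range, res);
    }
}
```

With this shape a consumer can write `Unescape::new(src).collect::<Vec<_>>()` or stop early with `find`, which is awkward to express through a callback. How the two styles compare in generated code is not something this sketch settles; that would need a perf run.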
Oh, one more thing I wanted to mention. I think the similarities between unescaping bytes, chars, and C strings are definitely there, but the current code also makes compromises to squeeze them together, such as the bytes functions returning chars instead of bytes, the MixedUnit allowing nul bytes (might be acceptable if it was used for something other than just C strings; in fact I'm currently thinking of this struct as CChar), and the uses of unreachable.
See #136919 for a PR that introduces just Peekable, which almost necessitates using CharIndices instead of Chars, since you lose |
See #136931 for a PR that makes a start at exploring moving to iterators instead of callbacks.
Thanks for filing the new PRs. #136947 puts me back in the review rotation; once it is merged you'll be able to request reviews from me again. I'll wait until you've finished drafting the new PRs before taking a look, and I'll close this PR.
I'd like to take another stab at this, using a macro to remove the code duplication, and removing the use of Peekable and perhaps other things that I now suspect of causing perf regressions. Should I file a new PR or do you want to reopen this one?
A new PR makes sense. That way the old approach doesn't get mixed up with the new approach.