Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover wrong-cased keywords that start items #99918

Merged
merged 3 commits into from
Nov 11, 2022

Conversation

WaffleLapkin
Copy link
Member

(this pr was inspired by this tweet)

r? @estebank

We've talked a bit about this recovery, but I just wanted to make sure that this is the right approach :)

For now I've only added the case insensitive recovery to uses, since most other items like impl blocks, modules, functions can start with multiple keywords which complicates the matter.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 29, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 29, 2022
@WaffleLapkin WaffleLapkin marked this pull request as draft July 29, 2022 19:20
@rust-log-analyzer

This comment has been minimized.

@estebank
Copy link
Contributor

This seems reasonable and I think we could even get those more complex item signatures working as well with this approach.

@WaffleLapkin WaffleLapkin marked this pull request as ready for review September 13, 2022 18:59
@WaffleLapkin
Copy link
Member Author

I've added the support for functions (the original intent of this PR), the code is somewhat messy, but I think it works.

Comment on lines 591 to 615
if case_insensitive
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised rustfix didn't catch this:

Suggested change
if case_insensitive
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
if case_insensitive
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase()
{

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean rustfmt? It doesn't work at all with if let chains currently

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also don't really have a style guide for it, so 🤷

Comment on lines +621 to +644
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase()
{

/// Eats a keyword, optionally ignoring the case.
/// If the case differs (and is ignored) an error is issued.
/// This is useful for recovery.
fn eat_keyword_case(&mut self, kw: Symbol, case_insensitive: bool) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you do the insensitive check unconditionally? I guess that doesn't work because you need to pass in that the next token can be the keyword or at least an arbitrary ident... 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, being always case insensitive doesn't work because you may have a keyword | ident, for example here:
https://github.com/rust-lang/rust/blob/6a25a433b62701ed48e85857a3b9f29bc4ba758f/compiler/rustc_parse/src/parser/item.rs#L219
If we'd always use case insensitive checks, then Use!() would stop working. I had similar problems while working on this, where

impl Const /* <-- type */ {}

was wrongly parsed as impl const /* waiting for a trait */

#![allow(unused_imports)]

fn main() {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up work: handle fn foo(_: &Dyn Debug) {}


if case == Case::Insensitive
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm mildly amused at the thought of introducing a CamelCased keyword 😄

@WaffleLapkin
Copy link
Member Author

@estebank just to check: this is waiting on review, right? or do I have to do something?

@bors
Copy link
Collaborator

bors commented Sep 26, 2022

☔ The latest upstream changes (presumably #102297) made this pull request unmergeable. Please resolve the merge conflicts.

@WaffleLapkin
Copy link
Member Author

(ping @estebank)

@estebank
Copy link
Contributor

@bors r+

@bors
Copy link
Collaborator

bors commented Nov 10, 2022

📌 Commit d86f9cd has been approved by estebank

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2022
@bors
Copy link
Collaborator

bors commented Nov 11, 2022

⌛ Testing commit d86f9cd with merge 5b82ea7...

@bors
Copy link
Collaborator

bors commented Nov 11, 2022

☀️ Test successful - checks-actions
Approved by: estebank
Pushing 5b82ea7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2022
@bors bors merged commit 5b82ea7 into rust-lang:master Nov 11, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5b82ea7): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.2%, 1.2%] 80
Regressions ❌
(secondary)
0.8% [0.3%, 1.4%] 11
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [0.2%, 1.2%] 80

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.1% [1.5%, 2.8%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

@rustbot rustbot added the perf-regression Performance regression. label Nov 11, 2022
@WaffleLapkin
Copy link
Member Author

Uh oh, this looks like a genuine perf regression

@estebank
Copy link
Contributor

@WaffleLapkin can you spend time on this?

@WaffleLapkin
Copy link
Member Author

I can, although I'm not sure what the next step would be

Comment on lines +613 to +619
if case == Case::Insensitive
&& let Some((ident, /* is_raw */ false)) = self.token.ident()
&& ident.as_str().to_lowercase() == kw.as_str().to_lowercase() {
true
} else {
false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious reason for the slow down, but can you try putting up PRs that revert some of these changes independently? Let's try to find if there's a single method that's introducing most of the perf regression (and it very well could be that it's just worse cache locality due to change in method inlining or arguments or something "silly" like that).

@WaffleLapkin WaffleLapkin deleted the fnFnfun branch November 14, 2022 15:19
@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented Nov 14, 2022

So here is the result of my investigation.

Cleared by-hand (please, someone, teach me how to profile rustc properly ;-;) top of the cachegrind diff output:

 7,475,252 (17.30%)  /h/rustx/library/std/src/thread/local.rs:rustc_span::SESSION_GLOBALS::FOO::__getit
 7,475,252 (17.30%)  /h/rustx/library/core/src/option.rs:rustc_span::SESSION_GLOBALS::FOO::__getit
 7,325,476 (16.95%)  /h/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.3/src/raw/mod.rs:<hashbrown::raw::RawTable<(rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_ast::node_id::NodeId, rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>
-7,325,476 (-17.0%)  /h/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.3/src/raw/mod.rs:<hashbrown::raw::RawTable<(rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_ast::node_id::NodeId, rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>
-7,233,604 (-16.7%)  /h/rustx/library/std/src/thread/local.rs:rustc_span::SESSION_GLOBALS::FOO::__getit
-7,233,604 (-16.7%)  /h/rustx/library/core/src/option.rs:rustc_span::SESSION_GLOBALS::FOO::__getit
 6,036,360 (13.97%)  /h/rustx/library/alloc/src/str.rs:alloc::str::<impl str>::to_lowercase
 4,666,512 (10.80%)  ./malloc/./malloc/malloc.c:_int_free
 3,966,467 ( 9.18%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::check_keyword_case
 3,173,364 ( 7.34%)  ???:???
 2,774,172 ( 6.42%)  ./malloc/./malloc/malloc.c:malloc
-2,284,897 (-5.29%)  /h/rustx/compiler/rustc_parse/src/parser/ty.rs:<rustc_parse::parser::Parser>::parse_ty_common
 2,270,285 ( 5.25%)  /h/rustx/compiler/rustc_parse/src/parser/ty.rs:<rustc_parse::parser::Parser>::parse_ty_common
 2,067,139 ( 4.78%)  /h/rustx/library/alloc/src/rc.rs:<alloc::rc::Rc<alloc::vec::Vec<rustc_ast::tokenstream::TokenTree>> as core::ops::drop::Drop>::drop
-2,022,692 (-4.68%)  /h/rustx/library/alloc/src/rc.rs:<alloc::rc::Rc<alloc::vec::Vec<rustc_ast::tokenstream::TokenTree>> as core::ops::drop::Drop>::drop
 1,962,130 ( 4.54%)  /h/rustx/compiler/rustc_parse/src/parser/item.rs:<rustc_parse::parser::Parser>::parse_item_kind
 1,950,946 ( 4.52%)  /h/rustx/library/core/src/ptr/mod.rs:<alloc::rc::Rc<alloc::vec::Vec<rustc_ast::tokenstream::TokenTree>> as core::ops::drop::Drop>::drop
-1,947,527 (-4.51%)  /h/rustx/library/core/src/ptr/mod.rs:<alloc::rc::Rc<alloc::vec::Vec<rustc_ast::tokenstream::TokenTree>> as core::ops::drop::Drop>::drop
 1,898,266 ( 4.39%)  /h/rustx/library/alloc/src/vec/mod.rs:<rustc_parse::parser::Parser>::check_keyword_case
 1,848,000 ( 4.28%)  /h/rustx/library/core/src/unicode/unicode_data.rs:core::unicode::unicode_data::conversions::to_lower
 1,450,368 ( 3.36%)  ./malloc/./malloc/malloc.c:free
-1,370,214 (-3.17%)  /h/rustx/library/alloc/src/vec/mod.rs:<rustc_parse::parser::Parser>::parse_item_common_
-1,324,848 (-3.07%)  /h/rustx/library/alloc/src/vec/mod.rs:<rustc_parse::parser::Parser>::check_fn_front_matter
 1,073,310 ( 2.48%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::eat_keyword_case
 1,056,000 ( 2.44%)  /h/rustx/library/core/src/str/validations.rs:alloc::str::<impl str>::to_lowercase
-1,034,664 (-2.39%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::check_fn_front_matter
   973,236 ( 2.25%)  /h/rustx/library/core/src/slice/iter/macros.rs:alloc::str::<impl str>::to_lowercase
   922,769 ( 2.14%)  /h/rustx/library/alloc/src/vec/mod.rs:<rustc_parse::parser::Parser>::parse_ty_common
   906,747 ( 2.10%)  /h/rustx/compiler/rustc_ast/src/token.rs:<rustc_ast::token::Token>::is_non_raw_ident_where::<<rustc_ast::token::Token>::is_keyword::{closure#0}>
  -887,508 (-2.05%)  /h/rustx/library/alloc/src/vec/mod.rs:<rustc_parse::parser::Parser>::parse_ty_common
  -829,182 (-1.92%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::parse_item_common_
  -824,068 (-1.91%)  /h/rustx/compiler/rustc_parse/src/parser/item.rs:<rustc_parse::parser::Parser>::check_fn_front_matter
   792,000 ( 1.83%)  /h/rustx/library/alloc/src/vec/mod.rs:alloc::str::<impl str>::to_lowercase
   792,000 ( 1.83%)  /h/rustx/library/core/src/num/mod.rs:core::unicode::unicode_data::conversions::to_lower
   785,356 ( 1.82%)  /h/rustx/library/alloc/src/raw_vec.rs:alloc::raw_vec::RawVec<T,A>::allocate_in
   724,944 ( 1.68%)  /h/rustx/compiler/rustc_span/src/symbol.rs:<rustc_span::symbol::Interner>::get
  -721,773 (-1.67%)  /h/rustx/library/core/src/slice/iter/macros.rs:<rustc_parse::parser::Parser>::check_fn_front_matter
   705,735 ( 1.63%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::eat_keyword
   652,037 ( 1.51%)  /h/rustx/compiler/rustc_parse/src/parser/mod.rs:<rustc_parse::parser::Parser>::look_ahead::<bool, <rustc_parse::parser::Parser>::isnt_macro_invocation::{closure#0}>

The thing to notice here is that there are a lot of to_lowercase added, which is very suspicious -- to_lowercase should only be called on the error path.

Upon further investigation I've found this function, where I accidentally used Case::Insensitive:

pub(super) fn parse_fn_front_matter(&mut self, orig_vis: &Visibility) -> PResult<'a, FnHeader> {
let sp_start = self.token.span;
let constness = self.parse_constness(Case::Insensitive);
let async_start_sp = self.token.span;
let asyncness = self.parse_asyncness(Case::Insensitive);
let unsafe_start_sp = self.token.span;
let unsafety = self.parse_unsafety(Case::Insensitive);
let ext_start_sp = self.token.span;
let ext = self.parse_extern(Case::Insensitive);
if let Async::Yes { span, .. } = asyncness {
self.ban_async_in_2015(span);
}
if !self.eat_keyword_case(kw::Fn, Case::Insensitive) {

I don't think it breaks the parser per se (check_fn_front_matter doesn't let you get to parse_fn_front_matter with malformed keywords), but I bet that's the thing that makes it slow.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 23, 2022
Fix perf regression by correctly matching keywords

This should (hopefully) fix regression from rust-lang#99918

r? `@estebank`
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Recover wrong-cased keywords that start items

(_this pr was inspired by [this tweet](https://twitter.com/Azumanga/status/1552982326409367561)_)

r? `@estebank`

We've talked a bit about this recovery, but I just wanted to make sure that this is the right approach :)

For now I've only added the case insensitive recovery to `use`s, since most other items like `impl` blocks, modules, functions can start with multiple keywords which complicates the matter.
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Fix perf regression by correctly matching keywords

This should (hopefully) fix regression from rust-lang#99918

r? `@estebank`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants