Macro parser performance improvements and refactoring #37701
Conversation
Force-pushed from 11bf624 to 8c10253.
src/libsyntax/ext/tt/macro_parser.rs (Outdated)
I believe this recreates the `Rc` that we just unwrapped. Perhaps this could be avoided with something like:
```rust
while let TokenTree::Token(sp, token::Interpolated(nt)) = tt {
    if let token::NtTT(..) = *nt {
        match Rc::try_unwrap(nt) {
            Ok(token::NtTT(sub_tt)) => tt = sub_tt,
            Ok(_) => unreachable!(),
            Err(nt_rc) => match *nt_rc {
                token::NtTT(ref sub_tt) => tt = sub_tt.clone(),
                _ => unreachable!(),
            },
        }
    } else {
        tt = TokenTree::Token(sp, token::Interpolated(nt.clone()));
        break;
    }
}
```

Force-pushed from a5c2641 to 5fd8f3d.
Also makes `nameize` non-public since it's only locally used.
Change multiple functions to be non-public. Change `nameize` to accept an iterator so as to avoid an allocation.
Force-pushed from 5fd8f3d to 2189f57.
```rust
new_pos.matches[idx]
    .push(Rc::new(MatchedSeq(sub, mk_sp(ei.sp_lo, span.hi))));
}
```
If there's a way to avoid doing this work here, that would likely bring significant wins. Callgrind reports ~1 trillion instructions (if I'm reading it right) spent on the `let sub = ei.matches[idx].clone();` line for the workload I'm profiling.
We might be able to avoid this work by refactoring the `Rc` out of the `Vec` in `NamedMatch::MatchedSeq`, that is, `MatchedSeq(Vec<Rc<NamedMatch>>, syntax_pos::Span)` -> `MatchedSeq(Rc<Vec<NamedMatch>>, syntax_pos::Span)`, perhaps with a `RefCell` between the `Rc` and the `Vec`.
Then `sub` would be an `Rc`, so cloning it would be cheap, and `Rc<NamedMatch>` would never be needed since cloning a `NamedMatch` would be almost as cheap as cloning an `Rc`.
After realizing that I was benchmarking old code (from the `make` build system; I switched to rustbuild mid-PR), the latest perf gains are even higher: from ~890 seconds with
Excellent! r=me unless you'd like to add more commits.
I'll try the optimization suggested by @jseyfried and clean up history a little.
Force-pushed from abc57c4 to 2189f57.
Okay, so the suggested optimization doesn't work without more changes, since then we start sharing the vector instead of duplicating it, and we don't want to share it. It's something to look into for the future, but I'll leave it out of scope for this PR since the changes are unlikely to be trivial.
@bors r+
📌 Commit 2189f57 has been approved by
This PR locally increased performance of #37074 by ~6.6 minutes.
Follow up to #37569, but doesn't focus explicitly on expansion performance.
History is relatively clean, but I can/will do some more polishing if this is deemed mergeable. Partially posting this now so I can get Travis to run tests for me.
r? @jseyfried