proc_macro: use crossbeam channels for the proc_macro cross-thread bridge #99123
I don't think it's worth doing a perf run with the cross-thread executor enabled until at least #98189 has landed, and this patch shouldn't impact perf at all for the existing same-thread executor. We can do a perf comparison with a PR changing the default after these patches have both landed to see how much more work is left before a cross-thread executor is a viable option.
LGTM, and being able to test cross-thread proc macro isolation would be great, but I'm not sure if the new -Z flag requires an MCP (cc @rust-lang/compiler).
MCP has completed (see also the Zulip thread, which was mostly discussing the background of proc macro isolation):
r=me after rebase
Some changes occurred in library/proc_macro/src/bridge cc @rust-lang/wg-rls-2
…idge This is done by having the crossbeam dependency inserted into the proc_macro server code from the server side, to avoid adding a dependency to proc_macro. In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution. In local testing, the crossbeam executor was substantially more performant than either of the two existing CrossThread strategies, so they have been removed to keep things simple.
@bors r+ rollup=never
☀️ Test successful - checks-actions
Finished benchmarking commit (bd84c73): comparison url.
[Result tables for instruction count, max RSS (memory usage), and cycles]
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Having said that, clicking on "show non-relevant results" shows some possible improvements. E.g.
cc @wesleywiser Just noticed something here - so Regarding the apparent regression: if this is LLVM doing something weird, we could mark the cross-thread executor method as
I don't know if it's helpful, @eddyb, but I've recreated the regression locally, and I think the bulk of the increase in instruction count is coming from
In any case, I'm going to take a shot now at marking the cross-thread executor method as
Unfortunately I took my shot (see below) at marking the cross-thread executor as
diff --git a/library/proc_macro/src/bridge/server.rs b/library/proc_macro/src/bridge/server.rs
index e068ec60b6a..a29fae17292 100644
--- a/library/proc_macro/src/bridge/server.rs
+++ b/library/proc_macro/src/bridge/server.rs
@@ -213,6 +213,8 @@ impl<P> ExecutionStrategy for CrossThread<P>
 where
     P: MessagePipe<Buffer> + Send + 'static,
 {
+    #[cold]
+    #[inline(never)]
     fn run_bridge_and_client(
         &self,
         dispatcher: &mut impl DispatcherTrait,
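As a rough illustration of what the diff above is trying to achieve, here is a minimal standalone sketch of the `#[cold]` plus `#[inline(never)]` pattern: the rarely used path is kept out of line so it cannot perturb inlining decisions on the common path. The function names `run_same_thread`, `run_cross_thread`, and `run` are hypothetical and are not the bridge's actual methods; whether the attributes change codegen in any particular case is up to LLVM.

```rust
use std::thread;

/// Common path (hypothetical): does the work directly on the calling thread.
#[inline]
fn run_same_thread(input: u64) -> u64 {
    input.wrapping_add(1)
}

/// Rare path (hypothetical): hands the work to a freshly spawned thread.
/// `#[cold]` hints that calls are unlikely; `#[inline(never)]` keeps the body
/// out of the caller so it does not bloat the hot code.
#[cold]
#[inline(never)]
fn run_cross_thread(input: u64) -> u64 {
    thread::spawn(move || input.wrapping_add(1))
        .join()
        .expect("worker thread panicked")
}

/// Dispatch between the two paths based on a runtime choice.
fn run(input: u64, cross_thread: bool) -> u64 {
    if cross_thread {
        run_cross_thread(input)
    } else {
        run_same_thread(input)
    }
}

fn main() {
    println!("same-thread:  {}", run(41, false));
    println!("cross-thread: {}", run(41, true));
}
```

Both attributes are hints about code placement and inlining only; they do not change what the function computes.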
proc_macro_execution_strategy: ProcMacroExecutionStrategy = (ProcMacroExecutionStrategy::SameThread,
    parse_proc_macro_execution_strategy, [UNTRACKED],
    "how to run proc-macro code (default: same-thread)"),
@pnkfelix I'm running out of ideas... what if you move this to the end? (as in, maybe the order of the fields or their offsets, in this huge struct, is causing weird things)
It could even be its presence - the flag could be removed and exec_strategy (the sole user of the flag, in compiler/rustc_expand/src/proc_macro.rs) made to, idk, use std::env::var instead? (or just hardcode the choice, but that might cause changes in optimizations)
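For readers unfamiliar with the suggestion, here is a minimal sketch of choosing the strategy from a string value, either in the style of an option parser or from an environment variable via `std::env::var`. Everything here is an assumption for illustration: the enum is redefined locally, the value spellings `same-thread` and `cross-thread` are inferred from the help text above, and the variable name `PROC_MACRO_EXEC_STRATEGY` is hypothetical.

```rust
use std::env;

/// Local stand-in for the strategy enum named in the option declaration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ProcMacroExecutionStrategy {
    SameThread,
    CrossThread,
}

/// Option-parser style helper: writes into `slot` and reports success.
/// The accepted spellings are assumptions, not confirmed by this thread.
fn parse_strategy(slot: &mut ProcMacroExecutionStrategy, v: Option<&str>) -> bool {
    *slot = match v {
        Some("same-thread") => ProcMacroExecutionStrategy::SameThread,
        Some("cross-thread") => ProcMacroExecutionStrategy::CrossThread,
        _ => return false,
    };
    true
}

/// Env-var style helper: falls back to same-thread when the value is
/// missing or unrecognised.
fn strategy_from_value(value: Option<&str>) -> ProcMacroExecutionStrategy {
    let mut strategy = ProcMacroExecutionStrategy::SameThread;
    parse_strategy(&mut strategy, value);
    strategy
}

fn main() {
    // Hypothetical variable name, used only for this sketch.
    let raw = env::var("PROC_MACRO_EXEC_STRATEGY").ok();
    println!("{:?}", strategy_from_value(raw.as_deref()));
}
```

Keeping the string-to-enum step separate from the environment lookup makes the choice easy to test without touching process state.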
Wacky! I moved the field to the end of the struct, and that made the two methods with the two highest instruction counts (parse_tt and MacroRulesMacroExpander::expand) each have lower instruction-counts (Ir) according to cachegrind, but the overall instruction-count increased:
Instruction Counts | 8f68c43c | bd84c73f | field-at-end |
---|---|---|---|
TOTAL | 988,908,196 (100.0%) | 1,002,719,736 (100.0%) | 1,138,908,984 (100.0%) |
parse_tt | 134,903,906 (13.64%) | 149,152,914 (14.87%) | 99,794,157 (8.76%) |
MacroRulesMacroExpander::expand | 56,387,220 (5.70%) | 56,379,944 (5.62%) | 50,969,953 (4.48%) |
SipHasher128::finish128 | 5,975,227 (0.60%) | 5,975,227 (0.60%) | 36,508,882 (3.21%) |
token_name_eq | (not in report) | (not in report) | 29,133,905 (2.56%) |
try_mark_previous_green | 53,862,272 (5.45%) | 53,872,906 (5.37%) | 25,344,946 (2.23%) |
I don't really want to spend too much more time trying to dissect this thing that is probably just an LLVM codegen oddity, but I will at least try eliminating the field entirely as you suggest.
(As an aside, the fact that this kind of field shuffling had such an enormous impact on parse_tt does make me wonder if there's some kind of low-hanging fruit embedded in the code there, e.g. maybe the code is accessing the session when it should be locally stashing an unchanging value? But isn't that the kind of thing you'd like to believe your compiler is doing for you automatically? 😆 )
I suspect that the code is hot, and also that register spilling is happening in a different way or inlining choices are a little different. I.e. the kind of thing that's hard to control, and even if you manage to get it in the faster state now a tiny change later on might make it go back to the slower state.
This is done by having the crossbeam dependency inserted into the proc_macro server code from the server side, to avoid adding a dependency to proc_macro.
In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in #98186, #98187, #98188 and #98189, the performance of the executor should be much closer to same-thread execution.
In local testing, the crossbeam executor was substantially more performant than either of the two existing CrossThread strategies, so they have been removed to keep things simple.
r? @eddyb
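The general shape of a channel-based cross-thread executor can be pictured with a small sketch. This is not the actual bridge code: it uses `std::sync::mpsc` in place of crossbeam so that it builds without external crates, and the `Pipe` type, the `run_cross_thread` function, and the raw byte-buffer messages are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a message pipe: one endpoint of a
/// bidirectional channel carrying owned byte buffers.
struct Pipe {
    tx: mpsc::Sender<Vec<u8>>,
    rx: mpsc::Receiver<Vec<u8>>,
}

impl Pipe {
    /// Build two connected endpoints.
    fn pair() -> (Pipe, Pipe) {
        let (tx_a, rx_a) = mpsc::channel();
        let (tx_b, rx_b) = mpsc::channel();
        (Pipe { tx: tx_a, rx: rx_b }, Pipe { tx: tx_b, rx: rx_a })
    }

    /// Send a buffer; a hung-up peer is silently ignored.
    fn send(&self, msg: Vec<u8>) {
        let _ = self.tx.send(msg);
    }

    /// Receive a buffer, or `None` once the peer has been dropped.
    fn recv(&self) -> Option<Vec<u8>> {
        self.rx.recv().ok()
    }
}

/// Run `client` on its own thread and answer each of its requests with
/// `dispatch` on the calling thread until the client finishes.
fn run_cross_thread<D, C>(mut dispatch: D, client: C) -> Vec<u8>
where
    D: FnMut(Vec<u8>) -> Vec<u8>,
    C: FnOnce(&dyn Fn(Vec<u8>) -> Vec<u8>) -> Vec<u8> + Send + 'static,
{
    let (server_end, client_end) = Pipe::pair();
    let handle = thread::spawn(move || {
        // Each call sends one request and blocks until the reply arrives.
        let call = |req: Vec<u8>| {
            client_end.send(req);
            client_end.recv().expect("server hung up")
        };
        client(&call)
        // `client_end` is dropped when this closure returns, which ends
        // the server loop below.
    });
    while let Some(req) = server_end.recv() {
        server_end.send(dispatch(req));
    }
    handle.join().expect("client thread panicked")
}

fn main() {
    // The "server" doubles every byte; the "client" issues two requests.
    let out = run_cross_thread(
        |req| req.iter().map(|b| b.wrapping_mul(2)).collect(),
        |call| {
            let mut out = call(vec![1, 2, 3]);
            out.extend(call(vec![10]));
            out
        },
    );
    println!("{:?}", out);
}
```

The server loop stays on the original thread, so any state the dispatcher borrows never has to cross a thread boundary; only the owned buffers do.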