
Batch proc_macro RPC for TokenStream iteration and combination operations #98186

Merged: 7 commits into rust-lang:master on Jun 18, 2022

Conversation

@mystor (Contributor) commented Jun 17, 2022

This is the first part of #86822, split off as requested in #86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
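
For intuition, here is a minimal, self-contained sketch of the batching idea (all names are invented for this sketch and are not the bridge's actual API): instead of paying one RPC round trip per token tree while iterating or concatenating, a single call can carry a whole batch at once.

```rust
/// Hypothetical stand-in for one cross-thread RPC round trip.
fn rpc_call(payload: &str) -> usize {
    println!("RPC: {payload}");
    1 // each call counts as one round trip
}

fn main() {
    let trees = vec!["fn", "main", "(", ")", "{", "}"];

    // Unbatched: one RPC call per token tree while iterating.
    let unbatched: usize = trees.iter().copied().map(rpc_call).sum();

    // Batched: a single RPC call moves the whole batch at once.
    let batched = rpc_call(&trees.join(" "));

    println!("unbatched round trips: {unbatched}, batched round trips: {batched}");
}
```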


@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 17, 2022

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 17, 2022
@mystor (Contributor, Author) commented Jun 17, 2022

r? @eddyb

@rust-highfive rust-highfive assigned eddyb and unassigned petrochenkov Jun 17, 2022
@eddyb (Member) commented Jun 17, 2022

@bors try @rust-timer queue

@rust-timer (Collaborator)

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 17, 2022
@bors (Contributor) commented Jun 17, 2022

⌛ Trying commit 8494bf02bcef1af99174294c22e520e82af5a18e with merge a6e406c9dd73d5556dcb04c2d7471ddc2c1faf0e...


… and iterate TokenStreams

This significantly reduces the cost of common interactions with TokenStream
when running with the CrossThread execution strategy, by reducing the number of
RPC calls required.
…tend impls

This is an experimental patch to try to reduce the codegen complexity of
TokenStream's FromIterator and Extend implementations for downstream
crates, by moving the core logic into a helper type. This might help
improve the build performance of crates which depend on proc_macro, since
iterators are used less and the compiler may spend less time attempting
specializations or other iterator optimizations.

The change intentionally sacrifices some optimization opportunities,
such as using the specializations for collecting iterators derived from
Vec::into_iter() into Vec.

This is one of the simpler potential approaches to reducing the amount
of code generated in crates depending on proc_macro, so it seems worth
trying before other more-involved changes.
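
As a rough illustration of that shape (invented names only; this is not the PR's actual helper type), the generic FromIterator/Extend impls can stay as thin loops that forward into a non-generic helper, so downstream crates monomorphize very little iterator machinery:

```rust
/// Hypothetical token-stream type for this sketch.
#[derive(Debug, Default)]
pub struct TokenStream(Vec<String>);

/// Non-generic helper holding the real accumulation logic; it is compiled
/// once in the defining crate rather than in every downstream crate.
#[derive(Default)]
struct ConcatHelper {
    acc: Vec<String>,
}

impl ConcatHelper {
    fn push(&mut self, piece: String) {
        self.acc.push(piece);
    }
    fn finish(self) -> TokenStream {
        TokenStream(self.acc)
    }
}

impl FromIterator<String> for TokenStream {
    fn from_iter<I: IntoIterator<Item = String>>(iter: I) -> Self {
        // The generic part is only this thin loop; no iterator specializations
        // are relied on, which is the trade-off the commit message mentions.
        let mut helper = ConcatHelper::default();
        for piece in iter {
            helper.push(piece);
        }
        helper.finish()
    }
}

impl Extend<String> for TokenStream {
    fn extend<I: IntoIterator<Item = String>>(&mut self, iter: I) {
        for piece in iter {
            self.0.push(piece);
        }
    }
}

fn main() {
    let stream: TokenStream = ["fn", "main"].into_iter().map(str::to_owned).collect();
    println!("{stream:?}");
}
```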
@eddyb (Member) commented Jun 17, 2022

Ugh, did the force push break bors/perf?

@bors try @rust-timer queue

@rust-timer (Collaborator)

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors (Contributor) commented Jun 17, 2022

⌛ Trying commit 4d45af9 with merge dfb02cb3ced64e8fe898f03c414d3654c4b59d88...

run_client(bridge, |input| {
    f(crate::TokenStream(Some(input)))
        .0
        .unwrap_or_else(|| TokenStream::concat_streams(None, vec![]))
@eddyb (Member) commented:

It's not quite clear to me which close paren matches which open paren. Could you please use a temp variable somewhere to break this expression up?
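
For concreteness, one possible shape of the split being asked for, as a sketch of the quoted fragment only (the temporary's name is hypothetical; as noted in the reply below, the PR ultimately moved the empty-stream handling to the server side instead):

```rust
run_client(bridge, |input| {
    // Hypothetical temporary to make the nesting explicit:
    let output = f(crate::TokenStream(Some(input)));
    output.0.unwrap_or_else(|| TokenStream::concat_streams(None, vec![]))
})
```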

@mystor (Contributor, Author) replied:

Pushed a new commit which moves this empty stream handling to the server side of the bridge while keeping the interface to callers the same, which should be more performant and hopefully easier to read.

@bors (Contributor) commented Jun 17, 2022

☀️ Try build successful - checks-actions
Build commit: dfb02cb3ced64e8fe898f03c414d3654c4b59d88

@rust-timer (Collaborator)

Queued dfb02cb3ced64e8fe898f03c414d3654c4b59d88 with parent 0423e06, future comparison URL.

@rust-timer (Collaborator)

Finished benchmarking commit (dfb02cb3ced64e8fe898f03c414d3654c4b59d88): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | N/A    | N/A    | 0      |
| Regressions 😿 (secondary)  | 1.2%   | 1.2%   | 1      |
| Improvements 🎉 (primary)   | -1.0%  | -3.9%  | 26     |
| Improvements 🎉 (secondary) | -12.1% | -37.7% | 15     |
| All 😿🎉 (primary)          | -1.0%  | -3.9%  | 26     |

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | N/A    | N/A    | 0      |
| Regressions 😿 (secondary)  | 3.3%   | 3.3%   | 1      |
| Improvements 🎉 (primary)   | -2.1%  | -2.1%  | 1      |
| Improvements 🎉 (secondary) | -2.7%  | -2.9%  | 2      |
| All 😿🎉 (primary)          | -2.1%  | -2.1%  | 1      |

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: 🎉 relevant improvements found
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | 4.0%   | 5.1%   | 2      |
| Regressions 😿 (secondary)  | N/A    | N/A    | 0      |
| Improvements 🎉 (primary)   | -4.5%  | -4.5%  | 1      |
| Improvements 🎉 (secondary) | -21.0% | -43.6% | 10     |
| All 😿🎉 (primary)          | 1.2%   | 5.1%   | 3      |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 17, 2022
@lqd (Member) commented Jun 17, 2022

btw the lone regression mentioned by perfbot is noise: the coercions debug benchmark seems to be jumpy recently.

@eddyb (Member) left a review:

LGTM, most of the comments are nits, pretty much r=me when you run out of actionable/agreeable ones.

library/proc_macro/src/bridge/mod.rs: two outdated review threads (resolved)
Comment on lines 336 to 349
impl<T: Mark> Mark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn mark(unmarked: Self::Unmarked) -> Self {
        // Should be a no-op due to std's in-place collect optimizations.
        unmarked.into_iter().map(T::mark).collect()
    }
}
impl<T: Unmark> Unmark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn unmark(self) -> Self::Unmarked {
        // Should be a no-op due to std's in-place collect optimizations.
        self.into_iter().map(T::unmark).collect()
    }
}
@eddyb (Member) commented:

This marking stuff is sad, I keep forgetting to look into whether the RPC traits can get another parameter (defaulting to Self) to allow dispatching on more than one type (assuming that's the problem in the first place - maybe long term such reshuffle could also allow passing sequences without allocating a Vec etc.).

library/proc_macro/src/bridge/mod.rs: two resolved review threads
Comment on lines +239 to +246
trees: Vec<
    bridge::TokenTree<
        bridge::client::Group,
        bridge::client::Punct,
        bridge::client::Ident,
        bridge::client::Literal,
    >,
>,
@eddyb (Member) commented:

Random note: we should do some perf experiments using SmallVec (or a hacked up copy of it) with various sizes - since this is only ever on the stack in a "leaf" (client) function, that then passes that data to RPC, we might be able to set some large inline capacity like 32, and have it be an overall win.

@mystor (Contributor, Author) replied:

Quite possibly. It would be really nice if we could easily depend on crates in proc_macro so we don't need to make a small copy of SmallVec to use it, but that's a problem for another PR.
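
For reference, a minimal sketch of the experiment suggested above, assuming the external smallvec crate were available (it is not a dependency of proc_macro, which is exactly the obstacle mentioned in the reply); the point is that a large inline capacity keeps small batches entirely on the stack of the leaf client function:

```rust
// Cargo.toml for this sketch only: smallvec = "1"
use smallvec::SmallVec;

/// Hypothetical stand-in for the client-side token-tree payload.
#[derive(Debug)]
struct TokenTreeStub(u32);

fn main() {
    // Inline capacity of 32, as floated in the comment above: small streams
    // never touch the heap before being handed to the RPC layer.
    let mut trees: SmallVec<[TokenTreeStub; 32]> = SmallVec::new();
    for i in 0..8 {
        trees.push(TokenTreeStub(i));
    }
    assert!(!trees.spilled()); // still entirely stack-allocated
    println!("batched {} trees with no heap allocation: {:?}", trees.len(), trees);
}
```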

Comment on lines 444 to +445
     run_client(bridge, |(input, input2)| {
-        f(crate::TokenStream(input), crate::TokenStream(input2)).0
+        f(crate::TokenStream(Some(input)), crate::TokenStream(Some(input2))).0
@eddyb (Member) commented:

Would it make sense to pass Option on the RPC to the client, as well, not just back to the server?

(Also I'm imagining we might be able to get rid of the split between single-input and dual-input entry-points, and assert!(input2.is_none()) somewhere seems like it would be easier than dealing with handles.)

@mystor (Contributor, Author) replied:

We could, I suppose, but it's mildly inconvenient and introduces a bunch of extra code without much benefit, so I'm inclined not to for now.

compiler/rustc_expand/src/proc_macro_server.rs: outdated review thread (resolved)
// FIXME: This is a raw port of the previous approach, and can probably
// be optimized.
let mut cursor = stream.into_trees();
let mut stack = Vec::new();
@eddyb (Member) commented:

I was going to say that this can be SmallVec with an inline capacity of 2 or so, but it looks like what's actually happening is TokenTree::from_internal could actually be returning ArrayVec<_, 3> (I suppose at that point it's not even a FromInternal impl heh).

That is, despite being ominously named "stack" (as if it's some kind of vertical tree traversal), it's a weird FILO queue of pending additional TokenTrees (which your implementation could consume right away with e.g. a nested for loop).

(you can add a comment noting this, or just ignore it as a note to self)
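
A minimal sketch of the nested-loop shape described above, using invented stand-in types (this is not the PR's code): each rustc-side tree can expand into a small batch of client-side trees, and draining that batch immediately avoids the pending "stack" entirely.

```rust
/// Hypothetical rustc-side token tree.
struct RustcTree(&'static str);

/// Hypothetical proc_macro-side token tree.
#[derive(Debug)]
struct ClientTree(String);

/// Stand-in for `TokenTree::from_internal`: one rustc tree may expand into a
/// few client trees (e.g. a lifetime splitting into a `'` punct plus an
/// ident), so it returns a small batch instead of pushing extras onto a
/// shared stack.
fn from_internal(tree: RustcTree) -> Vec<ClientTree> {
    match tree.0 {
        "'a" => vec![ClientTree("'".into()), ClientTree("a".into())],
        other => vec![ClientTree(other.into())],
    }
}

fn main() {
    let cursor = vec![RustcTree("fn"), RustcTree("'a"), RustcTree("ident")];
    let mut out = Vec::new();
    // Nested loop: each batch is consumed right away, so no FILO queue of
    // pending trees is needed.
    for tree in cursor {
        for client_tree in from_internal(tree) {
            out.push(client_tree);
        }
    }
    println!("{out:?}");
}
```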

continue;
let next = stack.pop().or_else(|| {
let next = cursor.next_with_spacing()?;
Some(TokenTree::from_internal((next, &mut stack, self)))
@eddyb (Member) commented:

Random note: these from/to "internal" things should probably be s/internal/rustc, for clarity.

@eddyb (Member) commented Jun 18, 2022

@bors r+

@bors (Contributor) commented Jun 18, 2022

📌 Commit df925fd has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 18, 2022
@bors (Contributor) commented Jun 18, 2022

⌛ Testing commit df925fd with merge 0182fd9...

@bors (Contributor) commented Jun 18, 2022

☀️ Test successful - checks-actions
Approved by: eddyb
Pushing 0182fd9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 18, 2022
@bors bors merged commit 0182fd9 into rust-lang:master Jun 18, 2022
@rustbot rustbot added this to the 1.63.0 milestone Jun 18, 2022
@rust-timer (Collaborator)

Finished benchmarking commit (0182fd9): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | N/A    | N/A    | 0      |
| Regressions 😿 (secondary)  | N/A    | N/A    | 0      |
| Improvements 🎉 (primary)   | -1.0%  | -4.0%  | 26     |
| Improvements 🎉 (secondary) | -12.1% | -37.5% | 15     |
| All 😿🎉 (primary)          | -1.0%  | -4.0%  | 26     |

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvement found
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | N/A    | N/A    | 0      |
| Regressions 😿 (secondary)  | N/A    | N/A    | 0      |
| Improvements 🎉 (primary)   | N/A    | N/A    | 0      |
| Improvements 🎉 (secondary) | -2.5%  | -2.5%  | 1      |
| All 😿🎉 (primary)          | N/A    | N/A    | 0      |

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results
|                             | mean¹  | max    | count² |
|-----------------------------|--------|--------|--------|
| Regressions 😿 (primary)    | N/A    | N/A    | 0      |
| Regressions 😿 (secondary)  | 4.9%   | 4.9%   | 1      |
| Improvements 🎉 (primary)   | -3.4%  | -3.4%  | 1      |
| Improvements 🎉 (secondary) | -19.4% | -42.3% | 11     |
| All 😿🎉 (primary)          | -3.4%  | -3.4%  | 1      |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

mystor added a commit to mystor/rust that referenced this pull request Jul 29, 2022
…idge

This is done by having the crossbeam dependency inserted into the
proc_macro server code from the server side, to avoid adding a
dependency to proc_macro.

In addition, this introduces a -Z command-line option which will switch
rustc to run proc-macros using this cross-thread executor. With the
changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the
performance of the executor should be much closer to same-thread
execution.

In local testing, the crossbeam executor was substantially more
performant than either of the two existing CrossThread strategies, so
they have been removed to keep things simple.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 30, 2022
proc_macro: use crossbeam channels for the proc_macro cross-thread bridge

This is done by having the crossbeam dependency inserted into the `proc_macro` server code from the server side, to avoid adding a dependency to `proc_macro`.

In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution.

In local testing, the crossbeam executor was substantially more performant than either of the two existing `CrossThread` strategies, so they have been removed to keep things simple.

r? `@eddyb`
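
As background, here is a minimal, self-contained sketch of the cross-thread pattern that follow-up work describes, using crossbeam channels (the message types and names below are invented for this sketch, not the bridge's actual protocol): the compiler side sends one batched request over a channel and a dedicated proc-macro thread answers over another.

```rust
// Cargo.toml for this sketch only: crossbeam-channel = "0.5"
use crossbeam_channel::{bounded, Receiver, Sender};
use std::thread;

/// Hypothetical request/response messages standing in for bridge RPC.
enum Request {
    Concat(Vec<String>),
    Shutdown,
}
enum Response {
    Stream(String),
}

fn main() {
    let (req_tx, req_rx): (Sender<Request>, Receiver<Request>) = bounded(1);
    let (res_tx, res_rx): (Sender<Response>, Receiver<Response>) = bounded(1);

    // The proc-macro side runs on its own thread and serves requests.
    let worker = thread::spawn(move || {
        while let Ok(req) = req_rx.recv() {
            match req {
                Request::Concat(parts) => {
                    let _ = res_tx.send(Response::Stream(parts.concat()));
                }
                Request::Shutdown => break,
            }
        }
    });

    // The compiler side issues a single batched request instead of one
    // message per token tree, which is the point of the batching work.
    req_tx
        .send(Request::Concat(vec!["fn ".into(), "main".into(), "()".into()]))
        .unwrap();
    let Response::Stream(combined) = res_rx.recv().unwrap();
    println!("combined stream: {combined}");

    req_tx.send(Request::Shutdown).unwrap();
    worker.join().unwrap();
}
```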
Labels
  • merged-by-bors: This PR was explicitly merged by bors.
  • S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
10 participants