Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce RPC overhead for common proc_macro operations #86822

Closed
wants to merge 11 commits into from

Conversation

mystor
Copy link
Contributor

@mystor mystor commented Jul 2, 2021

This is a reduced version of #86816 without changes which are only relevant to the CrossThread execution strategy. With these changes there are similar performance improvements to the SameThread execution strategy as in the previous bug, however the optimizations to CrossThread are less significant. If they're desired, the CrossThread changes could be landed separately.

Given the same test situation as the #86816, this patch stack gets the following numbers:

PR (SameThread)
~/src/serde/test_suite$ cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 0.76s

real	0m0.776s
user	0m2.286s
sys	0m0.433s
PR (CrossThread2)
~/src/serde/test_suite$ cargo +dev check --tests && rm -f ../target/debug/deps/libtest_*.rmeta && time cargo +dev check --tests
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
    Checking serde_test_suite v0.0.0 (/home/nika/src/serde/test_suite)
    Finished dev [unoptimized + debuginfo] target(s) in 1.19s

real	0m1.199s
user	0m2.841s
sys	0m1.116s

As with the previous patch, each commit should be individually reviewable.

Blocked on #90876

@rust-highfive
Copy link
Collaborator

r? @petrochenkov

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 2, 2021
@rust-log-analyzer

This comment has been minimized.

@bors

This comment has been minimized.

@mystor mystor force-pushed the min_proc_macro branch 2 times, most recently from b212de2 to 3afaa9f Compare July 3, 2021 03:18
@@ -366,14 +451,16 @@ fn maybe_install_panic_hook(force_show_panics: bool) {
});
}

static SYMBOL_COUNTER: AtomicUsize = AtomicUsize::new(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be problematic when using rust-analyzer on 32bit systems as now only 2^32 different idents are possible for the entire time the editor is open. This could be hit when a proc macro randomizes the identifier names. (proc-macros should be deterministic, but are not required to) Rust-analyzer loads the proc macro once at startup and never unloads it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handles are actually truncated to 32 bits before being used in OwnedStore, so this will actually impact both 32-bit and 64-bit hosts similarly:

let handle = Handle::new(counter as u32).expect("`proc_macro` handle counter overflowed");
assert!(self.data.insert(handle, x).is_none());

This was actually already the case before the changes in this patch stack, and this change isn't making anything worse. Previously the code atomic was defined in handle definition macro instead of being written out separately:

$($ity: AtomicUsize::new(1),)*

Whenever a new Ident was created it would be intered in this store, and the counter stored in the client dylib's globals would be atomically incremented by {Interned,Owning}Store. To my knowledge, @eddyb did this to make sure that if a proc macro stashed an Ident (or Literal, Span or other handle) in global state it would not be possible to observe identifier re-use on subsequent invocations. Because of that, randomized idents won't actually be necessary to exhaust this number, as *Store is destroyed after each macro invocation, but the counters are global (within each loaded dylib) so a set of completely fresh integers will be used for the next invocation's store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it thread-local if we're only worried about illegal stashing of handles in proc macro TLS (and not unsafe code being used to move them between threads), but then it will still run out unless RA keeps spawning threads.

@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jul 4, 2021

(I'll get to this next weekend around July 18, most likely.)

@petrochenkov
Copy link
Contributor

I did some preliminary review, it looks like the general idea is moving majority of token's internal data (as well as some immutable global data) to the library/proc_macro side of the bridge, so that more operations can be performed without crossing it.

The downside is duplication of data and also duplication of some logic as well between the library/proc_macro and compiler/rustc_expand. I'm personally entirely fine with duplicating data, but would prefer to avoid duplicating logic if possible.

I'm going to r? @eddyb for design review since the proc macro bridge is entirely an eddyb's creation, but feel free to return this to me later for a more detailed code review.

@bors
Copy link
Contributor

bors commented Jul 30, 2021

☔ The latest upstream changes (presumably #87445) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Copy link
Member

Ping from triage: can you please address the merge conflicts? Thank you.
@rustbot label +S-waiting-on-author -S-waiting-on-review

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2021
@JohnCSimon
Copy link
Member

Ping from triage:
@mystor
can you please address the merge conflicts? Thank you.

@mystor
Copy link
Contributor Author

mystor commented Sep 6, 2021

can you please address the merge conflicts? Thank you.

I don't believe that this patch is going to land any time soon as @eddyb hasn't had the time to look at it. I do plan to rebase the changes once #87264 has landed, however.

@JohnCSimon JohnCSimon added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Sep 27, 2021
@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 14, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 31, 2021
@mystor
Copy link
Contributor Author

mystor commented Nov 13, 2021

update: #87264 has landed now, but I'm going to wait until we figure out if we're going to land the performance regression fixes in #90876 to avoid rebasing this PR twice.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 12, 2021
@JohnCSimon JohnCSimon removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 30, 2022
This requires a dependency on `unicode-normalization` and `rustc_lexer`, which
is currently not possible for `proc_macro`. Instead, a second `extern "C" fn`
is provided by the compiler server to perform these steps from any thread.

String values are interned in both the server and client, meaning that
identifiers can be stringified without any RPC roundtrips without substantially
inflating their size.

RPC messages passing symbols include the full un-interned value, and are
re-interned on the receiving side. This could potentially be optimized in the
future.

The symbol infrastructure will alwo be used for literals in a following part.
This builds on the symbol infrastructure built for ident to replicate the
`LitKind` and `Lit` structures in rustc within the `proc_macro` client,
allowing literals to be fully created and interacted with from the client
thread. Only parsing and subspan operations still require sync RPC.
…tend impls

This is an experimental patch to try to reduce the codegen complexity of
TokenStream's FromIterator and Extend implementations for downstream
crates, by moving the core logic into a helper type. This might help
improve build performance of crates which depend on proc_macro as
iterators are used less, and the compiler may take less time to do
things like attempt specializations or other iterator optimizations.

The change intentionally sacrifices some optimization opportunities,
such as using the specializations for collecting iterators derived from
Vec::into_iter() into Vec.

This is one of the simpler potential approaches to reducing the amount
of code generated in crates depending on proc_macro, so it seems worth
trying before other more-involved changes.
Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to make time for this, the large diff worried me there were more fundamental changes, but it was straightforward to at least skim.

I think we can land some parts of this right away, assuming they're a net perf win - my thinking is that this is roughly 3 changes:

  1. TokenStream serialized as Vec<TokenTree>: first 3 commits plus the 5th,
    "proc_macro: reduce the number of messages required to create, extend and iterate TokenStreams"
    • maybe also the last commit (perf tradeoff for the above, IIUC),
      "Try to reduce codegen complexity of TokenStream's FromIterator and Extend impls"
  2. Bridge vs BridgeConfig vs ExpnConfig: 4th commit,
    "proc_macro: cache static spans in client's thread-local state"
  3. the rest of the commits, transitioning to a more serialize-heavy setup,
    "proc_macro: stop using a remote object handle for ..."

(while I see some connections between those parts, I hope to finish reviewing them more in isolation)

@mystor
Copy link
Contributor Author

mystor commented Jun 17, 2022

I've filed #98186, #98187, #98188 and #98189 as parts of the work here. Each depends on the previous patch, so will need to be landed in order. I ended up splitting Punct and Group out from Ident and Literal, as the latter two are significantly more involved changes due to needing the Symbol type and more complex validation.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 18, 2022
Batch proc_macro RPC for TokenStream iteration and combination operations

This is the first part of rust-lang#86822, split off as requested in rust-lang#86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
@bors
Copy link
Contributor

bors commented Jun 18, 2022

☔ The latest upstream changes (presumably #98186) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 26, 2022
proc_macro/bridge: cache static spans in proc_macro's client thread-local state

This is the second part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch removes the RPC calls required for the very common operations of `Span::call_site()`, `Span::def_site()` and `Span::mixed_site()`.

Some notes:

This part is one of the ones I don't love as a final solution from a design standpoint, because I don't like how the spans are serialized immediately at macro invocation. I think a more elegant solution might've been to reserve special IDs for `call_site`, `def_site`, and `mixed_site` at compile time (either starting at 1 or from `u32::MAX`) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization.

This would also have an advantage for potential future work to allow `proc_macro` to operate more independently from the compiler (e.g. to reduce the necessity of `proc-macro2`), as methods like `Span::call_site()` could be made to function without access to the compiler backend.

That was unfortunately tricky to do at the time, as this was the first part I wrote of the patches. After the later part (rust-lang#98188, rust-lang#98189), the other uses of `InternedStore` are removed meaning that a custom serialization strategy for `Span` is easier to implement.

If we want to go that path, we'll still need the majority of the work to split the bridge object and introduce the `Context` trait for free methods, and it will be easier to do after `Span` is the only user of `InternedStore` (after rust-lang#98189).
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 28, 2022
proc_macro/bridge: stop using a remote object handle for proc_macro Punct and Group

This is the third part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch transforms the `Punct` and `Group` types into structs serialized over IPC rather than handles, making them more efficient to create and manipulate from within proc-macros.
@mystor
Copy link
Contributor Author

mystor commented Jul 1, 2022

Closing as #98186, #98187, and #98188 have landed, leaving only #98189.

@mystor mystor closed this Jul 1, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 19, 2022
proc_macro/bridge: stop using a remote object handle for proc_macro Ident and Literal

This is the fourth part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch transforms the `Ident` and `Group` types into structs serialized over IPC rather than handles.

Symbol values are interned on both the client and server when deserializing, to avoid unnecessary string copies and keep the size of `TokenTree` down. To do the interning efficiently on the client, the proc-macro crate is given a vendored version of the fxhash hasher, as `SipHash` appeared to cause performance issues. This was done rather than depending on `rustc_hash` as it is unfortunately difficult to depend on crates from within `proc_macro` due to it being built at the same time as `std`.

In addition, a custom arena allocator and symbol store was also added, inspired by those in `rustc_arena` and `rustc_span`. To prevent symbol re-use across multiple invocations of a macro on the same thread, a new range of `Symbol` names are used for each invocation of the macro, and symbols from previous invocations are cleaned-up.

In order to keep `Ident` creation efficient, a special ASCII-only case was added to perform ident validation without using RPC for simple identifiers. Full identifier validation couldn't be easily added, as it would require depending on the `rustc_lexer` and `unicode-normalization` crates from within `proc_macro`. Unicode identifiers are validated and normalized using RPC.

See the individual commit messages for more details on trade-offs and design decisions behind these patches.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Batch proc_macro RPC for TokenStream iteration and combination operations

This is the first part of #86822, split off as requested in rust-lang/rust#86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
proc_macro/bridge: cache static spans in proc_macro's client thread-local state

This is the second part of rust-lang/rust#86822, split off as requested in rust-lang/rust#86822 (review). This patch removes the RPC calls required for the very common operations of `Span::call_site()`, `Span::def_site()` and `Span::mixed_site()`.

Some notes:

This part is one of the ones I don't love as a final solution from a design standpoint, because I don't like how the spans are serialized immediately at macro invocation. I think a more elegant solution might've been to reserve special IDs for `call_site`, `def_site`, and `mixed_site` at compile time (either starting at 1 or from `u32::MAX`) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization.

This would also have an advantage for potential future work to allow `proc_macro` to operate more independently from the compiler (e.g. to reduce the necessity of `proc-macro2`), as methods like `Span::call_site()` could be made to function without access to the compiler backend.

That was unfortunately tricky to do at the time, as this was the first part I wrote of the patches. After the later part (#98188, #98189), the other uses of `InternedStore` are removed meaning that a custom serialization strategy for `Span` is easier to implement.

If we want to go that path, we'll still need the majority of the work to split the bridge object and introduce the `Context` trait for free methods, and it will be easier to do after `Span` is the only user of `InternedStore` (after #98189).
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
proc_macro/bridge: stop using a remote object handle for proc_macro Punct and Group

This is the third part of rust-lang/rust#86822, split off as requested in rust-lang/rust#86822 (review). This patch transforms the `Punct` and `Group` types into structs serialized over IPC rather than handles, making them more efficient to create and manipulate from within proc-macros.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
proc_macro/bridge: stop using a remote object handle for proc_macro Ident and Literal

This is the fourth part of rust-lang/rust#86822, split off as requested in rust-lang/rust#86822 (review). This patch transforms the `Ident` and `Group` types into structs serialized over IPC rather than handles.

Symbol values are interned on both the client and server when deserializing, to avoid unnecessary string copies and keep the size of `TokenTree` down. To do the interning efficiently on the client, the proc-macro crate is given a vendored version of the fxhash hasher, as `SipHash` appeared to cause performance issues. This was done rather than depending on `rustc_hash` as it is unfortunately difficult to depend on crates from within `proc_macro` due to it being built at the same time as `std`.

In addition, a custom arena allocator and symbol store was also added, inspired by those in `rustc_arena` and `rustc_span`. To prevent symbol re-use across multiple invocations of a macro on the same thread, a new range of `Symbol` names are used for each invocation of the macro, and symbols from previous invocations are cleaned-up.

In order to keep `Ident` creation efficient, a special ASCII-only case was added to perform ident validation without using RPC for simple identifiers. Full identifier validation couldn't be easily added, as it would require depending on the `rustc_lexer` and `unicode-normalization` crates from within `proc_macro`. Unicode identifiers are validated and normalized using RPC.

See the individual commit messages for more details on trade-offs and design decisions behind these patches.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.