reduce RPC overhead for common proc_macro operations #86822
(rust-highfive has picked a reviewer for you, use r? to override) |
@@ -366,14 +451,16 @@ fn maybe_install_panic_hook(force_show_panics: bool) {
    });
}

static SYMBOL_COUNTER: AtomicUsize = AtomicUsize::new(1);
This may be problematic when using rust-analyzer on 32-bit systems, as now only 2^32 different idents are possible for the entire time the editor is open. This limit could be hit when a proc macro randomizes its identifier names (proc macros should be deterministic, but are not required to be). Rust-analyzer loads the proc macro once at startup and never unloads it.
Handles are actually truncated to 32 bits before being used in `OwnedStore`, so this will actually impact both 32-bit and 64-bit hosts similarly:
rust/library/proc_macro/src/bridge/handle.rs
Lines 29 to 30 in a8b8558
    let handle = Handle::new(counter as u32).expect("`proc_macro` handle counter overflowed");
    assert!(self.data.insert(handle, x).is_none());
This was actually already the case before the changes in this patch stack, and this change isn't making anything worse. Previously the atomic was defined in the handle definition macro instead of being written out separately:
    $($ity: AtomicUsize::new(1),)*
Whenever a new `Ident` was created it would be interned in this store, and the counter stored in the client dylib's globals would be atomically incremented by `{Interned,Owning}Store`. To my knowledge, @eddyb did this to make sure that if a proc macro stashed an `Ident` (or `Literal`, `Span` or other handle) in global state it would not be possible to observe identifier re-use on subsequent invocations. Because of that, randomized idents won't actually be necessary to exhaust this number, as `*Store` is destroyed after each macro invocation, but the counters are global (within each loaded dylib) so a set of completely fresh integers will be used for the next invocation's store.
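To make the counter behaviour described above concrete, here is a minimal sketch. It is not the bridge code: `Store`, `invoke` and the `String` payload are hypothetical stand-ins, and only the "global atomic counter, truncated to `u32`, store dropped per invocation" shape is taken from the comment.

```rust
use std::collections::BTreeMap;
use std::num::NonZeroU32;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the per-dylib global counter. It starts at 1 so that 0 is
// never handed out as a handle.
static COUNTER: AtomicUsize = AtomicUsize::new(1);

/// Simplified model of a handle store (hypothetical, not the real `OwnedStore`).
pub struct Store<T> {
    data: BTreeMap<NonZeroU32, T>,
}

impl<T> Store<T> {
    pub fn new() -> Self {
        Store { data: BTreeMap::new() }
    }

    /// Allocates a fresh handle from the global counter.
    pub fn alloc(&mut self, x: T) -> NonZeroU32 {
        let counter = COUNTER.fetch_add(1, Ordering::SeqCst);
        // The counter is truncated to 32 bits, so 32-bit and 64-bit hosts
        // run out of handles at the same point.
        let handle = NonZeroU32::new(counter as u32).expect("handle counter overflowed");
        assert!(self.data.insert(handle, x).is_none());
        handle
    }

    pub fn get(&self, handle: NonZeroU32) -> Option<&T> {
        self.data.get(&handle)
    }
}

/// Simulates one macro invocation: a fresh store is created, `n` idents are
/// interned, and the store is dropped when the invocation ends.
pub fn invoke(n: usize) -> Vec<NonZeroU32> {
    let mut store = Store::new();
    (0..n).map(|i| store.alloc(format!("ident{}", i))).collect()
}
```

Two calls to `invoke(3)` return disjoint sets of handles even though each call uses its own store, which is why a long-lived host can exhaust the counter without any randomized identifiers.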
We could make it thread-local if we're only worried about illegal stashing of handles in proc macro TLS (and not `unsafe` code being used to move them between threads), but then it will still run out unless RA keeps spawning threads.
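A rough sketch of the thread-local variant, under the assumption that a plain `Cell<u32>` per thread is enough. `next_handle` and `first_handle_on_new_thread` are illustrative names, not part of `proc_macro`.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own counter, starting at 1.
    static COUNTER: Cell<u32> = Cell::new(1);
}

/// Returns the next handle value for the current thread.
pub fn next_handle() -> u32 {
    COUNTER.with(|c| {
        let handle = c.get();
        // A long-lived thread still exhausts the 32-bit space eventually.
        c.set(handle.checked_add(1).expect("handle counter overflowed"));
        handle
    })
}

/// Spawns a fresh thread and returns the first handle it hands out.
pub fn first_handle_on_new_thread() -> u32 {
    thread::spawn(next_handle).join().unwrap()
}
```

Every new thread starts again at 1, so handles are only unique per thread. That is fine for values stashed in TLS, but not if `unsafe` code moves them across threads.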
(I'll get to this |
I did some preliminary review, it looks like the general idea is moving majority of token's internal data (as well as some immutable global data) to the The downside is duplication of data and also duplication of some logic as well between the I'm going to r? @eddyb for design review since the proc macro bridge is entirely an eddyb's creation, but feel free to return this to me later for a more detailed code review. |
☔ The latest upstream changes (presumably #87445) made this pull request unmergeable. Please resolve the merge conflicts. |
Ping from triage: can you please address the merge conflicts? Thank you. |
Ping from triage: |
This requires a dependency on `unicode-normalization` and `rustc_lexer`, which is currently not possible for `proc_macro`. Instead, a second `extern "C" fn` is provided by the compiler server to perform these steps from any thread. String values are interned in both the server and client, meaning that identifiers can be stringified without any RPC roundtrips and without substantially inflating their size. RPC messages passing symbols include the full un-interned value, and are re-interned on the receiving side. This could potentially be optimized in the future. The symbol infrastructure will also be used for literals in a following part.
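The "intern on both sides, send the full string over the wire" scheme can be sketched roughly as follows. `Interner`, `Symbol` and `send` are simplified, hypothetical stand-ins; the real symbol store is arena-backed and is not shown here.

```rust
use std::collections::HashMap;

/// Small copyable index into one side's interner.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Toy string interner; client and server would each own one.
#[derive(Default)]
pub struct Interner {
    names: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.names.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.names.insert(s.to_owned(), sym);
        sym
    }

    /// Stringifies a symbol locally, with no round trip to the other side.
    pub fn get(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// Models one RPC message carrying a symbol: the full string is written to
/// the wire and re-interned by the receiver.
pub fn send(from: &Interner, sym: Symbol, to: &mut Interner) -> Symbol {
    let wire: Vec<u8> = from.get(sym).as_bytes().to_vec();
    let s = std::str::from_utf8(&wire).expect("symbols are valid UTF-8");
    to.intern(s)
}
```

The symbol indices on the two sides are unrelated; only the string contents agree, which is why the message has to carry the un-interned value.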
This builds on the symbol infrastructure built for ident to replicate the `LitKind` and `Lit` structures in rustc within the `proc_macro` client, allowing literals to be fully created and interacted with from the client thread. Only parsing and subspan operations still require sync RPC.
…tend impls This is an experimental patch to try to reduce the codegen complexity of TokenStream's FromIterator and Extend implementations for downstream crates, by moving the core logic into a helper type. This might help improve build performance of crates which depend on proc_macro as iterators are used less, and the compiler may take less time to do things like attempt specializations or other iterator optimizations. The change intentionally sacrifices some optimization opportunities, such as using the specializations for collecting iterators derived from Vec::into_iter() into Vec. This is one of the simpler potential approaches to reducing the amount of code generated in crates depending on proc_macro, so it seems worth trying before other more-involved changes.
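As an illustration of the helper-type idea, here is a sketch with toy `Tree` and `Stream` types. `ConcatHelper` and its methods are names assumed for this example, not necessarily what the patch uses.

```rust
use std::iter::FromIterator;

#[derive(Clone, Debug, PartialEq)]
pub struct Tree(pub u32);

#[derive(Clone, Debug, Default, PartialEq)]
pub struct Stream(pub Vec<Tree>);

/// Non-generic helper: its methods are compiled once, so the generic
/// trait impls below stay thin for downstream crates.
pub struct ConcatHelper {
    trees: Vec<Tree>,
}

impl ConcatHelper {
    pub fn new(capacity: usize) -> Self {
        ConcatHelper { trees: Vec::with_capacity(capacity) }
    }

    pub fn push(&mut self, tree: Tree) {
        self.trees.push(tree);
    }

    /// Builds a new stream from the collected trees.
    pub fn build(self) -> Stream {
        Stream(self.trees)
    }

    /// Appends the collected trees to an existing stream.
    pub fn append_to(self, stream: &mut Stream) {
        stream.0.extend(self.trees);
    }
}

impl FromIterator<Tree> for Stream {
    fn from_iter<I: IntoIterator<Item = Tree>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let mut helper = ConcatHelper::new(iter.size_hint().0);
        // Only this loop is instantiated per iterator type.
        iter.for_each(|tree| helper.push(tree));
        helper.build()
    }
}

impl Extend<Tree> for Stream {
    fn extend<I: IntoIterator<Item = Tree>>(&mut self, iter: I) {
        let iter = iter.into_iter();
        let mut helper = ConcatHelper::new(iter.size_hint().0);
        iter.for_each(|tree| helper.push(tree));
        helper.append_to(self);
    }
}
```

Pushing one element at a time gives up things like the `Vec::into_iter()` collect specialization, which matches the trade-off the commit message describes.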
Sorry for taking so long to make time for this, the large diff worried me there were more fundamental changes, but it was straightforward to at least skim.
I think we can land some parts of this right away, assuming they're a net perf win - my thinking is that this is roughly 3 changes:
- `TokenStream` serialized as `Vec<TokenTree>`: first 3 commits plus the 5th, "proc_macro: reduce the number of messages required to create, extend and iterate TokenStreams"
  - maybe also the last commit (perf tradeoff for the above, IIUC), "Try to reduce codegen complexity of TokenStream's FromIterator and Extend impls"
- `Bridge` vs `BridgeConfig` vs `ExpnConfig`: 4th commit, "proc_macro: cache static spans in client's thread-local state"
- the rest of the commits, transitioning to a more serialize-heavy setup, "proc_macro: stop using a remote object handle for ..."
(while I see some connections between those parts, I hope to finish reviewing them more in isolation)
I've filed #98186, #98187, #98188 and #98189 as parts of the work here. Each depends on the previous patch, so will need to be landed in order. I ended up splitting |
Batch proc_macro RPC for TokenStream iteration and combination operations This is the first part of rust-lang#86822, split off as requested in rust-lang#86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
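A toy model of why batching helps, counting one "RPC call" per server method invocation. `Server`, `push`, `concat_trees` and the `u32` trees are all hypothetical simplifications for illustration, not the bridge API.

```rust
/// Toy server that counts how many calls it receives.
pub struct Server {
    pub calls: usize,
    streams: Vec<Vec<u32>>,
}

impl Server {
    pub fn new() -> Self {
        Server { calls: 0, streams: Vec::new() }
    }

    /// One call: create an empty stream and return its id.
    pub fn new_stream(&mut self) -> usize {
        self.calls += 1;
        self.streams.push(Vec::new());
        self.streams.len() - 1
    }

    /// One call per pushed tree.
    pub fn push(&mut self, stream: usize, tree: u32) {
        self.calls += 1;
        self.streams[stream].push(tree);
    }

    /// One call for the whole batch, optionally extending a base stream.
    pub fn concat_trees(&mut self, base: Option<usize>, trees: Vec<u32>) -> usize {
        self.calls += 1;
        let mut out = match base {
            Some(id) => self.streams[id].clone(),
            None => Vec::new(),
        };
        out.extend(trees);
        self.streams.push(out);
        self.streams.len() - 1
    }

    /// Local inspection helper for the example; not counted as a call.
    pub fn trees(&self, stream: usize) -> &[u32] {
        &self.streams[stream]
    }
}

/// Builds a stream with one call per tree, plus one to create it.
pub fn build_unbatched(server: &mut Server, trees: &[u32]) -> usize {
    let id = server.new_stream();
    for &tree in trees {
        server.push(id, tree);
    }
    id
}

/// Builds the same stream with a single batched call.
pub fn build_batched(server: &mut Server, trees: &[u32]) -> usize {
    server.concat_trees(None, trees.to_vec())
}
```

For n trees the unbatched path costs n + 1 calls and the batched path costs 1, at the price of serializing the whole vector in a single message.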
☔ The latest upstream changes (presumably #98186) made this pull request unmergeable. Please resolve the merge conflicts. |
proc_macro/bridge: cache static spans in proc_macro's client thread-local state This is the second part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch removes the RPC calls required for the very common operations of `Span::call_site()`, `Span::def_site()` and `Span::mixed_site()`. Some notes: This part is one of the ones I don't love as a final solution from a design standpoint, because I don't like how the spans are serialized immediately at macro invocation. I think a more elegant solution might've been to reserve special IDs for `call_site`, `def_site`, and `mixed_site` at compile time (either starting at 1 or from `u32::MAX`) and making reading a Span handle automatically map these IDs to the relevant values, rather than doing extra serialization. This would also have an advantage for potential future work to allow `proc_macro` to operate more independently from the compiler (e.g. to reduce the necessity of `proc-macro2`), as methods like `Span::call_site()` could be made to function without access to the compiler backend. That was unfortunately tricky to do at the time, as this was the first part I wrote of the patches. After the later part (rust-lang#98188, rust-lang#98189), the other uses of `InternedStore` are removed meaning that a custom serialization strategy for `Span` is easier to implement. If we want to go that path, we'll still need the majority of the work to split the bridge object and introduce the `Context` trait for free methods, and it will be easier to do after `Span` is the only user of `InternedStore` (after rust-lang#98189).
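The alternative mentioned in the notes above (reserving special IDs) might look roughly like this. The constants, `ServerSpan` and `SpanTable` are illustrative assumptions, not code from the patch.

```rust
use std::collections::HashMap;

// Hypothetical reserved handle values, fixed at compile time.
pub const CALL_SITE: u32 = 1;
pub const DEF_SITE: u32 = 2;
pub const MIXED_SITE: u32 = 3;
const FIRST_DYNAMIC: u32 = 4;

/// Stand-in for the compiler's real span type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ServerSpan {
    pub lo: u32,
    pub hi: u32,
}

/// Server-side table for one macro invocation.
pub struct SpanTable {
    call_site: ServerSpan,
    def_site: ServerSpan,
    mixed_site: ServerSpan,
    dynamic: HashMap<u32, ServerSpan>,
    next: u32,
}

impl SpanTable {
    pub fn new(call_site: ServerSpan, def_site: ServerSpan, mixed_site: ServerSpan) -> Self {
        SpanTable { call_site, def_site, mixed_site, dynamic: HashMap::new(), next: FIRST_DYNAMIC }
    }

    /// Allocates a handle for a span that is not one of the three static ones.
    pub fn alloc(&mut self, span: ServerSpan) -> u32 {
        let handle = self.next;
        self.next += 1;
        self.dynamic.insert(handle, span);
        handle
    }

    /// Reading a handle maps the reserved IDs to this invocation's values.
    pub fn resolve(&self, handle: u32) -> Option<ServerSpan> {
        match handle {
            CALL_SITE => Some(self.call_site),
            DEF_SITE => Some(self.def_site),
            MIXED_SITE => Some(self.mixed_site),
            other => self.dynamic.get(&other).copied(),
        }
    }
}
```

With this shape the client can produce `CALL_SITE` as a constant without any message, and the server decides what it means when the handle is read.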
proc_macro/bridge: stop using a remote object handle for proc_macro Punct and Group This is the third part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch transforms the `Punct` and `Group` types into structs serialized over IPC rather than handles, making them more efficient to create and manipulate from within proc-macros.
proc_macro/bridge: stop using a remote object handle for proc_macro Ident and Literal This is the fourth part of rust-lang#86822, split off as requested in rust-lang#86822 (review). This patch transforms the `Ident` and `Literal` types into structs serialized over IPC rather than handles. Symbol values are interned on both the client and server when deserializing, to avoid unnecessary string copies and keep the size of `TokenTree` down. To do the interning efficiently on the client, the proc-macro crate is given a vendored version of the fxhash hasher, as `SipHash` appeared to cause performance issues. This was done rather than depending on `rustc_hash` as it is unfortunately difficult to depend on crates from within `proc_macro` due to it being built at the same time as `std`. In addition, a custom arena allocator and symbol store were also added, inspired by those in `rustc_arena` and `rustc_span`. To prevent symbol re-use across multiple invocations of a macro on the same thread, a new range of `Symbol` names is used for each invocation of the macro, and symbols from previous invocations are cleaned up. In order to keep `Ident` creation efficient, a special ASCII-only case was added to perform ident validation without using RPC for simple identifiers. Full identifier validation couldn't be easily added, as it would require depending on the `rustc_lexer` and `unicode-normalization` crates from within `proc_macro`. Unicode identifiers are validated and normalized using RPC. See the individual commit messages for more details on trade-offs and design decisions behind these patches.
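The ASCII-only fast path could be sketched along these lines. `Validation` and `validate` are hypothetical names, and the assumption that every string failing the ASCII check is deferred to the server is mine, not something the description states.

```rust
/// Where identifier validation ends up happening.
#[derive(Debug, PartialEq)]
pub enum Validation {
    /// Plain ASCII identifier: accepted on the client, no RPC needed.
    AcceptedLocally,
    /// Anything else is left to the server, which can run full Unicode
    /// validation and normalization (or reject the string).
    NeedsRpc,
}

/// Returns true for strings matching `[A-Za-z_][A-Za-z0-9_]*`.
fn is_valid_ascii_ident(bytes: &[u8]) -> bool {
    match bytes.split_first() {
        Some((&first, rest)) => {
            (first == b'_' || first.is_ascii_alphabetic())
                && rest.iter().all(|&b| b == b'_' || b.is_ascii_alphanumeric())
        }
        // The empty string is never an identifier.
        None => false,
    }
}

pub fn validate(s: &str) -> Validation {
    if is_valid_ascii_ident(s.as_bytes()) {
        Validation::AcceptedLocally
    } else {
        Validation::NeedsRpc
    }
}
```

Because a non-ASCII character always contains a byte outside the accepted ranges, the byte-level check never accepts a Unicode identifier by accident, and the common case avoids a round trip.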
This is a reduced version of #86816 without changes which are only relevant to the `CrossThread` execution strategy. With these changes there are similar performance improvements to the `SameThread` execution strategy as in the previous bug, however the optimizations to `CrossThread` are less significant. If they're desired, the `CrossThread` changes could be landed separately.
Given the same test situation as #86816, this patch stack gets the following numbers:
As with the previous patch, each commit should be individually reviewable.
Blocked on #90876