Implement profiling for compiler-generated move/copy operations #147206
r? @davidtwco
rustbot has assigned @davidtwco.
I'm not really sure whether Statement::Assign's rvalues cover all the interesting cases or not. I'd like to make sure it covers:
Interesting idea! Is it really worth distinguishing moves and copies? That doesn't make much of a difference for the runtime code, it's mostly a type system thing.
In MIR these will be spread across various places. The codegen backend would have an easier time centralizing all the ways in which operand uses get codegen'd as memcpy. But I am not sure if there's still a good way to adjust debuginfo there... Isn't there a mutating MIR visitor you can use that traverses all operands?
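To illustrate why operands are "spread across various places", here is a toy model of a visitor that reaches operands in both assignment statements and call terminators. None of these types are rustc's; `Operand`, `Statement`, `Terminator`, `Block`, and `for_each_operand_mut` are hypothetical stand-ins invented for this sketch, and real MIR has many more places where operands occur.

```rust
#![allow(dead_code)]

/// Toy operand: a move or copy out of a named place, or a constant.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Move(String),
    Copy(String),
    Constant(i64),
}

/// Toy statement: an assignment whose rvalue uses some operands.
enum Statement {
    Assign { dest: String, operands: Vec<Operand> },
    Nop,
}

/// Toy terminator: call arguments are operands too.
enum Terminator {
    Call { func: String, args: Vec<Operand> },
    Return,
}

struct Block {
    statements: Vec<Statement>,
    terminator: Terminator,
}

/// Visit every operand mutably, in statements and in the terminator.
fn for_each_operand_mut(block: &mut Block, f: &mut impl FnMut(&mut Operand)) {
    for stmt in &mut block.statements {
        if let Statement::Assign { operands, .. } = stmt {
            for op in operands.iter_mut() {
                f(op);
            }
        }
    }
    if let Terminator::Call { args, .. } = &mut block.terminator {
        for op in args.iter_mut() {
            f(op);
        }
    }
}

/// Count (moves, copies) seen by the visitor.
fn count_moves_and_copies(block: &mut Block) -> (usize, usize) {
    let (mut moves, mut copies) = (0, 0);
    for_each_operand_mut(block, &mut |op| match op {
        Operand::Move(_) => moves += 1,
        Operand::Copy(_) => copies += 1,
        Operand::Constant(_) => {}
    });
    (moves, copies)
}

fn main() {
    let mut block = Block {
        statements: vec![Statement::Assign {
            dest: "x".into(),
            operands: vec![Operand::Move("a".into()), Operand::Constant(1)],
        }],
        terminator: Terminator::Call {
            func: "f".into(),
            args: vec![Operand::Copy("b".into())],
        },
    };
    println!("{:?}", count_moves_and_copies(&mut block));
}
```

A visitor that only looked at `Statement::Assign` would miss the copy passed as a call argument in this model.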
library/core/src/intrinsics/mod.rs
/// Compiler-generated move operation - never actually called.
/// Used solely for profiling and debugging visibility.
///
/// This function serves as a symbolic marker that appears in stack traces
/// when rustc generates move operations, making them visible in profilers.
/// The SIZE parameter encodes the size of the type being moved in the function name.
#[rustc_force_inline]
#[rustc_diagnostic_item = "compiler_move"]
pub fn compiler_move<T, const SIZE: usize>(_src: *const T, _dst: *mut T) {
    unreachable!("compiler_move should never be called - it's only for debug info")
}

/// Compiler-generated copy operation - never actually called.
/// Used solely for profiling and debugging visibility.
///
/// This function serves as a symbolic marker that appears in stack traces
/// when rustc generates copy operations, making them visible in profilers.
/// The SIZE parameter encodes the size of the type being copied in the function name.
#[rustc_force_inline]
#[rustc_diagnostic_item = "compiler_copy"]
pub fn compiler_copy<T, const SIZE: usize>(_src: *const T, _dst: *mut T) {
    unreachable!("compiler_copy should never be called - it's only for debug info")
}
These aren't intrinsics so I don't think this is the best place for them. The file is already too big anyway.^^
Yeah I wasn't sure exactly where to put them. Originally I had the idea of actually making them real functions implementing copy & move in terms of calls to them, but that seemed more fiddly than it's worth.
I think in practice it's useful - I've seen very large structures being made
That was actually my first attempt but I ended up with a stream of mysterious crashes/assertion failures from within the guts of llvm. Doing the manipulations at the MIR level turned out to be much more straightforward.
I'll take another look.
I was missing parameter-passing moves the first time around, so it's a little larger now: about 0.2% for 65 bytes, almost nothing for 1024, and closer to 1% for 8 bytes.
As part of Rust's move semantics, the compiler will generate memory copy operations to move objects about. These are generally pretty small, and the backend is good at optimizing them. But sometimes, if the type is large, they can end up being surprisingly expensive. In such cases, you might want to pass them by reference, or Box them up.

However, these moves are also invisible to profiling. At best they appear as a `memcpy`, but one memcpy is basically indistinguishable from another, and it's very hard to know that 1) it's actually a compiler-generated copy, and 2) what type it pertains to.

This PR adds two new pseudo-functions in `core::profiling`:

```
pub fn compiler_move<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
pub fn compiler_copy<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
```

These functions are never actually called, however. A MIR transform pass -- `instrument_moves.rs` -- will locate all `Operand::Move`/`Copy` operations, and modify their source location to make them appear as if they had been inlined from `compiler_move`/`_copy`.

These functions have two generic parameters: the type being copied, and its size in bytes. This should make it very easy to identify which types are being expensive in your program (both in aggregate, and at specific hotspots). The size isn't strictly necessary since you can derive it from the type, but it's small and it makes it easier to understand what you're looking at.

This functionality is only enabled if you have debug info generation enabled, and also set the `-Zinstrument-moves` option.

It does not instrument all moves. By default it will only annotate ones for types over 64 bytes. The `-Zinstrument-moves-size-limit` option specifies the size in bytes to start instrumenting for.

This has minimal impact on the size of debugging info. For rustc itself, the overall increase in librustc_driver*.so size is around 0.05% for a 65 byte limit, 0.004% for a 1025 byte limit, and a worst case of 0.6% for an 8 byte limit.

There's no effect on generated code; it only adds debug info.

As an example of a backtrace:

```
Breakpoint 1.3, __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:255
255 ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned_erms), 6)
(gdb) bt
#0 __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:255
#1 0x0000555555590e7e in core::profiling::compiler_copy<[u64; 1000], 8000> () at library/core/src/profiling.rs:27
#2 t::main () at t.rs:10
```
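For context, a program along these lines could plausibly produce a backtrace like the one above. This is a hypothetical reconstruction, not the actual `t.rs` from the PR: the names `Big`, `consume`, and `consume_ref` are made up for illustration, and whether a given by-value use really becomes a `memcpy` depends on the optimization level and backend.

```rust
use std::mem::size_of;

/// Illustrative large type: 1000 * 8 = 8000 bytes, matching the SIZE
/// shown in the example backtrace.
type Big = [u64; 1000];

/// Takes the array by value, so the caller may have to copy all 8000 bytes.
#[inline(never)]
fn consume(data: Big) -> u64 {
    data.iter().sum()
}

/// Takes a reference instead, so only a pointer is passed.
#[inline(never)]
fn consume_ref(data: &Big) -> u64 {
    data.iter().sum()
}

fn main() {
    let data: Big = [1; 1000];

    // `Big` is `Copy`, so this by-value use is a compiler-generated copy
    // and `data` stays usable afterwards.
    let by_value = consume(data);

    // The alternatives suggested in the description: pass by reference,
    // or box the value so that later moves only move a pointer.
    let by_ref = consume_ref(&data);
    let boxed: Box<Big> = Box::new(data);

    println!("{by_value} {by_ref} {}", boxed.len());
    println!("size_of::<Big>() = {}", size_of::<Big>());
    println!("size_of::<Box<Big>>() = {}", size_of::<Box<Big>>());
}
```

Built with debug info and the flags described above, the by-value call is the kind of site that would be attributed to a `compiler_copy<[u64; 1000], 8000>` frame.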
Hm, the mir tests seem very brittle. They were clean locally.
All codegen tests have this brittleness. We build all the codegen test suites (mir-opt, codegen, assembly) against a sysroot which is compiled with whatever flags are set in the user-provided profile. The mir-opt suite tends to get blamed because we check in much more of the MIR than just the FileCheck annotations. I suspect your test needs a
Note that annotation doesn't completely fix the problem; it just means that to bless this test you need to have |
Is "instrument" the best term here? Usually that means to actually run some extra code for the to-be-instrumented operation, doesn't it? This here is just adding debuginfo. Do new |
Can How can the added debuginfo be used to produce useful profiling information? |
Annotate?
OK, I'll kick that off.
I guess in principle, but I was thinking that the default might not necessarily be a constant. For example, it could maybe use the target info to select the cache-line size, or some size threshold.
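A non-constant default could look roughly like the sketch below. Everything here is an assumption made for illustration: `TargetInfo`, `size_limit`, and `should_annotate` are hypothetical names rather than rustc APIs, and "one byte more than a cache line" is just one possible policy. The fallback of 65 mirrors the "types over 64 bytes" default from the description.

```rust
/// Hypothetical stand-in for target information; not a rustc type.
struct TargetInfo {
    /// Cache-line size in bytes, if known for the target.
    cache_line_bytes: Option<u64>,
}

/// Fallback when neither the user nor the target provides a value:
/// annotate types of 65 bytes and up, i.e. "over 64 bytes".
const FALLBACK_LIMIT: u64 = 65;

/// Pick the size limit: an explicit user setting wins, then a
/// target-derived value (assumed policy: larger than one cache line),
/// then the constant fallback.
fn size_limit(explicit: Option<u64>, target: &TargetInfo) -> u64 {
    explicit
        .or(target.cache_line_bytes.map(|line| line + 1))
        .unwrap_or(FALLBACK_LIMIT)
}

/// A move or copy is annotated once the type reaches the limit.
fn should_annotate(type_size: u64, limit: u64) -> bool {
    type_size >= limit
}

fn main() {
    let target = TargetInfo { cache_line_bytes: Some(64) };
    let limit = size_limit(None, &target);
    for size in [8u64, 64, 65, 8000] {
        println!("{size:>5} bytes: annotate = {}", should_annotate(size, limit));
    }
}
```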
The idea is that if you have a profiler sampling the pc/rip then it can use the debug info to unwind the stack frames to identify where it has sampled. This way, assuming the unwinder understands inlined functions, you'll be able to see the I still need to validate this in practice. I've managed to use Actually rustc itself is a good candidate of course. Should I just do something like |
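As a rough sketch of the aggregation step, assuming the profiler has already symbolized its samples into frame names shaped like the `compiler_copy<[u64; 1000], 8000>` frame in the example backtrace, a post-processing tool could pull the type and size out of the name and count samples per type. The helper names `parse_marker` and `samples_by_type` are hypothetical, and the frame format is taken only from that one backtrace.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Move,
    Copy,
}

/// Extract (kind, type name, size in bytes) from a symbolized frame name,
/// or `None` if the frame is not one of the marker functions.
fn parse_marker(frame: &str) -> Option<(Kind, &str, usize)> {
    const MOVE: &str = "compiler_move<";
    const COPY: &str = "compiler_copy<";
    let (kind, rest) = if let Some(i) = frame.find(MOVE) {
        (Kind::Move, &frame[i + MOVE.len()..])
    } else if let Some(i) = frame.find(COPY) {
        (Kind::Copy, &frame[i + COPY.len()..])
    } else {
        return None;
    };
    // Generic arguments run up to the last '>'; the size follows the last
    // comma, so type names that contain commas still parse.
    let args = &rest[..rest.rfind('>')?];
    let comma = args.rfind(',')?;
    let size: usize = args[comma + 1..].trim().parse().ok()?;
    Some((kind, args[..comma].trim(), size))
}

/// Count how many sampled frames were attributed to each type.
fn samples_by_type(frames: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for frame in frames {
        if let Some((_, ty, _)) = parse_marker(frame) {
            *counts.entry(ty.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let frames = [
        "core::profiling::compiler_copy<[u64; 1000], 8000>",
        "t::main",
        "core::profiling::compiler_copy<[u64; 1000], 8000>",
    ];
    println!("{:?}", samples_by_type(&frames));
}
```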
Ah I see, that explains it. I'll fix that up.
Hm, maybe. Or maybe something specifically involving debuginfo?
I ended up renaming it to