Not using the byval attribute loses memcpy optimizations #103103
Comments
Note that if we choose option (2) then there will need to be a little more work to tell LLVM that an indirectly-passed function argument was marked immutable by the programmer. |
Actually, (2) is more complicated than I thought. There are actually two sub-options: (2a) At the Rust ABI level, specify that callees are not allowed to write to indirectly-passed arguments. Currently, they can if they declare a by-val parameter (2b) Keep the ABI the same as it is today. In that case we would make LLVM responsible for adding |
Here's the LLVM change. Coupled with the suggested change in Rust ABI this should let us eliminate quite a few memcpys. pcwalton/llvm-project@854b7c4 Also, I tried out (2b) and the results were disappointing: LLVM can't even tell that println is readonly in fat LTO. I think if we want to get rid of these memcpys (and we probably do, as they're a regression over C++) then we should modify the Rust ABI either to use byval or to forbid callees writing to indirectly-passed args. |
I'm curious about why; any chance you have a link handy to a discussion about it? |
I don't, but a comment from @bjorn3 seemed to indicate to me that it was deliberate. |
byval means that the argument is passed at a specific stack location. This requires a copy to this stack location and is as such incompatible with unsized arguments. To solve the issue of these redundant copies I have long been advocating to use move arguments (which pass the original location in case of by-ref) instead of copy arguments (which copy the arguments to the stack and then pass the copy's location in case of by-ref) during MIR building where possible, but @RalfJung would like to instead remove move arguments entirely and handle unsized arguments some other way. |
@RalfJung Can we get some forward progress on what to do here? These copies are hurting codegen across the board. I don't think the status quo is going to be tenable long-term; the pressure for a solution will be too great. If I understand correctly, my proposal would be for all arguments to continue to be by-copy, but the copy moves from the caller side to the callee side and as such can be optimized away if known to be immutable. (Callers can't know this because Rust function signatures don't specify mutability of by-value arguments, only definitions do.) |
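To illustrate the parenthetical above about signatures versus definitions, here is a minimal sketch. The `Big` type and both function names are hypothetical, chosen only for illustration. It shows that a mutating and a non-mutating definition share one function type, so a caller cannot distinguish them from the signature alone.

```rust
// Hypothetical 64-byte payload; a struct this size is likely passed
// indirectly (by pointer) under the current Rust ABI on common targets.
#[derive(Clone, Copy)]
struct Big {
    data: [u64; 8],
}

// Never writes to its argument.
fn sum(x: Big) -> u64 {
    x.data.iter().sum()
}

// Writes to its argument. `mut` belongs to the binding in the definition,
// not to the parameter's type.
fn sum_after_bump(mut x: Big) -> u64 {
    x.data[0] += 1;
    x.data.iter().sum()
}

fn main() {
    // Both definitions coerce to the same function pointer type, so a caller
    // going through this pointer cannot tell which callee mutates its argument.
    let callees: [fn(Big) -> u64; 2] = [sum, sum_after_bump];
    let big = Big { data: [1; 8] };
    for callee in callees {
        println!("{}", callee(big));
    }
}
```

This is why the mutability information has to come from the callee's body (or from metadata derived from it) rather than from the call site.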
Your proposal of using byval would still do the copy at the call site if the argument couldn't be constructed in place at the right stack location, for example when there are multiple calls with different byval arguments. Your proposal of making arguments immutable is incompatible with unsized arguments, as those can be mutated in place and can't be copied on the callee side. |
So per the unsized arguments RFC unsized arguments are passed as |
by-ref sized arguments are currently also passed &move. We just insert the extra copy at MIR building time as precaution due to unclear semantics of move arguments. Note that we need both to be abi compatible for trait object dispatch I think as is already stable through |
What a mess. Maybe we could just mark by-ref arguments as |
The right solution here for the unsized case is to stop representing |
I still think it'd make sense to mark semantically-by-value but passed-by-ref-at-the-ABI-level immutable freeze arguments as |
@JakobDegen We can easily use a separate type for call arguments to solve the problem of Operand::Move being specified as doing a copy, but that won't solve this perf issue. The problem isn't that we are using Operand::Move too often, but that we use it too little, thus causing many needless copies. |
I wonder what the best way to propagate deduced |
I verified that with my change to LLVM, including the soundness fix suggested by @nikic on Zulip, it can do the optimization for the cases I'm looking at, as long as Rust codegen can either inspect the MIR of direct callees for readonly safety or pull that information from the crate metadata. I think we have a way forward for most cases. It would still be better to not generate the copy at the MIR level, but at least this will fix the worst offenders in the generated code. |
I'm sorry, I am not familiar with LLVM. From the LangRef it sounds like using |
Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not.

This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind.

Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place.

[issue 103103]: rust-lang#103103
[PR 103070]: rust-lang#103070
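As a rough illustration of the criteria in the commit message (freeze, and observed not to be mutated), the sketch below shows three callees. All type and function names are hypothetical, and the comments about which parameters would qualify are a reading of the description above, not verified output from `rustc`.

```rust
use std::cell::Cell;

// Hypothetical types for illustration only.
struct Plain {
    data: [u64; 8], // freeze: no interior mutability
}

struct Counter {
    hits: Cell<u64>, // not freeze: Cell allows writes through shared access
    data: [u64; 8],
}

// Plausible candidate for `readonly`: the type is freeze and the body never
// writes to the parameter.
fn read_only(p: Plain) -> u64 {
    p.data[0] + p.data[7]
}

// Presumably not a candidate: the body mutates the parameter.
fn mutated(mut p: Plain) -> u64 {
    p.data[0] = 10;
    p.data[0] + p.data[7]
}

// Presumably not a candidate: the memory changes even without a `mut` binding.
fn interior(c: Counter) -> u64 {
    c.hits.set(c.hits.get() + 1);
    c.hits.get() + c.data[0]
}

fn main() {
    println!("{}", read_only(Plain { data: [1; 8] }));
    println!("{}", mutated(Plain { data: [1; 8] }));
    println!("{}", interior(Counter { hits: Cell::new(0), data: [1; 8] }));
}
```

The freeze requirement matters because, for a type with interior mutability, the absence of `mut` in the definition says nothing about whether the pointed-to memory is written.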
I was pointed to this GH issue by @lqd; I don't think opening a separate issue is warranted since I believe the codegen quality problem I observed is this memcpy issue. So maybe this is useful as a test case for your work, @pcwalton: https://godbolt.org/z/bTn7bP574 With the "move-through" pattern it seems very common to do sparse updates with mut self like in with_size1, so it's surprising and not ideal that it produces worse code than with_size2. You can see that rather than writing the size once to the width/height fields, it does a full store/load/store and the last load/store pair comes from the memcpy intrinsic. (In this and similar examples inlining can usually save the day, but that could be said for most cross-function issues.) It would be even better if this kind of move-through pattern could be lowered to the copy-free code that just does mov [rdi], rsi so that codegen would be no different from a &mut-based implementation. But I'm guessing there are good reasons that can't be done (and in any case is a separate issue). |
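The code behind the Godbolt link is not reproduced in this thread, so the following is only a sketch of the general "move-through" shape being described next to a `&mut`-based equivalent. The `Widget` type, its fields, and both method names are assumptions made for illustration and are not the linked `with_size1` or `with_size2`.

```rust
// Hypothetical type; the padding field just makes the struct large enough
// that it is likely passed indirectly.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Widget {
    width: u32,
    height: u32,
    padding: [u32; 14],
}

impl Widget {
    // Move-through style: take `self` by value, update two fields, return it.
    fn with_size(mut self, size: u32) -> Self {
        self.width = size;
        self.height = size;
        self
    }

    // In-place style: the same sparse update through a mutable reference.
    fn set_size(&mut self, size: u32) {
        self.width = size;
        self.height = size;
    }
}

fn main() {
    let base = Widget { width: 0, height: 0, padding: [0; 14] };

    let moved = base.with_size(7);

    let mut in_place = base;
    in_place.set_size(7);

    // Both styles produce the same value; the thread is about how much
    // copying the compiler emits to get there.
    println!("{}", moved == in_place);
}
```

Whether extra copies actually appear for code like this depends on the compiler version, optimization level, and inlining decisions.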
Triage: The |
Consider this code (Godbolt link):
LLVM can't eliminate the memcpy between f and g, even though it should legally be able to do so. This is because memcpyopt can only forward to parameters marked byval. We don't use the byval attribute, and this seems to be by design. But we lose this optimization, which I've observed hurting codegen in several places even in hello world (`core::fmt::Write::write_fmt`, `core::panicking::assert_failed()`, `<core::fmt::Arguments as core::fmt::Display>::fmt`). I suspect losing this optimization hurts us all over the place.
There are two solutions I can see:
(1) Use byval for indirect arguments in the Rust ABI.
(2) Change LLVM to allow the optimization to happen for at least `nocapture noalias readonly` parameters. Since `nocapture` implies that the behavior of the callee doesn't depend on the address and `noalias readonly` implies that the memory is strongly immutable, this should work. We mark all indirect by-value arguments as `nocapture noalias` already.
I'm working on a patch for (2), but I was wondering why we can't just do (1).
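The code from the Godbolt link does not appear in this thread. As a stand-in, here is a minimal sketch of the kind of pattern described: a large struct passed by value to `f`, which forwards it unchanged to `g`. The `Payload` type and the function bodies are assumptions, and `#[inline(never)]` is used only to keep the two calls distinct.

```rust
// Hypothetical stand-in for the missing example: a 128-byte struct that is
// likely passed indirectly under the current Rust ABI.
struct Payload {
    words: [u64; 16],
}

// Callee that only reads its argument.
#[inline(never)]
fn g(p: Payload) -> u64 {
    p.words.iter().sum()
}

// Forwards its by-value argument unchanged. The copy at issue is the one
// made here before calling `g`.
#[inline(never)]
fn f(p: Payload) -> u64 {
    g(p)
}

fn main() {
    let p = Payload { words: [2; 16] };
    println!("{}", f(p));
}
```

Whether a memcpy shows up between `f` and `g` for this exact code depends on the compiler version and flags, so the generated assembly should be checked rather than assumed.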