Not using the byval attribute loses memcpy optimizations #103103

pcwalton · 2022-10-16T02:13:33Z

Consider this code (Godbolt link):

pub struct S([i32; 16]);

#[inline(never)]
fn g(x: S) {
    println!("{}", x.0[3]);
}

#[inline(never)]
fn f(x: S) {
    g(x)
}

pub fn main() {
    f(S([0; 16]))
}

LLVM can't eliminate the memcpy between f and g, even though it should legally be able to do so. This is because memcpyopt can only forward to parameters marked byval. We don't use the byval attribute, and this seems to be by design. But we lose this optimization, which I've observed hurting codegen in several places even in hello world (core::fmt::Write::write_fmt, core::panicking::assert_failed(), <core::fmt::Arguments as core::fmt::Display>::fmt). I suspect losing this optimization hurts us all over the place.

There are two solutions I can see:

(1) Use byval for indirect arguments in the Rust ABI.

(2) Change LLVM to allow the optimization to happen for at least nocapture noalias readonly parameters. Since nocapture implies that the behavior of the callee doesn't depend on the address and noalias readonly implies that the memory is strongly immutable, this should work. We mark all indirect by-value arguments as nocapture noalias already.

I'm working on a patch for (2), but I was wondering why we can't just do (1).

The text was updated successfully, but these errors were encountered:

pcwalton · 2022-10-16T03:02:50Z

Note that if we choose option (2) then there will need to be a little more work to tell LLVM that an indirectly-passed function argument was marked immutable by the programmer.

pcwalton · 2022-10-16T03:28:53Z

Actually, (2) is more complicated than I thought. There are actually two sub-options:

(2a) At the Rust ABI level, specify that callees are not allowed to write to indirectly-passed arguments. Currently, they can if they declare a by-val parameter mut (or it has UnsafeCell in it). With this change a callee would have to codegen a memcpy to a temporary if it writes to a by-val parameter.

(2b) Keep the ABI the same as it is today. In that case we would make LLVM responsible for adding readonly attributes itself. Generally this can only be done when building with LTO.

pcwalton · 2022-10-16T03:46:53Z

Here's the LLVM change. Coupled with the suggested change in Rust ABI this should let us eliminate quite a few memcpys. pcwalton/llvm-project@854b7c4

Also, I tried out (2b) and the results were disappointing: LLVM can't even tell that println is readonly in fat LTO. I think if we want to get rid of these memcpys (and we probably do, as they're a regression over C++) then we should modify the Rust ABI either to use byval or to forbid callees writing to indirectly-passed args.

scottmcm · 2022-10-16T05:09:58Z

We don't use the byval attribute, and this seems to be by design.

I'm curious about why; any chance you have a link handy to a discussion about it?

pcwalton · 2022-10-16T05:34:17Z

I'm curious about why; any chance you have a link handy to a discussion about it?

I don't, but a comment from @bjorn3 seemed to indicate to me that it was deliberate.

bjorn3 · 2022-10-16T09:11:08Z

byval means that the argument is passed at a specific stack location. This requires a copy to this stack location and is as such incompatible with unsized arguments. To solve the issue of these redundant copies I have long been advocating to use move arguments (which pass the orginal location in case of by-ref) instead of copy arguments (which copy the arguments to the stack and then pass the copy's location in case of by-ref) during MIR building where possible, but @RalfJung would like to instead remove move arguments entirely and handle unsized arguments some other way.

pcwalton · 2022-10-16T09:27:25Z

@RalfJung Can we get some forward progress on what to do here? These copies are hurting codegen across the board. I don't think the status quo is going to be tenable long-term; the pressure for a solution will be too great.

If I understand correctly, my proposal would be for all arguments to continue to be by-copy, but the copy moves from the caller side to the callee side and as such can be optimized away if known to be immutable. (Callers can't know this because Rust function signatures don't specify mutability of by-value arguments, only definitions do.)

bjorn3 · 2022-10-16T09:30:23Z

Your proposal of using byval would still do the copy at call site if the argument couldn't be constructed in place at the right stack location. For example when there are multiple calls with different byval arguments.

Your proposal of making arguments immutable is incompatible with unsized arguments as those can be mutated in place and can't be copied at the callee side to make the copy at the callee side.

pcwalton · 2022-10-16T09:38:00Z

So per the unsized arguments RFC unsized arguments are passed as &move. This implies that we're already using a different calling convention for unsized arguments. So my immutable arguments proposal could be a replacement for the cases in which we're using by-copy--the ABI for unsized arguments would remain the same as today.

bjorn3 · 2022-10-16T09:42:20Z

by-ref sized arguments are currently also passed &move. We just insert the extra copy at MIR building time as precaution due to unclear semantics of move arguments. Note that we need both to be abi compatible for trait object dispatch I think as is already stable through Box<dyn FnOnce()>.

pcwalton · 2022-10-16T09:45:48Z

What a mess. Maybe we could just mark by-ref arguments as readonly in the callee if they're declared immutable and freeze, and then rely on LTO to propagate that attribute to callers. Then memcpyopt can detect the readonly nocapture noalias combo and forward the memcpy.

JakobDegen · 2022-10-16T19:34:30Z

The right solution here for the unsized case is to stop representing &move arguments in MIR with an Operand::Move. Operand::Move is also very clear about the fact that it makes a copy. This would require someone to go and actually come up with a design for a decent story for unsized locals in MIR though...

pcwalton · 2022-10-16T19:53:06Z

I still think it'd make sense to mark semantically-by-value but passed-by-ref-at-the-ABI-level immutable freeze arguments as readonly, since we know that, and LLVM can use it for LTO more generally than just memcpyopt.

bjorn3 · 2022-10-16T20:01:37Z

@JakobDegen We can easily use a separate type for call arguments to solve the problem of Operand::Move being specified as doing a copy, but that won't solve this perf issue. The problem is that we aren't using Operand::Move too often, but that we use it too little, thus causing many needless copies.

pcwalton · 2022-10-16T20:24:54Z

I wonder what the best way to propagate deduced readonly around across crates would be. Either it would have be a per-parameter bit in the Rust metadata, or it'd have to be added to the ThinLTO summaries.

pcwalton · 2022-10-16T21:42:05Z

I verified that with my change to LLVM with the soundness fix suggested by @nikic on Zulip, it can do the optimization for the cases I'm looking at as long as Rust codegen can either inspect the MIR for direct callees for readonly safety or to pull that information from the crate metadata. I think we have a way forward for most cases.

It would still be better to not generate the copy at the MIR level, but at least this will fix the worst offenders in the generated code.

RalfJung · 2022-10-19T20:20:24Z

I'm sorry I am not familiar with LLVM byref. What new requirements do we need to impose on the MIR level to justify this attribute?

From the LangRef it sounds like using byref for Operand::Copy arguments is a no-brainer, and what is correct for Operand::Copy is certainly also correct for Operand::Move. When we use pointers to implement functions that, on the MIR level, perform a copy -- why would byval ever be wrong?

…adonly` on indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: rust-lang#103103 [PR 103070]: rust-lang#103070

Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters. Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: rust-lang#103103 [PR 103070]: rust-lang#103070

pervognsen · 2023-02-02T00:23:07Z

I was pointed to this GH issue by @lqd; I don't think opening a separate issue is warranted since I believe the codegen quality problem I observed is this memcpy issue. So maybe this is useful as a test case for your work, @pcwalton: https://godbolt.org/z/bTn7bP574

With the "move-through" pattern it seems very common to do sparse updates with mut self like in with_size1, so it's surprising and not ideal that it produces worse code than with_size2. You can see that rather than writing the size once to the width/height fields, it does a full store/load/store and the last load/store pair comes from the memcpy intrinsic. (In this and similar examples inlining can usually save the day, but that could be said for most cross-function issues.)

It would be even better if this kind of move-through pattern could be lowered to the copy-free code that just does

mov [rdi], rsi
mov [rdi+8], rsi
ret

so that codegen would be no different from a &mut-based implementation. But I'm guessing there are good reasons that can't be done (and in any case is a separate issue).

tmiasko · 2025-03-11T12:47:37Z

Triage: The memcpy inside f is now optimized by either MIR opt (-C opt-level >= 0) or LLVM opt (-Z mir-opt-level=0 and -C opt-level >= 1).

pcwalton added the C-bug Category: This is a bug. label Oct 16, 2022

pcwalton mentioned this issue Oct 18, 2022

Introduce deduced parameter attributes, and use them for deducing readonly on indirect immutable freeze by-value function parameters. #103172

Merged

ChrisDenton added the needs-triage-legacy label Jul 16, 2023

Enselic added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage-legacy labels Jul 30, 2024

tmiasko closed this as completed Mar 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not using the byval attribute loses memcpy optimizations #103103

Not using the byval attribute loses memcpy optimizations #103103

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

scottmcm commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022 •

edited

Loading

JakobDegen commented Oct 16, 2022

pcwalton commented Oct 16, 2022 •

edited

Loading

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

pcwalton commented Oct 16, 2022 •

edited

Loading

RalfJung commented Oct 19, 2022 •

edited

Loading

pervognsen commented Feb 2, 2023 •

edited

Loading

tmiasko commented Mar 11, 2025 •

edited

Loading

Not using the byval attribute loses memcpy optimizations #103103

Not using the byval attribute loses memcpy optimizations #103103

Comments

pcwalton commented Oct 16, 2022 • edited Loading

pcwalton commented Oct 16, 2022

pcwalton commented Oct 16, 2022 • edited Loading

pcwalton commented Oct 16, 2022 • edited Loading

scottmcm commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022 • edited Loading

JakobDegen commented Oct 16, 2022

pcwalton commented Oct 16, 2022 • edited Loading

bjorn3 commented Oct 16, 2022

pcwalton commented Oct 16, 2022

pcwalton commented Oct 16, 2022 • edited Loading

RalfJung commented Oct 19, 2022 • edited Loading

pervognsen commented Feb 2, 2023 • edited Loading

tmiasko commented Mar 11, 2025 • edited Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

pcwalton commented Oct 16, 2022 •

edited

Loading

RalfJung commented Oct 19, 2022 •

edited

Loading

pervognsen commented Feb 2, 2023 •

edited

Loading

tmiasko commented Mar 11, 2025 •

edited

Loading