Extra copy of struct before return #38743

iSazonov · 2020-07-03T06:20:19Z

Description

Looking Guid.NewGuid() implementation I prepared a simple test on CSharpLab site
It seems the method does extra struct copy before return (g variable is defined on stack and could be returned as is).

I do not know whether this is an omission in the existing optimization, or a new optimization сould be added.

category:cq
theme:structs
skill-level:intermediate
cost:medium
impact:small

The text was updated successfully, but these errors were encountered:

Dotnet-GitSync-Bot · 2020-07-03T06:20:24Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

GrabYourPitchforks · 2020-07-03T06:48:46Z

I can't speak for the JIT team, but it's possible that writing directly into the return value's storage address could be an incorrect optimization. Consider:

public class MyClass
{
    public Guid MyGuid;
}

MyClass o = new MyClass();
o.MyGuid = SomeMethod();

Guid SomeMethod()
{
    // do something
    if (rnd()) { return g; }
    else { throw new Exception(); }
}

If any portion of SomeMethod has a side effect that may result in the method throwing an exception or otherwise not returning, then the write to MyClass.SomeGuid should never have taken place at all, as the write is performed by SomeMethod's caller, not by SomeMethod itself. (In practice the write is performed by SomeMethod's epilog, which is indistinguishable from the write having been performed by the caller after method return. So the observed behavior is consistent with expectations.)

JulieLeeMSFT · 2020-07-06T18:16:40Z

CC @CarolEidt @sandreenko PTAL

CarolEidt · 2020-07-06T22:59:25Z

There are at least a few factors at play here:

First, taking the address of the local variable g and passing it to another method will cause every write to g's fields to go to the stack because it's address-exposed. Declaring a separate variable to hold the random bytes, and then assigning it field-by-field to g might improve things a bit.
The JIT doesn't currently have support for shuffling and assembling fields of structs in a single register, so even if the first issue is addressed, there will be one, rather than two, stores to the return local.
The JIT doesn't do any forwarding optimizations, so in any case I believe that it will always store to the return local in the epilog.

Perhaps @AndyAyersMS has additional thoughts.

iSazonov · 2020-07-07T03:20:37Z

in any case I believe that it will always store to the return local in the epilog.

It could be allocated on stack before the method call like an optimization for stackalloc.
Given that the size of structures (and tuples?) can be relatively large there can be a good performance gain.

AndyAyersMS · 2020-07-07T18:30:22Z

The .NET analogs of C++'s RVO and NRVO are indeed useful -- but not something the jit is able to do at present. Reverse struct copy-prop (#8887) will eventually allow the jit to construct some structs directly into the hidden return buffer.

However, as @GrabYourPitchforks notes, on the NewGuid example, this optimization would likely be blocked by the call to GetRandomBytes, since the jit would have to assume that method could throw an exception before writing all of its result.

jakobbotsch · 2022-12-07T12:53:35Z

As discussed above, the cases we can do this would be very limited. The return buffer can also alias other arguments in the managed calling convention which makes this even harder. I don't think there is much we would be able to do because of that. Will close this.

jakobbotsch · 2022-12-07T22:58:59Z

There are cases we would still be able to do this, even in the face of there being potential aliasing/overlap between the return buffer and arguments. Reverse copy prop would probably need to handle this in some form anyway, so let's reopen this.

iSazonov · 2022-12-08T03:42:12Z

I can explain my perplexity about this issue.
Structures are good for high performance code and suddenly copying happens. Immediately the question arises whether it is so necessary.
Perhaps this is not the right analogy but it reminds us of async methods that execute synchronously if possible, and moreover try to return value results to avoid allocation.
So my expectation for structures is similar that the compiler tries to make the code as efficient as possible.
Another analogy is related to static anonymous functions. The compiler does this optimization if possible, but we can specify static explicitly to avoid regression. Perhaps something similar could be possible for structures since there are quite a few shadow copy traps there.

jakobbotsch · 2022-12-08T09:28:39Z

@iSazonov It's not just about what is possible but also about what is going to be realistic in the framework of the JIT. The fact that the return buffer is allowed to alias global memory puts the JIT on territory that it generally handles very conservatively today. More powerful alias analysis is quite costly and I don't know how much we could get away with there.

In your original example it is very unlikely we would be able to do anything without making the JIT's alias analysis more powerful. There are two main reasons: 1) Unlike C++, return g does not mean we can make g the return buffer function-wide for the reasons described above, e.g. GetRandomBytes call could partially write it before throwing an exception. 2) Exposing the address of g to GetRandomBytes makes the JIT believe g can alias any other global memory. Removing the copy in this case requires us to optimize something like g = <arbitrary expression involving global memory>; *retbuf = g into *retbuf = <arbitrary expression involving global memory>. That's hard in the framework of the JIT when we know retbuf is allowed to alias.

If the expression does not involve global memory then it becomes easier; so those are the cases where I think we would more easily be able to do something, and why I reopened it. E.g. it may be common that you just have g = <constant assigned to each field>; *retbuf = g which could be optimized.

iSazonov · 2022-12-08T13:47:29Z

Thanks for the clarification! I don't ask for the impossible. The main things I noted is (1) that there is undesirable (invisible) copying (2) that there are examples where we can help the compiler make the code better.
So I'd be happy (1) if you can find ways to improve some special cases, (2) and maybe come up with something more fundamental in JIT/C#.

benaadams · 2022-12-09T01:41:35Z

Some methods don't have any throws and are self contained (especially post inlining); could the Jit write directly to the output buffer in these cases?

benaadams · 2022-12-09T02:28:52Z

Related issue #12219

iSazonov added the tenet-performance Performance related issue label Jul 3, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jul 3, 2020

GrabYourPitchforks added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2020

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020

JulieLeeMSFT added this to the Future milestone Jul 6, 2020

iSazonov mentioned this issue May 11, 2021

Improve low level struct performance #46104

Open

5 tasks

jakobbotsch mentioned this issue Oct 12, 2022

[RyuJIT] Introduce a general reverse copy prop for struct temps #8887

Open

jakobbotsch closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2022

jakobbotsch reopened this Dec 7, 2022

iSazonov closed this as completed Apr 23, 2023

ghost locked as resolved and limited conversation to collaborators May 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extra copy of struct before return #38743

Extra copy of struct before return #38743

iSazonov commented Jul 3, 2020 •

edited by CarolEidt

Loading

Dotnet-GitSync-Bot commented Jul 3, 2020

GrabYourPitchforks commented Jul 3, 2020 •

edited

Loading

JulieLeeMSFT commented Jul 6, 2020

CarolEidt commented Jul 6, 2020

iSazonov commented Jul 7, 2020

AndyAyersMS commented Jul 7, 2020

jakobbotsch commented Dec 7, 2022

jakobbotsch commented Dec 7, 2022

iSazonov commented Dec 8, 2022

jakobbotsch commented Dec 8, 2022

iSazonov commented Dec 8, 2022

benaadams commented Dec 9, 2022 •

edited

Loading

benaadams commented Dec 9, 2022

Extra copy of struct before return #38743

Extra copy of struct before return #38743

Comments

iSazonov commented Jul 3, 2020 • edited by CarolEidt Loading

Description

Dotnet-GitSync-Bot commented Jul 3, 2020

GrabYourPitchforks commented Jul 3, 2020 • edited Loading

JulieLeeMSFT commented Jul 6, 2020

CarolEidt commented Jul 6, 2020

iSazonov commented Jul 7, 2020

AndyAyersMS commented Jul 7, 2020

jakobbotsch commented Dec 7, 2022

jakobbotsch commented Dec 7, 2022

iSazonov commented Dec 8, 2022

jakobbotsch commented Dec 8, 2022

iSazonov commented Dec 8, 2022

benaadams commented Dec 9, 2022 • edited Loading

benaadams commented Dec 9, 2022

iSazonov commented Jul 3, 2020 •

edited by CarolEidt

Loading

GrabYourPitchforks commented Jul 3, 2020 •

edited

Loading

benaadams commented Dec 9, 2022 •

edited

Loading