Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra copy of struct before return #38743

Closed
iSazonov opened this issue Jul 3, 2020 · 13 comments
Closed

Extra copy of struct before return #38743

iSazonov opened this issue Jul 3, 2020 · 13 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@iSazonov
Copy link
Contributor

iSazonov commented Jul 3, 2020

Description

Looking Guid.NewGuid() implementation I prepared a simple test on CSharpLab site
It seems the method does extra struct copy before return (g variable is defined on stack and could be returned as is).

I do not know whether this is an omission in the existing optimization, or a new optimization сould be added.

category:cq
theme:structs
skill-level:intermediate
cost:medium
impact:small

@iSazonov iSazonov added the tenet-performance Performance related issue label Jul 3, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jul 3, 2020
@GrabYourPitchforks GrabYourPitchforks added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2020
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jul 3, 2020

I can't speak for the JIT team, but it's possible that writing directly into the return value's storage address could be an incorrect optimization. Consider:

public class MyClass
{
    public Guid MyGuid;
}

MyClass o = new MyClass();
o.MyGuid = SomeMethod();

Guid SomeMethod()
{
    // do something
    if (rnd()) { return g; }
    else { throw new Exception(); }
}

If any portion of SomeMethod has a side effect that may result in the method throwing an exception or otherwise not returning, then the write to MyClass.SomeGuid should never have taken place at all, as the write is performed by SomeMethod's caller, not by SomeMethod itself. (In practice the write is performed by SomeMethod's epilog, which is indistinguishable from the write having been performed by the caller after method return. So the observed behavior is consistent with expectations.)

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Jul 6, 2020
@JulieLeeMSFT
Copy link
Member

CC @CarolEidt @sandreenko PTAL

@CarolEidt
Copy link
Contributor

There are at least a few factors at play here:

  • First, taking the address of the local variable g and passing it to another method will cause every write to g's fields to go to the stack because it's address-exposed. Declaring a separate variable to hold the random bytes, and then assigning it field-by-field to g might improve things a bit.
  • The JIT doesn't currently have support for shuffling and assembling fields of structs in a single register, so even if the first issue is addressed, there will be one, rather than two, stores to the return local.
  • The JIT doesn't do any forwarding optimizations, so in any case I believe that it will always store to the return local in the epilog.

Perhaps @AndyAyersMS has additional thoughts.

@iSazonov
Copy link
Contributor Author

iSazonov commented Jul 7, 2020

in any case I believe that it will always store to the return local in the epilog.

It could be allocated on stack before the method call like an optimization for stackalloc.
Given that the size of structures (and tuples?) can be relatively large there can be a good performance gain.

@AndyAyersMS
Copy link
Member

The .NET analogs of C++'s RVO and NRVO are indeed useful -- but not something the jit is able to do at present. Reverse struct copy-prop (#8887) will eventually allow the jit to construct some structs directly into the hidden return buffer.

However, as @GrabYourPitchforks notes, on the NewGuid example, this optimization would likely be blocked by the call to GetRandomBytes, since the jit would have to assume that method could throw an exception before writing all of its result.

@jakobbotsch
Copy link
Member

As discussed above, the cases we can do this would be very limited. The return buffer can also alias other arguments in the managed calling convention which makes this even harder. I don't think there is much we would be able to do because of that. Will close this.

@jakobbotsch jakobbotsch closed this as not planned Won't fix, can't repro, duplicate, stale Dec 7, 2022
@jakobbotsch
Copy link
Member

There are cases we would still be able to do this, even in the face of there being potential aliasing/overlap between the return buffer and arguments. Reverse copy prop would probably need to handle this in some form anyway, so let's reopen this.

@jakobbotsch jakobbotsch reopened this Dec 7, 2022
@iSazonov
Copy link
Contributor Author

iSazonov commented Dec 8, 2022

I can explain my perplexity about this issue.
Structures are good for high performance code and suddenly copying happens. Immediately the question arises whether it is so necessary.
Perhaps this is not the right analogy but it reminds us of async methods that execute synchronously if possible, and moreover try to return value results to avoid allocation.
So my expectation for structures is similar that the compiler tries to make the code as efficient as possible.
Another analogy is related to static anonymous functions. The compiler does this optimization if possible, but we can specify static explicitly to avoid regression. Perhaps something similar could be possible for structures since there are quite a few shadow copy traps there.

@jakobbotsch
Copy link
Member

@iSazonov It's not just about what is possible but also about what is going to be realistic in the framework of the JIT. The fact that the return buffer is allowed to alias global memory puts the JIT on territory that it generally handles very conservatively today. More powerful alias analysis is quite costly and I don't know how much we could get away with there.

In your original example it is very unlikely we would be able to do anything without making the JIT's alias analysis more powerful. There are two main reasons: 1) Unlike C++, return g does not mean we can make g the return buffer function-wide for the reasons described above, e.g. GetRandomBytes call could partially write it before throwing an exception. 2) Exposing the address of g to GetRandomBytes makes the JIT believe g can alias any other global memory. Removing the copy in this case requires us to optimize something like g = <arbitrary expression involving global memory>; *retbuf = g into *retbuf = <arbitrary expression involving global memory>. That's hard in the framework of the JIT when we know retbuf is allowed to alias.

If the expression does not involve global memory then it becomes easier; so those are the cases where I think we would more easily be able to do something, and why I reopened it. E.g. it may be common that you just have g = <constant assigned to each field>; *retbuf = g which could be optimized.

@iSazonov
Copy link
Contributor Author

iSazonov commented Dec 8, 2022

Thanks for the clarification! I don't ask for the impossible. The main things I noted is (1) that there is undesirable (invisible) copying (2) that there are examples where we can help the compiler make the code better.
So I'd be happy (1) if you can find ways to improve some special cases, (2) and maybe come up with something more fundamental in JIT/C#.

@benaadams
Copy link
Member

benaadams commented Dec 9, 2022

Some methods don't have any throws and are self contained (especially post inlining); could the Jit write directly to the output buffer in these cases?

@benaadams
Copy link
Member

Related issue #12219

@ghost ghost locked as resolved and limited conversation to collaborators May 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

8 participants