Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stack arguments passing via reflection. #46456

Closed
sandreenko opened this issue Dec 29, 2020 · 29 comments
Closed

Fix stack arguments passing via reflection. #46456

sandreenko opened this issue Dec 29, 2020 · 29 comments

Comments

@sandreenko
Copy link
Contributor

This is the last item from #41130. I have extracted it into a separate issue for a cleaner discussion.

Such a method when called via reflection is getting stack arguments with wrong offsets:
public int TestSmallStackArgsMethod(short s1, short s2, short s3, short s4, short s5, short s6, short s7, short s8, short s9, short s10, short s11, short s12)

This minimal test:
sandreenko@678ce4b

because VM logic that moves "Invoke" arguments from the array to the slots does not know about apple specific alignment. Actually, VM operates in "stack slot" terms that do not exist there.

I see two options here:

  1. a small targeted fix for reflection and leave the other parts as is;
  2. teach VM to work with byteSize/byteOffset as I did with the Jit (a non-working prototype is here sandreenko@592b51b)

the first is cheap, the second could still be preferable if any other components depend on them (like if we want varags or debugger needs proper sizes/offsets somehow)

What do you think @sdmaclea , @janvorli , @davidwrighton?

@sandreenko sandreenko added this to the 6.0.0 milestone Dec 29, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 29, 2020
@jkotas
Copy link
Member

jkotas commented Dec 30, 2020

the second could still be preferable if any other components depend on them

Does it have impact on GC stack walking (TransitionFrame::PromoteCallerStack and friends)?

@sandreenko
Copy link
Contributor Author

Does it have impact on GC stack walking (TransitionFrame::PromoteCallerStack and friends)?

Yes, it goes down to ArgIteratorTemplate<ARGITERATOR_BASE>::GetNextOffset() that calls int cbArg = StackElemSize(argSize) that returns incorrect results. To repro that we need something like occcupy all registers, short a, short b, Object o) so GetNextOffset for o returns 16 instead of 8.

In the prototype, my idea was to return the correct result from StackElemSize by adding additional arguments: IsValueType, IsFloatHfa. If we go this way would it be preferable to use such a signature on all platforms or only for arm64 apple?

@janvorli
Copy link
Member

I would go with the option #2. I would make the signature of StackElemSize the same for all platforms and just ignore the argument for non-osx arm64.
Looking at your experimental change, I can see a problem in the change in ArgIteratorTemplate<ARGITERATOR_BASE>::GetNextOffset(). The m_idxStack handling needs to be updated. Since the stack element size is 1, the following is not correct: int argOfs = TransitionBlock::GetOffsetOfArgs() + m_idxStack * 8;. And the m_idxStack += cArgSlots; is not correct either, it would need to take into account the alignment of the argument. If e.g. the first stack passed argument is char and the second stack passed one is int64, then the char should be at stack offset 0 (relative to the start of stack arguments) and the int64 at stack offset 8. As it is now, it would put the int64 argument at offset 1.

@sandreenko sandreenko self-assigned this Dec 30, 2020
@sandreenko
Copy link
Contributor Author

I will fix the prototype according to @janvorli review and publish a PR after the NY.

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 30, 2020
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 11, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 19, 2021
@janvorli
Copy link
Member

@sandreenko I think this can be closed now, right?

@sandreenko
Copy link
Contributor Author

@sandreenko I think this can be closed now, right?

I think it is still not working for crossgen2, like

returns 8 when it should be something like

        /// <summary>
        /// This property is only overridden in Arm64 MacOS variant of the transition block.
        /// </summary>
        public virtual bool IsAppleArm64ABI => false;

        public abstract int PointerSize { get; }

        public int StackElemSize(int size, bool isValueType = false, bool isFloatHfa = false)
        {
            Debug.Assert(isValueType || !isFloatHfa);
            if (IsAppleArm64ABI)
            {
                if (!isValueType)
                {
                    Debug.Assert((size & (size - 1)) == 0);
                    return size;
                }
                if (isFloatHfa)
                {
                    Debug.Assert((size % 4) == 0);
                    return size;
                }
            }

            return (((size) + PointerSize - 1) & -PointerSize);
        }

but I am not sure if we ever crossgen2 any reflection code and if we don't we can ignore this issue for 6.0.

@janvorli
Copy link
Member

Reflection invocation and the related argument passing is handled purely in the native runtime for both jitted and crossgened code.
@trylek do you know what is the managed transition block stuff above used for in crossgen2?

@trylek
Copy link
Member

trylek commented Jul 19, 2021

Not really, according to my knowledge the transition block algorithms are only used for generating GC refmap information for method import cells. For @sandreenko's comment, I'm not really sure what "crossgen2[-ing] any reflection code" alludes to, we certainly do Crossgen2-compile reflection-related library code and arbitrary code that may be called via reflection.

@janvorli
Copy link
Member

Hmm, so if this code doesn't fully match its native counterpart for the Apple Silicon case, we could possibly generate wrong GC refmap. Maybe that could possibly explain the weird null reference exceptions we are hunting. cc: @mangod9, @davidwrighton

@sandreenko
Copy link
Contributor Author

For @sandreenko's comment, I'm not really sure what "crossgen2[-ing] any reflection code" alludes to, we certainly do Crossgen2-compile reflection-related library code and arbitrary code that may be called via reflection.

Do you say that if we have a method like

    object[] parametersArray = new object[] { many args on regs, double, SIMD12, byte, int....  };
    result = methodInfo.Invoke(classInstance, parametersArray);

and we call crossgen2 compiler for it it will compile it into native?

@janvorli
Copy link
Member

Both for the jitted and crossgened case, it will call a runtime native helper that performs the invocation.

@janvorli
Copy link
Member

See RuntimeMethodHandle::InvokeMethod in src/coreclr/vm/reflectioninvocation.cpp

@sandreenko
Copy link
Contributor Author

@janvorli could you please take this issue? I had a prototype for it in #47221 but never had a chance to finish it.

@trylek
Copy link
Member

trylek commented Jul 19, 2021

For Sergey's comment, the call to Invoke certainly doesn't make Crossgen2 perform all the calling convention transforms, AFAIK it should just emit code for constructing the parameter array and calling Invoke which will use runtime reflection code to put the arguments on the stack and in registers and carry out the actual call. Having said that, subtle differences in TransitionBlock or ArgIterator (both of which are highly sensitive to architecture / OS variations) between the runtime and Crossgen2 can potentially cause GC holes causing the OSX arm64 nullrefs we've been seeing for a while now. R2RDump dumps the GC refmap information for all import cells under the --sc command-line option so it should be hopefully trivial to double-check whether a Crossgen2-generated OSX arm64 executable contains methods with the proper GC refmap info.

@janvorli
Copy link
Member

could you please take this issue?

Of course!

@janvorli janvorli self-assigned this Jul 19, 2021
@mangod9
Copy link
Member

mangod9 commented Jul 20, 2021

we should check whether this case is hit within crossgen2 compilation of tests which are intermittently failing. Dont think cg2 itself has any reflection based invocations does it?

@jkotas
Copy link
Member

jkotas commented Jul 20, 2021

It can affect any code, the impact is not limited to reflection.

@trylek
Copy link
Member

trylek commented Jul 20, 2021

This is an interesting point. On the one hand, I'm not aware of any place in Crossgen2 source code explicitly calling the Invoke method; on the other hand, Crossgen2-compiled Crossgen2 certainly does emit GC refmap info for various callees so a discrepancy between CG2 and runtime would still apply. Having said that, I think that CoreCLR testing doesn't actually use CG2-compiled Crossgen2, that gets only produced by the installer build for the purpose of publishing the Crossgen2 nuget package, it's not used for internal testing within the runtime repo (at least not in the coreclr partition). According to my recollection we shouldn't be running it on top of live-built (and CG2-compiled) framework exactly because that used to transform various CG2 bugs in its compilation into subsequent weird compile-time crashes but I'm not 100% sure whether the OSX arm64 build conforms to the same scheme, it could be special-cased to some extent.

@jkotas
Copy link
Member

jkotas commented Jul 20, 2021

I think that CoreCLR testing doesn't actually use CG2-compiled Crossgen2

It uses CG2-compiler CoreLib and the framework, right? The crashes can be triggered by GC tracking problems introduced in the framework.

@janvorli
Copy link
Member

It seems that the test failures that I am seeing streaming in regularly are most if not all with running coreclr tests crossgened with crossgen2.

@mangod9
Copy link
Member

mangod9 commented Jul 20, 2021

I tried to disable R2R while running cg2 but the CI is still showing failures. @trylek this change should have disabled R2R correct: #55983 ?

The NRE while crossgen2-ing are still occuring with this change.

@trylek
Copy link
Member

trylek commented Jul 20, 2021

This change should disable R2R on Linux / OSX, Windows section should be below in the same file.

@mangod9
Copy link
Member

mangod9 commented Jul 20, 2021

yeah only wanted to check whether disabling R2R helps resolve NRE on OSX, doesn't appear to have helped.

@trylek
Copy link
Member

trylek commented Jul 20, 2021

(Which is probably sufficient as you're interested in OSX arm64.) Your findings confirm my suspicion that it's not a R2R bug, it's a runtime bug (somewhere between VM / JIT / GC).

@janvorli
Copy link
Member

Aren't the tests in CG2 legs using runtime crossgened by CG2 too? The change @mangod9 made is just for the test binaries.

@trylek
Copy link
Member

trylek commented Jul 20, 2021

I think the point is that we most often see the NRE in the CG2 compilation itself and the CG2 compiler as such is not crossgenned. Having said that, I'm not sure what runtime it runs on - normally it should be run using a LKG version of the dotnet command, not by the "live" one, but I'm not sure if it's not special cased on OSX arm64 as it's a new platform. If it does use the live-built dotnet on OSX arm64, it would be still susceptible to potential CG2 compilation bugs of S.P.C and the framework, disabling those would require patching two more places in the build scripts.

@janvorli
Copy link
Member

I think it uses the LKG runtime from the .dotnet folder. But doesn't that have runtime assemblies crossgened by crossgen2? I mean, when the SDK is published?

@trylek
Copy link
Member

trylek commented Jul 20, 2021

That is true (assuming it's a sufficiently new dotnet) but COMPlus_ReadyToRun=0 should block that out too and I believe that's what Manish did in his instrumented PR.

@janvorli
Copy link
Member

Closing as this work is done. The only remaining related work is to create matching changes in the managed clone of the ArgIterator and TransitionBlock in CG2.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants