Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing large structs by value asserts/crashes on ARM32/x86 #12441

Closed
mikedn opened this issue Apr 7, 2019 · 9 comments · Fixed by dotnet/coreclr#23881
Closed

Passing large structs by value asserts/crashes on ARM32/x86 #12441

mikedn opened this issue Apr 7, 2019 · 9 comments · Fixed by dotnet/coreclr#23881
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@mikedn
Copy link
Contributor

mikedn commented Apr 7, 2019

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

class Program
{
    [StructLayout(LayoutKind.Sequential)]
    unsafe struct S
    {
        fixed byte x[65536];
    }

    class C
    {
        public S s;
    }

    static void Main() => Test(new C());

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Call(int r0, int r1, int r2, int r3, int r4, int r5, int r6, S s)
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Test(C c)
    {
        Call(0, 1, 2, 3, 4, 5, 42, c.s);
        Console.WriteLine("done");
    }
}

genPutArgStk is basically doing an unrolled copy and the emitter asserts because the offset is too large. It shouldn't do unroll, even for smaller structs.

@mikedn
Copy link
Contributor Author

mikedn commented Apr 7, 2019

Hmm, and x86 seems to have a problem too. It generates code that's apparently correct:

G_M55886_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
       57           push     edi
       56           push     esi
       50           push     eax
       33C0         xor      eax, eax
       8945F4       mov      dword ptr [ebp-0CH], eax
G_M55886_IG02:
       6A02         push     2
       6A03         push     3
       6A04         push     4
       6A05         push     5
       6A2A         push     42
       83C104       add      ecx, 4
       894DF4       mov      bword ptr [ebp-0CH], ecx
       81EC00000100 sub      esp, 0x10000
       8B55F4       mov      edx, bword ptr [ebp-0CH]
       8BFC         mov      edi, esp
       8BF2         mov      esi, edx
       B900000100   mov      ecx, 0x10000
       F3A4         rep movsb
       33C9         xor      ecx, ecx
       BA01000000   mov      edx, 1
       FF157C2ED506 call     [Program:Call(int,int,int,int,int,int,int,struct)]
       8B0D7432A705 mov      ecx, gword ptr [05A73274H]      'done'
       FF15D8EDEE06 call     [System.Console:WriteLine(ref)]
G_M55886_IG03:
       59           pop      ecx
       5E           pop      esi
       5F           pop      edi
       5D           pop      ebp
       C3           ret

yes crashes with "Access Violation" on rep movsb. Presumably that's because the stack needs to be expanded and the "check stack" code is missing from prolog in this case.

@mikedn mikedn changed the title Passing large structs by value results in an assert on ARM32 Passing large structs by value asserts/crashes on ARM32/x86 Apr 7, 2019
@AndyAyersMS
Copy link
Member

x86 probably needs to do stack page probing here.

cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member

Putting in 3.0 for now.

@BruceForstall
Copy link
Member

I'll take this one.

the "check stack" code is missing from prolog

Looks like we need to do the probing when the argument is getting written to the stack at the call site, not in the prolog. I.e., just before/at the:

sub      esp, 0x10000

I was going to say we could potentially do the probing in the prolog if we figured out the largest such call-site requirement from all call sites, but if there's a dynamic localloc, we wouldn't know the starting point, and we have to probe after the localloc.

@BruceForstall BruceForstall self-assigned this Apr 8, 2019
@mikedn
Copy link
Contributor Author

mikedn commented Apr 8, 2019

Looks like we need to do the probing when the argument is getting written to the stack at the call site, not in the prolog. I.e., just before/at the:

Yep, it looks like this is what VC++ x86 is doing. Well, VC++ is actually calling __chkstk instead of doing inline probing like the JIT does. But the effect is the same, there's one __chkstk call for every "large" argument.

I'm not sure about Linux though, Clang x86 doesn't seem to do any probing.

@BruceForstall
Copy link
Member

I see:

IMPL_LIMITATION("JIT doesn't support offsets larger than 65535 into valuetypes\n");

for arm32 on your test case. We actually don't have anything except "unrolling" implemented for arm32 struct passing, so I don't see that changing (at least in the short term).

For x86, though, on your test case, I see this fire:

    if (compArgSize != (size_t)(unsigned short)compArgSize)
        NO_WAY("Too many arguments for the \"ret\" instruction to pop");

which makes sense, since the argument space is too big. I wonder why you're not seeing this? If I change the array to 65500 bytes, I get the same code you see (modulo this constant).

What's interesting is that I can induce the first assert (JIT doesn't support offsets larger than 65535 into valuetypes) for x86 by using a struct with an object pointer, which forces us to use unrolling for putArgStk, e.g.:

using System;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;

namespace BigFrames
{

    [StructLayout(LayoutKind.Explicit)]
    public struct Struct80000ref
    {
        [FieldOffset(0)]
        public int i1;
        [FieldOffset(79996)]
        public Object o1;
    }

    public class Test
    {
        public static int iret = 1;

        [MethodImplAttribute(MethodImplOptions.NoInlining)]
        public static void TestWrite(Struct80000ref s)
        {
            Console.Write("Enter TestWrite: ");
            Console.WriteLine(s.o1.GetHashCode());
            iret = 100;
        }

        [MethodImplAttribute(MethodImplOptions.NoInlining)]
        public static void Test1()
        {
            Console.WriteLine("Enter Test1");
            Struct80000ref s = new Struct80000ref();
            s.o1 = new Object();
            TestWrite(s);
        }

        public static int Main()
        {
            Test1();

            if (iret == 100)
            {
                Console.WriteLine("TEST PASSED");
            }
            else
            {
                Console.WriteLine("TEST FAILED");
            }
            return iret;
        }
    }
}

However, this is going to end up hitting the second assert (Too many arguments for the \"ret\" instruction to pop) at the callee, so it doesn't seem like it needs to be fixed.

@mikedn
Copy link
Contributor Author

mikedn commented Apr 9, 2019

For x86, though, on your test case, I see this fire:

Hmm, tried again, I don't see that assert. That probably means that for rep movsb doesn't crash for some reason and the JIT ends up compiling the Call method and asserts. Perhaps there are differences in the way the OS handles this case so somehow it works on your machine but not on mine (latest Win10 insider preview).

We actually don't have anything except "unrolling" implemented for arm32 struct passing, so I don't see that changing (at least in the short term).

Presumably calling the memcpy JIT helper like genCodeForCpBlk does is problematic during arg setup?

@BruceForstall
Copy link
Member

That probably means that for rep movsb doesn't crash for some reason and the JIT ends up compiling the Call method and asserts.

Ah, yes. I do see the same code as you for Test, but then Call asserts.

Typically, it seems like when I run test cases like this, the stack has a lot of committed pages, probably due to previous JIT compilations or VM work or ???

Presumably you could write some code to first consume a bunch of stack (but not overflow), then pop it, then run your test case, and get the same behavior as me (Call asserts).

Bottom line: we still need some stack probing here.

@mikedn
Copy link
Contributor Author

mikedn commented Apr 10, 2019

Presumably you could write some code to first consume a bunch of stack (but not overflow), then pop it, then run your test case, and get the same behavior as me (Call asserts).

Yep. Just calling another function with a large local variable in it has this effect:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Escape(ref S s)
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Probe()
    {
        S s = default;
        Escape(ref s);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Call(int r0, int r1, int r2, int r3, int r4, int r5, int r6, S s)
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Test(C c)
    {
        Probe();
        Call(0, 1, 2, 3, 4, 5, 42, c.s);
        Console.WriteLine("done");
    }
Unhandled Exception: System.InvalidProgramException: The JIT compiler encountered invalid IL code or an internal limitation.
   at Program.Call(Int32 r0, Int32 r1, Int32 r2, Int32 r3, Int32 r4, Int32 r5, Int32 r6, S s)
   at Program.Test(C c) in D:\Projects\jit\test.cs:line 44
   at Program.Main() in D:\Projects\jit\test.cs:line 20

Bottom line: we still need some stack probing here.

Or a new IMPL_LIMITATION so that Test itself throws InvalidProgramException when it is being compiled. I doubt that passing such large structs on the stack is useful to anyone but if someone does this then the appropriate behavior would be to either work correctly or throw InvalidProgramException, not crash the process with an access violation error.

BruceForstall referenced this issue in BruceForstall/coreclr Apr 10, 2019
On x86, structs are passed by value on the stack. We copy structs
to the stack in various ways, but one way is to first subtract the
size of the struct and then use a "rep movsb" instruction. If the
struct we are passing is sufficiently large, this can cause us to
miss the stack guard page.

So, introduce stack probes for these struct copies.

It turns out the stack pointer after prolog probing can be sitting
near the very end of the guard page (one `STACK_ALIGN` slot before
the end, which allows a "call" instruction which pushes its
return address to touch the guard page with the return address push).
We don't want to probe with every argument push, though. So change
the prolog probing to insert an "extra" touch at the final SP location
if the previous touch was "too far" away, leaving at least some
buffer zone for un-probed SP adjustments. I chose this to be the
size of the largest SIMD register, which also can get copied to the
argument stack with a "SUB;MOV" sequence.

Added several test case variations showing different large stack
probe situations.

Fixes #23796
BruceForstall referenced this issue in dotnet/coreclr Apr 12, 2019
* Fix x86 stack probing

On x86, structs are passed by value on the stack. We copy structs
to the stack in various ways, but one way is to first subtract the
size of the struct and then use a "rep movsb" instruction. If the
struct we are passing is sufficiently large, this can cause us to
miss the stack guard page.

So, introduce stack probes for these struct copies.

It turns out the stack pointer after prolog probing can be sitting
near the very end of the guard page (one `STACK_ALIGN` slot before
the end, which allows a "call" instruction which pushes its
return address to touch the guard page with the return address push).
We don't want to probe with every argument push, though. So change
the prolog probing to insert an "extra" touch at the final SP location
if the previous touch was "too far" away, leaving at least some
buffer zone for un-probed SP adjustments. I chose this to be the
size of the largest SIMD register, which also can get copied to the
argument stack with a "SUB;MOV" sequence.

Added several test case variations showing different large stack
probe situations.

Fixes #23796

* Increase the argument size probe buffer

* Formatting
davmason referenced this issue in davmason/coreclr Apr 27, 2019
* Fix x86 stack probing

On x86, structs are passed by value on the stack. We copy structs
to the stack in various ways, but one way is to first subtract the
size of the struct and then use a "rep movsb" instruction. If the
struct we are passing is sufficiently large, this can cause us to
miss the stack guard page.

So, introduce stack probes for these struct copies.

It turns out the stack pointer after prolog probing can be sitting
near the very end of the guard page (one `STACK_ALIGN` slot before
the end, which allows a "call" instruction which pushes its
return address to touch the guard page with the return address push).
We don't want to probe with every argument push, though. So change
the prolog probing to insert an "extra" touch at the final SP location
if the previous touch was "too far" away, leaving at least some
buffer zone for un-probed SP adjustments. I chose this to be the
size of the largest SIMD register, which also can get copied to the
argument stack with a "SUB;MOV" sequence.

Added several test case variations showing different large stack
probe situations.

Fixes #23796

* Increase the argument size probe buffer

* Formatting
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants