Passing large structs by value asserts/crashes on ARM32/x86 #12441
Comments
Hmm, and x86 seems to have a problem too. It generates code that's apparently correct:
G_M55886_IG01:
55 push ebp
8BEC mov ebp, esp
57 push edi
56 push esi
50 push eax
33C0 xor eax, eax
8945F4 mov dword ptr [ebp-0CH], eax
G_M55886_IG02:
6A02 push 2
6A03 push 3
6A04 push 4
6A05 push 5
6A2A push 42
83C104 add ecx, 4
894DF4 mov bword ptr [ebp-0CH], ecx
81EC00000100 sub esp, 0x10000
8B55F4 mov edx, bword ptr [ebp-0CH]
8BFC mov edi, esp
8BF2 mov esi, edx
B900000100 mov ecx, 0x10000
F3A4 rep movsb
33C9 xor ecx, ecx
BA01000000 mov edx, 1
FF157C2ED506 call [Program:Call(int,int,int,int,int,int,int,struct)]
8B0D7432A705 mov ecx, gword ptr [05A73274H] 'done'
FF15D8EDEE06 call [System.Console:WriteLine(ref)]
G_M55886_IG03:
59 pop ecx
5E pop esi
5F pop edi
5D pop ebp
C3 ret
yet crashes with "Access Violation" on
x86 probably needs to do stack page probing here. cc @dotnet/jit-contrib
Putting in 3.0 for now.
I'll take this one.
Looks like we need to do the probing when the argument is getting written to the stack at the call site, not in the prolog. I.e., just before/at the:
I was going to say we could potentially do the probing in the prolog if we figured out the largest such call-site requirement from all call sites, but if there's a dynamic localloc, we wouldn't know the starting point, and we have to probe after the localloc.
Yep, it looks like this is what VC++ x86 is doing. Well, VC++ is actually calling
I'm not sure about Linux though; Clang x86 doesn't seem to do any probing.
I see:
for arm32 on your test case. We actually don't have anything except "unrolling" implemented for arm32 struct passing, so I don't see that changing (at least in the short term). For x86, though, on your test case, I see this fire:
which makes sense, since the argument space is too big. I wonder why you're not seeing this? If I change the array to 65500 bytes, I get the same code you see (modulo this constant). What's interesting is that I can induce the first assert (
However, this is going to end up hitting the second assert (
Hmm, tried again, I don't see that assert. That probably means that for
Presumably calling the memcpy JIT helper like
Ah, yes. I do see the same code as you for Test, but then Call asserts. Typically, it seems like when I run test cases like this, the stack has a lot of committed pages, probably due to previous JIT compilations or VM work or ??? Presumably you could write some code to first consume a bunch of stack (but not overflow), then pop it, then run your test case, and get the same behavior as me (Call asserts). Bottom line: we still need some stack probing here.
Yep. Just calling another function with a large local variable in it has this effect:
[MethodImpl(MethodImplOptions.NoInlining)]
static void Escape(ref S s)
{
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void Probe()
{
    S s = default;
    Escape(ref s);
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void Call(int r0, int r1, int r2, int r3, int r4, int r5, int r6, S s)
{
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void Test(C c)
{
    Probe();
    Call(0, 1, 2, 3, 4, 5, 42, c.s);
    Console.WriteLine("done");
}
Or a new
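The test snippet above does not show the definitions of `S` and `C`. A self-contained version might look like the following sketch. The 64 KiB size of `S` is an assumption inferred from the `0x10000` copy length in the x86 listing, and the `StructLayout` sizing, the `First` field, the `Repro` class name and the `int` return value of `Call` are illustrative choices, not taken from the original test.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Assumption: a 64 KiB struct, matching the 0x10000-byte "rep movsb" copy in the listing.
[StructLayout(LayoutKind.Sequential, Size = 0x10000)]
struct S
{
    public byte First; // hypothetical field so the callee can read from the copy
}

class C
{
    public S s;
}

static class Repro
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Escape(ref S s)
    {
    }

    // Has a large local, so its prolog commits stack pages before Test makes the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Probe()
    {
        S s = default;
        Escape(ref s);
    }

    // Returns a value (unlike the original empty method) so the result can be checked.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Call(int r0, int r1, int r2, int r3, int r4, int r5, int r6, S s)
    {
        return r6 + s.First;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Test(C c)
    {
        Probe();
        return Call(0, 1, 2, 3, 4, 5, 42, c.s);
    }

    static void Main()
    {
        Console.WriteLine(Test(new C()) == 42 ? "done" : "unexpected result");
    }
}
```

Removing the `Probe()` call gives the other variation discussed in this thread, where the pages below the current frame may not have been committed yet.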
On x86, structs are passed by value on the stack. We copy structs to the stack in various ways, but one way is to first subtract the size of the struct and then use a "rep movsb" instruction. If the struct we are passing is sufficiently large, this can cause us to miss the stack guard page. So, introduce stack probes for these struct copies.

It turns out the stack pointer after prolog probing can be sitting near the very end of the guard page (one `STACK_ALIGN` slot before the end, which allows a "call" instruction which pushes its return address to touch the guard page with the return address push). We don't want to probe with every argument push, though. So change the prolog probing to insert an "extra" touch at the final SP location if the previous touch was "too far" away, leaving at least some buffer zone for un-probed SP adjustments. I chose this to be the size of the largest SIMD register, which also can get copied to the argument stack with a "SUB;MOV" sequence.

Added several test case variations showing different large stack probe situations.

Fixes #23796
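To make the failure mode concrete, the following sketch models only the address arithmetic; it is not the JIT's code. It assumes 4 KiB pages, a downward-growing stack, and `size` no larger than `sp`. The names `StackProbeSketch`, `ProbeAddresses` and `SkipsGuardPage` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class StackProbeSketch
{
    public const uint PageSize = 0x1000; // assumption: 4 KiB pages

    // Addresses a page-by-page probe would touch while lowering SP by `size` bytes.
    // Consecutive touches are at most one page apart, so the guard page cannot be stepped over.
    public static List<uint> ProbeAddresses(uint sp, uint size)
    {
        var touched = new List<uint>();
        uint finalSp = sp - size;
        uint cur = sp;
        while (cur - finalSp >= PageSize)
        {
            cur -= PageSize;
            touched.Add(cur);
        }
        // Finish with a touch at the final SP if the loop did not land exactly on it.
        if (touched.Count == 0 || touched[touched.Count - 1] != finalSp)
        {
            touched.Add(finalSp);
        }
        return touched;
    }

    // True if an un-probed "sub esp, size" leaves SP below the guard page, so the
    // first write of the copy (at the new SP) hits memory that was never committed.
    public static bool SkipsGuardPage(uint sp, uint size, uint guardPageBase)
    {
        return sp - size < guardPageBase;
    }

    static void Main()
    {
        // A 0x10000-byte argument, as in the listing, with the guard page just below SP.
        Console.WriteLine(SkipsGuardPage(0x20000, 0x10000, 0x1F000));
        Console.WriteLine(ProbeAddresses(0x20000, 0x10000).Count);
    }
}
```

The sketch always touches the final SP, which is simpler than the change described above: there, the extra touch is added only when the previous touch is too far away.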
* Fix x86 stack probing
* Increase the argument size probe buffer
* Formatting
`genPutArgStk` is basically doing an unrolled copy, and the emitter asserts because the offset is too large. It shouldn't unroll, even for smaller structs.