[x86/Linux] Stack align 16 bytes for JIT code #8849

seanshpark · 2017-01-09T03:59:39Z

Change the JIT code to align the stack to 16 bytes, as modern compilers require.

seanshpark · 2017-01-09T04:00:06Z

seanshpark · 2017-01-09T06:14:45Z

parjong · 2017-01-09T07:54:49Z

src/jit/target.h

@@ -484,9 +484,15 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
 #define MIN_ARG_AREA_FOR_CALL 0 // Minimum required outgoing argument space for a call.

 #define CODE_ALIGN 1 // code alignment requirement
+#if !defined(FEATURE_PAL)
 #define STACK_ALIGN 4 // stack alignment requirement


This 16-byte stack alignment requirement comes from GCC/Clang (as discussed in #8590). I'm not sure which flag will be better: __GNUC__ or FEATURE_PAL.

Is the stack alignment requirement part of the ABI? Perhaps there should be a UNIX_X86_ABI, similar to UNIX_AMD64_ABI and UNIX_ARM_ABI?

As I understand it, this is not part of the standard ABI. It is just a requirement of GCC (>= 4.5) and the corresponding Clang versions.

@parjong. Good point. __GNUC__ vs FEATURE_PAL for this case.

I think that in this case, __GNUC__ would be the right choice.

Agree. From https://en.wikipedia.org/wiki/X86_calling_conventions: In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary.

__GNUC__ is not cross-compilation friendly, though: it describes the host compiler, not the target environment. Introducing UNIX_X86_ABI may be best.

UNIX_X86_ABI may be best.

Thank you! I'll add this in another PR and rebase this patch later. I think this PR will take some time :)

janvorli · 2017-01-09T10:30:29Z

CC: @dotnet/jit-contrib

jkotas · 2017-01-09T13:30:52Z

Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work?

The x86 calling convention is stdcall (=callee pop) and so you may need to pad each callsite by variable pad instead or in addition to this.

briansull · 2017-01-09T20:37:03Z

src/jit/codegenxarch.cpp

 // Push a single 4-byte zero. This matches the 4-byte STACK_ALIGN value.
 static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES);
 inst_IV(INS_push_hide, 0); // --- push 4-byte 0
+#else
+ // Push four 4-byte zeros. This matches the 16-byte STACK_ALIGN value.
+ static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES * 4);


I think that it is better to replace all of these #ifdefs with a loop that pushes STACK_ALIGN bytes of zeros onto the stack.

static_assert_no_msg((STACK_ALIGN % REGSIZE_BYTES) == 0);
unsigned const count = (STACK_ALIGN / REGSIZE_BYTES);

for (unsigned i = 0; i < count; i++)
{
    inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0
}
// Note that the stack must always be aligned to STACK_ALIGN bytes
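The loop suggested above can be tried outside the JIT with a small stand-alone sketch. The names `kRegSizeBytes`, `kStackAlign`, and `pushAlignedZeros` are hypothetical stand-ins for this illustration, and a `std::vector` plays the role of the machine stack; this is not the JIT's emitter code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for REGSIZE_BYTES and STACK_ALIGN (x86/Linux values).
constexpr unsigned kRegSizeBytes = 4;
constexpr unsigned kStackAlign   = 16;

static_assert(kStackAlign % kRegSizeBytes == 0,
              "alignment must be a whole number of register-sized slots");

// Simulates one `push 0` per register-sized slot until kStackAlign bytes
// have been pushed. Returns the number of bytes pushed.
unsigned pushAlignedZeros(std::vector<std::uint32_t>& stack)
{
    constexpr unsigned count = kStackAlign / kRegSizeBytes;
    for (unsigned i = 0; i < count; i++)
    {
        stack.push_back(0); // one 4-byte zero
    }
    return count * kRegSizeBytes;
}
```

With the assumed values, the loop runs four times and pushes 16 bytes; with a 4-byte alignment it would degenerate to the single push of the original code, which is why the #ifdef is no longer needed.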

Thank you for the comment. It's cleaner and nicer.

briansull · 2017-01-09T20:53:00Z

src/jit/lclvars.cpp

+ // We add 16 so compLclFrameSize is still a multiple of 16.
+ lvaIncrementFrameSize(STACK_ALIGN);
+ }
+ assert((compLclFrameSize % STACK_ALIGN) == 0);


I think that this needs to be written differently:
First, if it isn't the final frame layout phase, you have to add on the maximum pad of 12 bytes.
Then, after that, you have to align compLclFrameSize to a multiple of 16.
This can probably be done unconditionally as well (not inside the #else):

if (STACK_ALIGN > REGSIZE_BYTES)
{
    if (lvaDoneFrameLayout != FINAL_FRAME_LAYOUT)
    {
        // If we are not doing final layout, we don't know the exact value of compLclFrameSize
        // and thus do not know how much we will need to add in order to be aligned.
        // We add the maximum pad that we could ever have (which is 12)
        lvaIncrementFrameSize(STACK_ALIGN - REGSIZE_BYTES);
    }

    // The stack must always be 16 byte aligned.
    if ((compLclFrameSize % STACK_ALIGN) != 0)
    {
        lvaIncrementFrameSize(STACK_ALIGN - (compLclFrameSize % STACK_ALIGN));
    }
}
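To see the effect of the two steps on concrete numbers, here is a minimal stand-alone sketch of the same arithmetic. `alignedFrameSize` and its constants are hypothetical names for this illustration; the real code mutates compLclFrameSize through lvaIncrementFrameSize rather than returning a value.

```cpp
#include <cassert>

constexpr unsigned kRegSizeBytes = 4;  // assumed x86 slot size
constexpr unsigned kStackAlign   = 16; // assumed x86/Linux stack alignment

// Returns the frame size to reserve. Before final layout the exact size is
// not known, so the worst-case pad (kStackAlign - kRegSizeBytes = 12) is
// added first; the result is then rounded up to a multiple of kStackAlign.
unsigned alignedFrameSize(unsigned frameSize, bool finalLayout)
{
    assert(frameSize % kRegSizeBytes == 0); // frames grow in whole slots
    if (!finalLayout)
    {
        frameSize += kStackAlign - kRegSizeBytes;
    }
    const unsigned rem = frameSize % kStackAlign;
    if (rem != 0)
    {
        frameSize += kStackAlign - rem;
    }
    return frameSize;
}
```

For example, a 20-byte frame becomes 32 bytes in both modes, while a 48-byte frame stays at 48 in final layout but is estimated at 64 before final layout, since the tentative estimate is deliberately conservative.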

seanshpark · 2017-01-10T01:17:34Z

@jkotas , thank you for the comment!

Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work?

I've checked with one unit test case that failed on alignment, but I think it's better to check this in ThePreStub. I'll do this with the code @briansull advised.

The x86 calling convention is stdcall (=callee pop) and so you may need to pad each callsite by variable pad instead or in addition to this.

Yes, I also think so. I've compared unit test runs with 4-byte alignment (the toolchain.cmake patch) against this patch and got about 8400 passes vs. about 8000 passes. So it seems the roughly 400 failing test cases are related to stack alignment. I'll do these fixes after this PR, one by one.

seanshpark · 2017-01-10T12:04:16Z

With the 967a45b patch, I've done a small test with the gentogen01.exe unit test program. I added a breakpoint at ThePreStub and watched the ESP value on entry, right before the PUSH EBP instruction. Normally, if the stack was 16-byte aligned at the call, it should now hold only the return address on top, and thus ESP should be an address ending with c, like this.

1008	    STUB_PROLOG
=> 0xb70ddc10 <ThePreStub+0>:	55	push   %ebp
(gdb) info register esp
esp            0xbfffe13c	0xbfffe13c

If this is somewhat incorrect, please let me know. I'll start over again.
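The "address ending with c" rule can be written down as a small check. This is only a sketch of the arithmetic, assuming a 4-byte return address and a 16-byte alignment at the call site; `alignedAtEntry` is a hypothetical helper, not something in the runtime.

```cpp
#include <cassert>
#include <cstdint>

// True if ESP at function entry is consistent with a caller whose stack was
// 16-byte aligned just before CALL pushed the 4-byte return address.
bool alignedAtEntry(std::uint32_t espAtEntry)
{
    constexpr std::uint32_t kStackAlign         = 16;
    constexpr std::uint32_t kReturnAddressBytes = 4;
    assert(espAtEntry % 4 == 0); // ESP is expected to be at least slot-aligned
    return (espAtEntry + kReturnAddressBytes) % kStackAlign == 0;
}
```

For the value observed above, `alignedAtEntry(0xbfffe13c)` holds, because 0xbfffe13c + 4 = 0xbfffe140 is a multiple of 16.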

This method is actually System.AppDomain:SetupDomain(bool,ref,ref,ref,ref):this, and its JITted assembly code starts like this.

G_M46019_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
       57           push     edi
       56           push     esi
       83EC30       sub      esp, 48

As there are three pushes at the start and a subtraction of 48, the stack is aligned to 16 when a call is made.

Then, on entering the second JITted method, which is System.Threading.Monitor:Enter(ref,byref), the code starts like this:

G_M10401_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
       83EC10       sub      esp, 16
       894DF8       mov      gword ptr [ebp-08H], ecx
       8955F4       mov      bword ptr [ebp-0CH], edx

Compared to the SetupDomain method, the two extra pushes are missing, and as a result the stack is not aligned to 16 when calling the next function, which is ThePreStub.

1008	    STUB_PROLOG
=> 0xb70ddc10 <ThePreStub+0>:	55	push   %ebp
(gdb) info register esp
esp            0xbfffe124	0xbfffe124
(gdb) bt
#0  ThePreStub () at /home/maxwell/netcore/coreclr/src/vm/i386/asmhelpers.S:1008
#1  0xb5a30201 in ?? ()
#2  0xb5a30103 in ?? ()
#3  0xb70dda27 in CallDescrWorkerInternal () at /home/maxwell/netcore/coreclr/src/vm/i386/asmhelpers.S:444

It seems that pushing two registers happened to give 16-byte alignment in the first method, but not in the second.
So I think saved registers such as push esi and push edi need to be taken into account together with lvaIncrementFrameSize.
I would like to know where to look to fix this. If anyone knows, please help me, and if I'm wrong, feel free to point it out.
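The difference between the two prologs can be reproduced with a little bookkeeping. The sketch below assumes the caller's stack was 16-byte aligned before the CALL and counts every byte the callee places below that boundary: the return address, each pushed register (including ebp), and the `sub esp` amount. All names are hypothetical and chosen for this illustration only.

```cpp
constexpr unsigned kRegSizeBytes = 4;  // assumed x86 slot size
constexpr unsigned kStackAlign   = 16; // assumed x86/Linux stack alignment

// Bytes below the caller's aligned boundary once the prolog has run:
// return address + pushed registers + local frame.
unsigned bytesBelowBoundary(unsigned regsPushed, unsigned localFrameSize)
{
    return kRegSizeBytes + regsPushed * kRegSizeBytes + localFrameSize;
}

// True if ESP is 16-byte aligned after the prolog.
bool alignedAfterProlog(unsigned regsPushed, unsigned localFrameSize)
{
    return bytesBelowBoundary(regsPushed, localFrameSize) % kStackAlign == 0;
}

// Extra bytes the local frame would need so that the prolog leaves ESP aligned.
unsigned extraFramePad(unsigned regsPushed, unsigned localFrameSize)
{
    const unsigned rem = bytesBelowBoundary(regsPushed, localFrameSize) % kStackAlign;
    return (kStackAlign - rem) % kStackAlign;
}
```

SetupDomain pushes three registers and subtracts 48, giving 4 + 12 + 48 = 64 bytes, which is aligned. Monitor:Enter pushes one register and subtracts 16, giving 4 + 4 + 16 = 24 bytes, which is 8 bytes short of the next multiple of 16. That matches the ESP value ending in 4 seen at the second ThePreStub entry.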

seanshpark · 2017-01-10T12:06:14Z

I need to update again to apply UNIX_X86_ABI after #8863 lands

seanshpark · 2017-01-11T01:16:43Z

I think I've found what I was looking for: compCalleeRegsPushed

seanshpark · 2017-01-24T11:54:34Z

Some changes are from Windows_NT x64 Formatting

seanshpark · 2017-01-24T23:33:02Z

@janvorli , could you please help me with the build break on Windows? I can't find the reason. Line 3175 is empty, but the error says it was expanded from a macro.

3172:    compiler->unwindEmit(*codePtr, coldCodePtr);
3173:
3174:    /* Finalize the line # tracking logic after we know the exact block sizes/offsets */
3175:
3176:    genIPmappingGen();

d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2121: '#': invalid character: possibly the result of a macro expansion [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]
15:04:50 d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2059: syntax error: 'if' [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]
15:04:50 d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2143: syntax error: missing ';' before '{' [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]

BruceForstall · 2017-02-02T20:54:52Z

src/jit/compiler.h

@@ -1262,6 +1265,9 @@ class fgArgInfo
 unsigned argCount; // Updatable arg count value
 unsigned nextSlotNum; // Updatable slot count value
 unsigned stkLevel; // Stack depth when we make this call (for x86)
+#if defined(UNIX_X86_ABI)
+ unsigned padStkAlign; // Count of padding for stack alignment
+#endif


comment could be improved: Is this bytes of padding? Where is the padding? Is this a sum of all the padding for each individual argument (since fgArgTabEntryPtr also has a padding field), or does it represent some pre-/post- argument padding?

BruceForstall · 2017-02-02T20:56:59Z

src/jit/morph.cpp

+#if defined(UNIX_X86_ABI)
+void fgArgInfo::ArgsAlignPadding()
+{
+ // To get the padding amount, sum up all the slots and get the remainer for padding


typo: remainer => remainder

Thank you :)

BruceForstall · 2017-02-02T23:59:10Z

src/jit/morph.cpp

+ if (firstArgTabEntry != nullptr)
+ {
+ firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;
+ this->padStkAlign = firstArgTabEntry->padStkAlign;


I don't understand this calculation. Can you explain? What are you trying to compute?

To get the number of padding slots needed to round the slot count up to a multiple of 4 (for x86/Linux). For example, if numSlots is 7, we need one more to make it 8, which is a multiple of 4. So, 4 - (7 % 4) = 1. It seems % numSlotsAligned at the end isn't needed. What was I thinking...

I think you can use AlignmentPad(numSlots, numSlotsAligned)

When numSlots is 8, it would be 4 - (8 % 4) = 4. If this happens, we don't need to add any padding, as it is already aligned. So we do need the last % numSlotsAligned.
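The role of the trailing modulo is easy to check in isolation. The function below is a stand-alone sketch of the same expression; `paddingSlots` is a hypothetical name used here and is not claimed to match the signature or behavior of the runtime's AlignmentPad helper.

```cpp
#include <cassert>

// Number of extra slots needed to round numSlots up to a multiple of
// numSlotsAligned. The outer modulo maps the "already aligned" case,
// where the inner expression yields numSlotsAligned, back to 0.
unsigned paddingSlots(unsigned numSlots, unsigned numSlotsAligned)
{
    assert(numSlotsAligned != 0);
    return (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;
}
```

With an alignment of 4 slots, 7 slots need 1 slot of padding and 5 slots need 3, while 8 slots need 0; without the outer modulo the last case would wrongly yield 4.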

BruceForstall · 2017-02-03T00:00:50Z

src/jit/target.h

+ #define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units
+ #define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset
+ #define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1)
+#endif // !FEATURE_PAL



The comment on the #endif is wrong.
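Going only by the comments in the diff, STACK_ALIGN_PADDING and STACK_ALIGN_STKOFFSET look like a shift and a mask for keeping a padding amount and a stack offset in one 32-bit value. The sketch below shows that kind of packing under this assumption; how the JIT actually uses the two macros is not shown in this thread, and `pack`, `unpackPadding`, and `unpackOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned      kPaddingShift = 16;                        // mirrors STACK_ALIGN_PADDING
constexpr std::uint32_t kOffsetMask   = (1u << kPaddingShift) - 1; // mirrors STACK_ALIGN_STKOFFSET

// Stores the padding in the upper 16 bits and the offset in the lower 16 bits.
std::uint32_t pack(std::uint32_t padding, std::uint32_t offset)
{
    assert(padding <= kOffsetMask); // both halves must fit in 16 bits
    assert(offset <= kOffsetMask);
    return (padding << kPaddingShift) | offset;
}

// Shift right to recover the padding.
std::uint32_t unpackPadding(std::uint32_t packed)
{
    return packed >> kPaddingShift;
}

// Mask to recover the offset.
std::uint32_t unpackOffset(std::uint32_t packed)
{
    return packed & kOffsetMask;
}
```

For instance, a padding of 12 and an offset of 0x40 pack to 0x000C0040 and unpack to the original pair.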

BruceForstall · 2017-02-03T00:06:49Z

src/jit/codegencommon.cpp

- genTypeStSz(TYP_LONG) + // longs/doubles may be transferred via stack, etc
- (compiler->compTailCallUsed ? 4 : 0))); // CORINFO_HELP_TAILCALL args
+ {
+ unsigned accStackDepth = compiler->fgPtrArgCntMax + // Max number of pointer-sized stack arguments.


I don't know what the prefix "acc" means. Can you spell it out? Maybe, maxAllowedStackDepth?

I was thinking of "accumulation" but I'll go with maxAllowedStackDepth

BruceForstall · 2017-02-03T00:08:58Z

src/jit/codegenxarch.cpp

+ {
+ inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0
+ }
+ // Note that the stack must always be aligned to STACK_ALIGN bytes



It was from @briansull :)

BruceForstall · 2017-02-03T00:45:03Z

src/jit/lclvars.cpp

+ }
+
+ // The stack must always be 16 byte aligned.
+ int adjustFrameSize = compLclFrameSize;


only true for unix/x86 case, so either state that, or move the comment within the #if that follows.

BruceForstall · 2017-02-03T00:56:30Z

I think I finally understand your code (which perhaps should be commented somewhere). You add a padStkAlign to the fgArgTabEntry array, and also padStkAlign to the fgArgInfo struct. When arg processing is done, you add up the size of the stack arguments by adding the numSlots field, then figure out how much alignment is required based on this. The alignment is stored both in the fgArgInfo->padStkAlign member and the padStkAlign member of the first fgArgTabEntry that has non-zero numSlots, which you assume (and maybe should assert) is a GT_PUTARG_STK node. Then, you generate the alignment codegen (sub sp) using the fgArgTabEntry value in genPutArgStk(). After the call returns, you use the value in the fgArgInfo struct to "pop off" the alignment you added.

It seems like this should work.

I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node that contains the alignment, instead of overloading the GT_PUTARG_STK node and storing a possibly unused value in all the arg node call info structs. Anyway, I wouldn't worry about that possibility, since it has its own drawbacks.
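The scheme summarized above can be simulated on a plain integer ESP to check that the stack is aligned at the CALL and restored afterwards. This is a sketch under stated assumptions: 4-byte slots, a 16-byte alignment, a callee that pops its own stack arguments, and an ESP that is already aligned before argument setup starts. `simulateCall` and `CallSiteResult` are hypothetical names, not JIT data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kSlotBytes     = 4; // assumed x86 slot size
constexpr std::uint32_t kSlotsPerAlign = 4; // 16-byte alignment / 4-byte slots

struct CallSiteResult
{
    std::uint32_t espAtCall; // ESP just before the CALL instruction
    std::uint32_t espAfter;  // ESP after the call returned and the pad was removed
};

// argSlots holds the slot count of each stack argument (the numSlots fields).
CallSiteResult simulateCall(std::uint32_t esp, const std::vector<unsigned>& argSlots)
{
    assert(esp % (kSlotBytes * kSlotsPerAlign) == 0); // aligned before arg setup

    unsigned numSlots = 0;
    for (unsigned slots : argSlots)
    {
        numSlots += slots; // sum the sizes of all stack arguments
    }
    const unsigned padSlots = (kSlotsPerAlign - numSlots % kSlotsPerAlign) % kSlotsPerAlign;

    esp -= padSlots * kSlotBytes; // "sub esp, pad" emitted with the first stack argument
    esp -= numSlots * kSlotBytes; // push the arguments
    const std::uint32_t atCall = esp;

    esp += numSlots * kSlotBytes; // callee pops its own arguments
    esp += padSlots * kSlotBytes; // caller removes the padding after the call
    return {atCall, esp};
}
```

With arguments of 1 and 2 slots, the pad is 1 slot, so 16 bytes are reserved in total: ESP is aligned at the CALL and returns to its starting value once the caller removes the pad.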

seanshpark · 2017-02-03T01:20:10Z

which perhaps should be commented somewhere

OK, void fgArgInfo::ArgsAlignPadding() seems to be a place to add some explanation.

I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node

It would be riskier for me for now. I was thinking of enabling FEATURE_FIXED_OUT_ARGS as on other platforms, but I got stuck with some segmentation faults, so I turned to this solution. I hope I can come back to this some day.

seanshpark · 2017-02-06T11:20:45Z

@briansull , @BruceForstall , is there anything else needed?

BruceForstall · 2017-02-06T23:41:36Z

src/jit/morph.cpp

+ // Set stack align pad for the first argument
+ // Padding value will be 3 to 1 when numSlots are from 1 to 3 or 5 to 7.
+ // we need extra '% numSlotsAligned' at the end for numSlots be multiple of 4
+ firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;


This should be:

firstArgTabEntry->padStkAlign = AlignmentPad(numSlots, numSlotsAligned);

BruceForstall · 2017-02-06T23:42:03Z

src/jit/morph.cpp

+ * making a "Call". After the Call, stack is re-adjusted to the value it
+ * was with fgArgInfo->padStkAlign value as we cann't use the one in
+ * fgArgTabEntry.
+ */


Please use standard function header comment, using // style

BruceForstall · 2017-02-06T23:43:11Z

A couple nits, but otherwise LGTM

seanshpark · 2017-02-07T01:26:50Z

@dotnet-bot test Ubuntu x64 Checked Build and Test please

BruceForstall · 2017-02-07T01:53:49Z

I can't decode any real failure: https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_osx_flow_prtest/2581/

try again.
@dotnet-bot test OSX x64 Checked Build and Test

seanshpark · 2017-02-07T04:05:59Z

@BruceForstall , thank you :)

[x86/Linux] Stack align 16 bytes for JIT code Commit migrated from dotnet/coreclr@b05cf50

dnfclas added the cla-already-signed label Jan 9, 2017

seanshpark force-pushed the fixstackalign branch 2 times, most recently from 77a3261 to 8ce3d21 Compare January 9, 2017 04:19

parjong reviewed Jan 9, 2017

briansull reviewed Jan 9, 2017

seanshpark mentioned this pull request Jan 10, 2017

[x86/Linux] Introduce UNIX_X86_ABI definition #8863

Merged

seanshpark force-pushed the fixstackalign branch 2 times, most recently from a6a7403 to 967a45b Compare January 10, 2017 11:56

seanshpark changed the title ~~[x86/Linux] Stack align 16 bytes for JIT code~~ [x86/Linux] WIP, Stack align 16 bytes for JIT code Jan 10, 2017

seanshpark force-pushed the fixstackalign branch from 967a45b to c918315 Compare January 11, 2017 02:05

seanshpark force-pushed the fixstackalign branch 3 times, most recently from c331058 to e2a6a13 Compare January 24, 2017 11:50

seanshpark force-pushed the fixstackalign branch 2 times, most recently from 22e0d96 to 0a47b2f Compare January 24, 2017 23:00

seanshpark force-pushed the fixstackalign branch 4 times, most recently from 0b6c7b7 to e7bf8c3 Compare January 25, 2017 00:30

BruceForstall reviewed Feb 2, 2017

BruceForstall reviewed Feb 3, 2017

seanshpark force-pushed the fixstackalign branch 2 times, most recently from 79f184a to 6a0f94c Compare February 3, 2017 00:03

BruceForstall reviewed Feb 3, 2017

seanshpark force-pushed the fixstackalign branch from 6a0f94c to 6f4f1de Compare February 3, 2017 00:29

BruceForstall reviewed Feb 3, 2017

seanshpark force-pushed the fixstackalign branch 2 times, most recently from a409925 to a1d61f6 Compare February 3, 2017 02:15

seanshpark force-pushed the fixstackalign branch from a1d61f6 to 0c74929 Compare February 6, 2017 23:24

BruceForstall reviewed Feb 6, 2017

seanshpark force-pushed the fixstackalign branch 2 times, most recently from abe75ee to 5df4528 Compare February 7, 2017 00:01

[x86/Linux] Stack align 16 bytes for JIT code

5df4528

Change JIT code to align stack in 16 byte used in modern compiler

BruceForstall merged commit b05cf50 into dotnet:master Feb 7, 2017

seanshpark deleted the fixstackalign branch February 9, 2017 11:37

karelz modified the milestone: 2.0.0 Aug 28, 2017

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/coreclr#8849 from seanshpark/fixstackalign

c06485b
