This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[x86/Linux] Stack align 16 bytes for JIT code #8849

Merged 1 commit on Feb 7, 2017

seanshpark

Change the JIT code to align the stack to 16 bytes, as modern compilers require.

@seanshpark
Author

CC @parjong

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from 77a3261 to 8ce3d21 Compare January 9, 2017 04:19
@seanshpark
Author

Fixes #8590
@jkotas , @janvorli , PTAL

src/jit/target.h Outdated
@@ -484,9 +484,15 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
#define MIN_ARG_AREA_FOR_CALL 0 // Minimum required outgoing argument space for a call.

#define CODE_ALIGN 1 // code alignment requirement
#if !defined(FEATURE_PAL)
#define STACK_ALIGN 4 // stack alignment requirement

@parjong parjong Jan 9, 2017

This 16-byte stack alignment requirement comes from GCC/Clang (as discussed in #8590). I'm not sure which flag will be better: __GNUC__ or FEATURE_PAL.

Is the stack alignment requirement part of the ABI? Perhaps there should be a UNIX_X86_ABI, similar to UNIX_AMD64_ABI and UNIX_ARM_ABI?

As I understand, it is not a part of standard ABI. It is just a requirement of GCC (> 4.5) and corresponding Clang.

Author

@parjong. Good point. __GNUC__ vs FEATURE_PAL for this case.

Member

@janvorli janvorli Jan 9, 2017

I think that in this case, __GNUC__ would be the right choice.

Member

Agree. From https://en.wikipedia.org/wiki/X86_calling_conventions: In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary.

Member

__GNUC__ is not cross-compilation friendly, though: it describes the host compiler, not the target environment. Introducing UNIX_X86_ABI may be best.

Author

UNIX_X86_ABI may be best.

Thank you! I'll add this in another PR and rebase this patch later. I think this PR will take some time :)

@janvorli
Member

janvorli commented Jan 9, 2017

CC: @dotnet/jit-contrib

@jkotas
Member

jkotas commented Jan 9, 2017

Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work?

The x86 calling convention is stdcall (= callee pop), so you may need to pad each call site by a variable amount instead of, or in addition to, this.

// Push a single 4-byte zero. This matches the 4-byte STACK_ALIGN value.
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES);
inst_IV(INS_push_hide, 0); // --- push 4-byte 0
#else
// Push four 4-byte zeros. This matches the 16-byte STACK_ALIGN value.
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES * 4);

I think that it is better to replace all of these #ifdefs with a loop that pushes STACK_ALIGN bytes of zeros onto the stack.

static_assert_no_msg((STACK_ALIGN % REGSIZE_BYTES) == 0);
unsigned const count = (STACK_ALIGN / REGSIZE_BYTES);

for (unsigned i=0; i<count; i++)
{
 inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0 
}
// Note that the stack must always be aligned to STACK_ALIGN bytes
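
The point of the suggested loop is that one code path covers both the 4-byte and the 16-byte STACK_ALIGN values. A minimal sketch of that idea, using hypothetical stand-in names (`kRegSizeBytes`, `kStackAlign`, `zeroPushCount`, `printZeroPushes`) rather than the JIT's own macros and emitter, and printing the pushes instead of emitting them:

```cpp
#include <cstdio>

// Hypothetical stand-ins for REGSIZE_BYTES and STACK_ALIGN (x86/Linux values).
constexpr unsigned kRegSizeBytes = 4;
constexpr unsigned kStackAlign   = 16;

// The alignment unit must be a whole number of register-sized slots.
static_assert(kStackAlign % kRegSizeBytes == 0, "STACK_ALIGN must be a multiple of the register size");

// Number of register-sized zero pushes that cover one alignment unit.
constexpr unsigned zeroPushCount(unsigned stackAlign, unsigned regSizeBytes)
{
    return stackAlign / regSizeBytes;
}

// Prints one line per push; the JIT would call inst_IV(INS_push_hide, 0) here instead.
inline void printZeroPushes(unsigned stackAlign, unsigned regSizeBytes)
{
    for (unsigned i = 0; i < zeroPushCount(stackAlign, regSizeBytes); i++)
    {
        std::printf("push 0\n");
    }
}
```

With a 4-byte STACK_ALIGN the loop runs once; with a 16-byte STACK_ALIGN it runs four times, which matches the two hand-written #if branches it replaces.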

Author

Thank you for the comment. That's cleaner and nicer.

// We add 16 so compLclFrameSize is still a multiple of 16.
lvaIncrementFrameSize(STACK_ALIGN);
}
assert((compLclFrameSize % STACK_ALIGN) == 0);

@briansull briansull Jan 9, 2017

I think that this needs to be written differently:
First, if it isn't the final frame layout phase, you have to add the maximum pad of 12 bytes.
Then you have to align compLclFrameSize to a multiple of 16.
This can probably be done unconditionally as well (not inside the #else):

if (STACK_ALIGN >  REGSIZE_BYTES)
{
   if (lvaDoneFrameLayout != FINAL_FRAME_LAYOUT) 
   {
      // If we are not doing final layout, we don't know the exact value of compLclFrameSize  
      // and thus do not know how much we will need to add in order to be aligned.  
      // We add the maximum pad that we could ever have (which is 12)
      lvaIncrementFrameSize(STACK_ALIGN - REGSIZE_BYTES);
   }

   // The stack must always be 16 byte aligned.  
   if ((compLclFrameSize % STACK_ALIGN) != 0)  
   {  
       lvaIncrementFrameSize(STACK_ALIGN - (compLclFrameSize % STACK_ALIGN));  
   }
}
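
The two steps in this suggestion can be tried in isolation. The following is only a sketch with hypothetical names (`roundUp`, `paddedFrameSize`); it models the frame size as a plain integer and ignores callee-saved register pushes, which the later discussion in this thread shows also affect alignment:

```cpp
// Hypothetical stand-ins for REGSIZE_BYTES and STACK_ALIGN (x86/Linux values).
constexpr unsigned kRegSizeBytes = 4;
constexpr unsigned kStackAlign   = 16;

// Round size up to the next multiple of align (align must be non-zero).
constexpr unsigned roundUp(unsigned size, unsigned align)
{
    return (size + align - 1) / align * align;
}

// Step 1: before final layout, reserve the worst-case pad (16 - 4 = 12 bytes).
// Step 2: round the result up to a multiple of the stack alignment.
constexpr unsigned paddedFrameSize(unsigned frameSize, bool isFinalLayout)
{
    return roundUp(isFinalLayout ? frameSize : frameSize + (kStackAlign - kRegSizeBytes), kStackAlign);
}
```

For example, a 20-byte frame becomes 32 bytes in either phase, while an already aligned 16-byte frame stays at 16 bytes in final layout but is estimated at 32 bytes before it, because the worst-case pad is added first.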

@seanshpark
Author

@jkotas , thank you for the comment!

Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work?

I've checked with one unit test case that failed because of misalignment, but I think it's better to check this in ThePreStub. I'll do this together with the code @briansull suggested.

The x86 calling convention is stdcall (= callee pop), so you may need to pad each call site by a variable amount instead of, or in addition to, this.

Yes, I think so too. I compared unit test runs with 4-byte alignment (the toolchain.cmake patch) against this patch and got about 8400 passes vs. about 8000 passes, so roughly 400 failing test cases seem to be related to stack alignment. I'll fix these one by one after this PR.

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from a6a7403 to 967a45b Compare January 10, 2017 11:56
@seanshpark
Author

With the 967a45b patch, I ran a small test with the gentogen01.exe unit test program. I set a breakpoint at ThePreStub and inspected ESP on entry, right before the PUSH EBP instruction. If the stack was 16-byte aligned at the call, it now also holds the 4-byte return address, so ESP should end in c, like this.

1008	    STUB_PROLOG
=> 0xb70ddc10 <ThePreStub+0>:	55	push   %ebp
(gdb) info register esp
esp            0xbfffe13c	0xbfffe13c

If this is somewhat incorrect, please let me know. I'll start over again.

This method is actually System.AppDomain:SetupDomain(bool,ref,ref,ref,ref):this and the assembly JITted code started like this.

G_M46019_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
       57           push     edi
       56           push     esi
       83EC30       sub      esp, 48

There are three pushes at the start and a 48-byte subtraction. Together with the 4-byte return address that is 64 bytes, so the stack is 16-byte aligned when a call is made.

Then, on entering the second JITted method, System.Threading.Monitor:Enter(ref,byref), the code starts like this:

G_M10401_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp
       83EC10       sub      esp, 16
       894DF8       mov      gword ptr [ebp-08H], ecx
       8955F4       mov      bword ptr [ebp-0CH], edx

Here the two extra pushes from the SetupDomain method are missing, and as a result the stack is not 16-byte aligned at the call to the next function, ThePreStub.

1008	    STUB_PROLOG
=> 0xb70ddc10 <ThePreStub+0>:	55	push   %ebp
(gdb) info register esp
esp            0xbfffe124	0xbfffe124
(gdb) bt
#0  ThePreStub () at /home/maxwell/netcore/coreclr/src/vm/i386/asmhelpers.S:1008
#1  0xb5a30201 in ?? ()
#2  0xb5a30103 in ?? ()
#3  0xb70dda27 in CallDescrWorkerInternal () at /home/maxwell/netcore/coreclr/src/vm/i386/asmhelpers.S:444

It seems that pushing the two registers happened to give 16-byte alignment in the first method, but not in the second.
So I think register saves such as push esi and push edi need to be accounted for in the lvaIncrementFrameSize adjustment.
I would like to know where to look to fix this. If anyone knows, please help me; and if I'm wrong, feel free to point it out.
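
The arithmetic behind the two prologs can be checked with a small sketch. It assumes the caller was 16-byte aligned at its call instruction and that the next call pushes no stack arguments; the helper names (`bytesBelowCallSite`, `alignedForNextCall`, `entryEspLooksAligned`) are hypothetical and not part of the JIT:

```cpp
#include <cstdint>

// Hypothetical stand-ins for REGSIZE_BYTES and STACK_ALIGN (x86/Linux values).
constexpr unsigned kRegSizeBytes = 4;
constexpr unsigned kStackAlign   = 16;

// Bytes consumed below an aligned call site: return address + register pushes + locals.
constexpr unsigned bytesBelowCallSite(unsigned pushedRegs, unsigned localBytes)
{
    return kRegSizeBytes + pushedRegs * kRegSizeBytes + localBytes;
}

// True if ESP is 16-byte aligned again once the prolog has run.
constexpr bool alignedForNextCall(unsigned pushedRegs, unsigned localBytes)
{
    return bytesBelowCallSite(pushedRegs, localBytes) % kStackAlign == 0;
}

// True if ESP at function entry is consistent with an aligned call site,
// i.e. only the 4-byte return address sits below the alignment boundary.
constexpr bool entryEspLooksAligned(std::uint32_t esp)
{
    return (esp + kRegSizeBytes) % kStackAlign == 0;
}
```

SetupDomain (three pushes, 48 bytes of locals) consumes 4 + 12 + 48 = 64 bytes and stays aligned. Monitor:Enter (one push, 16 bytes of locals) consumes 4 + 4 + 16 = 24 bytes and does not, which agrees with the observed ESP values: 0xbfffe13c passes the entry check and 0xbfffe124 fails it.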

@seanshpark seanshpark changed the title [x86/Linux] Stack align 16 bytes for JIT code [x86/Linux] WIP, Stack align 16 bytes for JIT code Jan 10, 2017
@seanshpark
Author

I need to update again to apply UNIX_X86_ABI after #8863 lands

@seanshpark
Author

I think I've found what I was thinking: compCalleeRegsPushed

@seanshpark
Author

Some changes are from Windows_NT x64 Formatting

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from 22e0d96 to 0a47b2f Compare January 24, 2017 23:00
@seanshpark
Author

@janvorli , could you please help me with the build break on Windows? I can't find the cause. Line 3175 is empty, but the error says it is possibly the result of a macro expansion.

3172:    compiler->unwindEmit(*codePtr, coldCodePtr);
3173:
3174:    /* Finalize the line # tracking logic after we know the exact block sizes/offsets */
3175:
3176:    genIPmappingGen();
d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2121: '#': invalid character: possibly the result of a macro expansion [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]
15:04:50 d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2059: syntax error: 'if' [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]
15:04:50 d:\j\workspace\x64_release_w---0575cb46\src\jit\codegencommon.cpp(3175): error C2143: syntax error: missing ';' before '{' [D:\j\workspace\x64_release_w---0575cb46\bin\obj\Windows_NT.x64.Release\src\jit\crossgen\clrjit_crossgen.vcxproj]

@seanshpark seanshpark force-pushed the fixstackalign branch 4 times, most recently from 0b6c7b7 to e7bf8c3 Compare January 25, 2017 00:30
@@ -1262,6 +1265,9 @@ class fgArgInfo
unsigned argCount; // Updatable arg count value
unsigned nextSlotNum; // Updatable slot count value
unsigned stkLevel; // Stack depth when we make this call (for x86)
#if defined(UNIX_X86_ABI)
unsigned padStkAlign; // Count of padding for stack alignment
#endif
Member

comment could be improved: Is this bytes of padding? Where is the padding? Is this a sum of all the padding for each individual argument (since fgArgTabEntryPtr also has a padding field), or does it represent some pre-/post- argument padding?

#if defined(UNIX_X86_ABI)
void fgArgInfo::ArgsAlignPadding()
{
// To get the padding amount, sum up all the slots and get the remainer for padding
Member

typo: remainer => remainder

Author

Thank you :)

if (firstArgTabEntry != nullptr)
{
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;
this->padStkAlign = firstArgTabEntry->padStkAlign;
Member

I don't understand this calculation. Can you explain? What are you trying to compute?

Author

This computes how many slots to add so that the slot count becomes a multiple of 4 (for x86/Linux). For example, if numSlots is 7, we need one more to make it 8, which is a multiple of 4. So, 4 - (7 % 4) = 1. It seems % numSlotsAligned at the end isn't needed. What was I thinking...

Member

I think you can use AlignmentPad(numSlots, numSlotsAligned)


Author

When numSlots is 8, it would be 4 - (8 % 4) = 4. In that case we don't need to add any padding, as it is already aligned. So we do need the last % numSlotsAligned.
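
The difference between the two forms of the expression is easy to see side by side. A minimal sketch with hypothetical helper names (`padSlots`, `padSlotsNoFinalMod`); it is not the JIT's AlignmentPad helper, only the arithmetic from the diff:

```cpp
// Padding slots needed to bring numSlots up to a multiple of numSlotsAligned.
// The trailing modulo maps the "already aligned" case to 0 instead of numSlotsAligned.
constexpr unsigned padSlots(unsigned numSlots, unsigned numSlotsAligned)
{
    return (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;
}

// The same expression without the trailing modulo, kept only for comparison.
constexpr unsigned padSlotsNoFinalMod(unsigned numSlots, unsigned numSlotsAligned)
{
    return numSlotsAligned - (numSlots % numSlotsAligned);
}
```

For 7 slots both forms give 1. For 8 slots the first gives 0, while the second gives 4 and would add a full, unnecessary alignment unit of padding.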

src/jit/target.h Outdated
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset
#define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1)
#endif // !FEATURE_PAL

Member

The comment on the #endif is wrong.

Author

Thank you!

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from 79f184a to 6a0f94c Compare February 3, 2017 00:03
genTypeStSz(TYP_LONG) + // longs/doubles may be transferred via stack, etc
(compiler->compTailCallUsed ? 4 : 0))); // CORINFO_HELP_TAILCALL args
{
unsigned accStackDepth = compiler->fgPtrArgCntMax + // Max number of pointer-sized stack arguments.
Member

I don't know what the prefix "acc" means. Can you spell it out? Maybe, maxAllowedStackDepth?

Author

I was thinking of "accumulation" but I'll go with maxAllowedStackDepth

{
inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0
}
// Note that the stack must always be aligned to STACK_ALIGN bytes

Member

nice!

Author

It was from @briansull :)

}

// The stack must always be 16 byte aligned.
int adjustFrameSize = compLclFrameSize;
Member

only true for unix/x86 case, so either state that, or move the comment within the #if that follows.

Author

OK

@BruceForstall
Member

I think I finally understand your code (which perhaps should be commented somewhere). You add a padStkAlign to the fgArgTabEntry array, and also padStkAlign to the fgArgInfo struct. When arg processing is done, you add up the size of the stack arguments by adding the numSlots field, then figure out how much alignment is required based on this. The alignment is stored both in the fgArgInfo->padStkAlign member and the padStkAlign member of the first fgArgTabEntry that has non-zero numSlots, which you assume (and maybe should assert) is a GT_PUTARG_STK node. Then, you generate the alignment codegen (sub sp) using the fgArgTabEntry value in genPutArgStk(). After the call returns, you use the value in the fgArgInfo struct to "pop off" the alignment you added.
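
The flow described above can be modelled outside the JIT. The sketch below is an illustration only: `ArgEntrySketch`, `ArgInfoSketch`, `computeArgsAlignPadding`, and `paddingFor` are hypothetical, simplified stand-ins for fgArgTabEntry, fgArgInfo, and fgArgInfo::ArgsAlignPadding(), and the model counts slots without generating any code.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for fgArgTabEntry (hypothetical).
struct ArgEntrySketch
{
    unsigned numSlots    = 0; // stack slots used by this argument (0 for register arguments)
    unsigned padStkAlign = 0; // padding slots to emit before this argument is pushed
};

// Simplified stand-in for fgArgInfo (hypothetical).
struct ArgInfoSketch
{
    std::vector<ArgEntrySketch> args;
    unsigned                    padStkAlign = 0; // same padding, kept to undo it after the call
};

constexpr unsigned kSlotsPerAlign = 4; // 16-byte alignment / 4-byte slots on x86

// Sum the stack slots of all arguments, then record the padding both on the
// first stack argument (used before the call) and on the call info (used after it).
void computeArgsAlignPadding(ArgInfoSketch& info)
{
    unsigned        totalSlots    = 0;
    ArgEntrySketch* firstStackArg = nullptr;
    for (ArgEntrySketch& arg : info.args)
    {
        if (arg.numSlots == 0)
        {
            continue; // register argument, takes no stack space
        }
        if (firstStackArg == nullptr)
        {
            firstStackArg = &arg;
        }
        totalSlots += arg.numSlots;
    }
    if (firstStackArg != nullptr)
    {
        unsigned pad = (kSlotsPerAlign - (totalSlots % kSlotsPerAlign)) % kSlotsPerAlign;
        firstStackArg->padStkAlign = pad;
        info.padStkAlign           = pad;
        assert((totalSlots + pad) % kSlotsPerAlign == 0);
    }
}

// Convenience wrapper: build a call from per-argument slot counts and return its padding.
unsigned paddingFor(const std::vector<unsigned>& slotCounts)
{
    ArgInfoSketch info;
    for (unsigned n : slotCounts)
    {
        ArgEntrySketch entry;
        entry.numSlots = n;
        info.args.push_back(entry);
    }
    computeArgsAlignPadding(info);
    return info.padStkAlign;
}
```

For a call with two register arguments followed by stack arguments of 1 and 2 slots, `paddingFor({0, 0, 1, 2})` records one padding slot on the first stack argument and on the call info; a call with only register arguments records none.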

It seems like this should work.

I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node that contains the alignment, instead of overloading the GT_PUTARG_STK node and storing a possibly unused value in all the arg node call info structs. Anyway, I wouldn't worry about that possibility, since it has its own drawbacks.

@seanshpark
Author

which perhaps should be commented somewhere

OK, void fgArgInfo::ArgsAlignPadding() seems to be a place to add some explanation.

I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node

That would be riskier for me for now. I was also thinking of enabling FEATURE_FIXED_OUT_ARGS, as on other platforms, but I got stuck on some segmentation faults, so I turned to this solution. I hope I can come back to this some day.

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from a409925 to a1d61f6 Compare February 3, 2017 02:15
@seanshpark
Author

@briansull , @BruceForstall , is there anything else needed?

// Set stack align pad for the first argument
// Padding value will be 3 to 1 when numSlots are from 1 to 3 or 5 to 7.
// we need extra '% numSlotsAligned' at the end for numSlots be multiple of 4
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned;
Member

This should be:

firstArgTabEntry->padStkAlign = AlignmentPad(numSlots, numSlotsAligned);

Author

Thank you.

* making a "Call". After the Call, stack is re-adjusted to the value it
* was with fgArgInfo->padStkAlign value as we cann't use the one in
* fgArgTabEntry.
*/
Member

Please use standard function header comment, using // style

Author

OK, sure

@BruceForstall
Member

A couple nits, but otherwise LGTM

@seanshpark seanshpark force-pushed the fixstackalign branch 2 times, most recently from abe75ee to 5df4528 Compare February 7, 2017 00:01
Change JIT code to align stack in 16 byte used in modern compiler
@seanshpark
Author

@dotnet-bot test Ubuntu x64 Checked Build and Test please

@BruceForstall
Member

I can't decode any real failure: https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_osx_flow_prtest/2581/

try again.
@dotnet-bot test OSX x64 Checked Build and Test

@seanshpark
Author

@BruceForstall , thank you :)

@BruceForstall BruceForstall merged commit b05cf50 into dotnet:master Feb 7, 2017
@seanshpark seanshpark deleted the fixstackalign branch February 9, 2017 11:37
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
[x86/Linux] Stack align 16 bytes for JIT code

Commit migrated from dotnet/coreclr@b05cf50