-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[x86/Linux] Stack align 16 bytes for JIT code #8849
Conversation
CC @parjong |
77a3261
to
8ce3d21
Compare
src/jit/target.h
Outdated
@@ -484,9 +484,15 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits | |||
#define MIN_ARG_AREA_FOR_CALL 0 // Minimum required outgoing argument space for a call. | |||
|
|||
#define CODE_ALIGN 1 // code alignment requirement | |||
#if !defined(FEATURE_PAL) | |||
#define STACK_ALIGN 4 // stack alignment requirement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 16-byte stack alignment requirement comes from GCC/Clang (as discussed in #8590). I'm not sure which flag will be better: __GNUC__
or FEATURE_PAL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the stack alignment requirement part of the ABI? Perhaps there should be a UNIX_X86_ABI, similar to UNIX_AMD64_ABI and UNIX_ARM_ABI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, it is not a part of standard ABI. It is just a requirement of GCC (> 4.5) and corresponding Clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parjong. Good point. __GNUC__
vs FEATURE_PAL
for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this case, __GNUC__
would be the right choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. From https://en.wikipedia.org/wiki/X86_calling_conventions: In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GNUC is not cross-compilation friendly though: It describes the host compiler; not the target environment. Introducing UNIX_X86_ABI
maybe be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNIX_X86_ABI
maybe be best.
Thank you!. I'll add this to another PR and later rebase this patch. I think this PR will take some time :)
CC: @dotnet/jit-contrib |
Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work? The x86 calling convention is stdcall (=callee pop) and so you may need to pad each callsite by variable pad instead or in addition to this. |
src/jit/codegenxarch.cpp
Outdated
// Push a single 4-byte zero. This matches the 4-byte STACK_ALIGN value. | ||
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES); | ||
inst_IV(INS_push_hide, 0); // --- push 4-byte 0 | ||
#else | ||
// Push four 4-byte zeros. This matches the 16-byte STACK_ALIGN value. | ||
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES * 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is better to replace all of these #ifdefs with a loop that pushes STACK_ALIGN bytes of zeros onto the stack.
static_assert_no_msg((STACK_ALIGN % REGSIZE_BYTES) == 0));
unsigned const count = (STACK_ALIGN / REGSIZE_BYTES);
for (unsigned i=0; i<count; i++)
{
inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0
}
// Note that the stack must always be aligned to STACK_ALIGN bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. It's more clean and nice.
src/jit/lclvars.cpp
Outdated
// We add 16 so compLclFrameSize is still a multiple of 16. | ||
lvaIncrementFrameSize(STACK_ALIGN); | ||
} | ||
assert((compLclFrameSize % STACK_ALIGN) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this need to be written differently:
First if it isn't the final frame layout phase you have to add on the maximum pad of 12 bytes.
Then after that you have to align compLclFrameSize to a multiple of 16.
This can probably be unconditionally as well: (not inside the #else)
if (STACK_ALIGN > REGSIZE_BYTES)
{
if (lvaDoneFrameLayout != FINAL_FRAME_LAYOUT)
{
// If we are not doing final layout, we don't know the exact value of compLclFrameSize
// and thus do not know how much we will need to add in order to be aligned.
// We add the maximum pad that we could ever have (which is 12)
lvaIncrementFrameSize(STACK_ALIGN - REGSIZE_BYTES);
}
// The stack must always be 16 byte aligned.
if ((compLclFrameSize % STACK_ALIGN) != 0)
{
lvaIncrementFrameSize(STACK_ALIGN - (compLclFrameSize % STACK_ALIGN));
}
}
@jkotas , thank you for the comment!
I've checked with one unit test case that failed when alignment, but I think it's better to check this in the
Yes, I also think so. I've done a unit test compare with 4 byte alignment in |
a6a7403
to
967a45b
Compare
With 967a45b patch, I've done a small test with
If this is somewhat incorrect, please let me know. I'll start over again. This method is actually
As there is three And then the when entering the second JITted code which is
and as there is no two pushes compared to
It seems that pushing two registers somehow gave an 16 byte alignment but not for the second JIT. |
I need to update again to apply |
I think I've found what I was thinking: |
967a45b
to
c918315
Compare
c331058
to
e2a6a13
Compare
Some changes are from |
22e0d96
to
0a47b2f
Compare
@janvorli , could you please help me with build break on Windows? I cann't find the reason. Line 3175 is empty but says expanded from the macro.
|
0b6c7b7
to
e7bf8c3
Compare
src/jit/compiler.h
Outdated
@@ -1262,6 +1265,9 @@ class fgArgInfo | |||
unsigned argCount; // Updatable arg count value | |||
unsigned nextSlotNum; // Updatable slot count value | |||
unsigned stkLevel; // Stack depth when we make this call (for x86) | |||
#if defined(UNIX_X86_ABI) | |||
unsigned padStkAlign; // Count of padding for stack alignment | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment could be improved: Is this bytes of padding? Where is the padding? Is this a sum of all the padding for each individual argument (since fgArgTabEntryPtr also has a padding field), or does it represent some pre-/post- argument padding?
src/jit/morph.cpp
Outdated
#if defined(UNIX_X86_ABI) | ||
void fgArgInfo::ArgsAlignPadding() | ||
{ | ||
// To get the padding amount, sum up all the slots and get the remainer for padding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remainer [](start = 67, length = 8)
typo: remainer => remainder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
src/jit/morph.cpp
Outdated
if (firstArgTabEntry != nullptr) | ||
{ | ||
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned; | ||
this->padStkAlign = firstArgTabEntry->padStkAlign; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this calculation. Can you explain? What are you trying to compute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get remainder of slots to align by, multiple of, 4(for x86/linux). For example, if numSlots is 7, we need one more to make it 8, which will make . So, 4 - (7 % 4) = 1. It seems % numSlotsAligned
at the end isn't needed. What was I thinking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use AlignmentPad(numSlots, numSlotsAligned)
In reply to: 99253550 [](ancestors = 99253550)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When numSlots is 8, it would be 4 - (8 % 4) = 4. If this happens, we don't need to add as it is already aligned. So we need the last % numSlotsAligned
src/jit/target.h
Outdated
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units | ||
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset | ||
#define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1) | ||
#endif // !FEATURE_PAL | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on the #endif
is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
79f184a
to
6a0f94c
Compare
src/jit/codegencommon.cpp
Outdated
genTypeStSz(TYP_LONG) + // longs/doubles may be transferred via stack, etc | ||
(compiler->compTailCallUsed ? 4 : 0))); // CORINFO_HELP_TAILCALL args | ||
{ | ||
unsigned accStackDepth = compiler->fgPtrArgCntMax + // Max number of pointer-sized stack arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accStackDepth [](start = 17, length = 13)
I don't know what the prefix "acc" means. Can you spell it out? Maybe, maxAllowedStackDepth
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of "accumulation" but I'll go with maxAllowedStackDepth
{ | ||
inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0 | ||
} | ||
// Note that the stack must always be aligned to STACK_ALIGN bytes | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was from @briansull :)
6a0f94c
to
6f4f1de
Compare
src/jit/lclvars.cpp
Outdated
} | ||
|
||
// The stack must always be 16 byte aligned. | ||
int adjustFrameSize = compLclFrameSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only true for unix/x86 case, so either state that, or move the comment within the #if
that follows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
I think I finally understand your code (which perhaps should be commented somewhere). You add a It seems like this should work. I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node that contains the alignment, instead of overloading the GT_PUTARG_STK node and storing a possibly unused value in all the arg node call info structs. Anyway, I wouldn't worry about that possibility, since it has its own drawbacks. |
OK,
It would be more risky for me for now and I was thinking enable |
a409925
to
a1d61f6
Compare
@briansull , @BruceForstall , is there anything else needed? |
a1d61f6
to
0c74929
Compare
src/jit/morph.cpp
Outdated
// Set stack align pad for the first argument | ||
// Padding value will be 3 to 1 when numSlots are from 1 to 3 or 5 to 7. | ||
// we need extra '% numSlotsAligned' at the end for numSlots be multiple of 4 | ||
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
firstArgTabEntry->padStkAlign = AlignmentPad(numSlots, numSlotsAligned);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
src/jit/morph.cpp
Outdated
* making a "Call". After the Call, stack is re-adjusted to the value it | ||
* was with fgArgInfo->padStkAlign value as we cann't use the one in | ||
* fgArgTabEntry. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use standard function header comment, using //
style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure
A couple nits, but otherwise LGTM |
abe75ee
to
5df4528
Compare
Change JIT code to align stack in 16 byte used in modern compiler
@dotnet-bot test Ubuntu x64 Checked Build and Test please |
I can't decode any real failure: https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_osx_flow_prtest/2581/ try again. |
@BruceForstall , thank you :) |
[x86/Linux] Stack align 16 bytes for JIT code Commit migrated from dotnet/coreclr@b05cf50
Change JIT code to align stack in 16 byte used in modern compiler