[RyuJIT/ARM32] Enabling fast tail call feature #14056
hseok-oh commented on Sep 19, 2017:
- Do not use fast tail call when the callee uses a floating-point register argument (the stack size is difficult to calculate)
- Do not use fast tail call when the callee uses a split struct argument
- Fix the importer to compare return types when checking for a tail call
- Fix a codegen bug: ARM32 does not support INS_br (use INS_bx instead)
#13897 needs to be merged first. Without that PR, we would need more complicated calculations for ARM32. cc @dotnet/arm32-contrib
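To make the ARM32 rules in the description concrete, here is a minimal C++ sketch of the eligibility decision. It is not the RyuJIT implementation: ArgInfo, StackSlots, and CanFastTailCallArm32 are hypothetical names, each argument is assumed to be pre-classified, and the final comparison assumes the general fast tail call constraint that the callee's stack arguments must fit in the caller's incoming argument area. The importer's return-type check is not modeled.

```cpp
#include <vector>

// Hypothetical, simplified view of one argument; not a RyuJIT data structure.
struct ArgInfo
{
    bool     isFloating; // passed in a floating-point (VFP) register
    bool     isSplit;    // struct split between r0-r3 and the stack
    unsigned stackSlots; // 4-byte stack slots the argument occupies
};

// Total stack slots used by a list of arguments.
static unsigned StackSlots(const std::vector<ArgInfo>& args)
{
    unsigned slots = 0;
    for (const ArgInfo& arg : args)
    {
        slots += arg.stackSlots;
    }
    return slots;
}

// Sketch of the ARM32 rules listed in the PR description.
bool CanFastTailCallArm32(const std::vector<ArgInfo>& callerArgs, const std::vector<ArgInfo>& calleeArgs)
{
    for (const ArgInfo& arg : calleeArgs)
    {
        if (arg.isFloating)
        {
            return false; // float register argument: stack size is hard to compute
        }
        if (arg.isSplit)
        {
            return false; // split struct argument: not supported
        }
    }
    // Assumed general constraint: the callee's stack arguments must fit in
    // the caller's incoming argument area, which the tail call reuses.
    return StackSlots(calleeArgs) <= StackSlots(callerArgs);
}
```

Rejecting unsupported cases up front keeps the sketch short; the review further down notes that the existing fgCanFastTailCall instead counts arguments and decides at the end of the function.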
Related issue: #13828
Now it can handle when
FYI @dotnet/jit-contrib
Branch updated from 1b1b4f4 to c587ac6.
Review comment on src/jit/codegencommon.cpp (outdated):
// Fast tail call.
// Call target = REG_R12.
// Do we need a special encoding for stack walker like rex.w prefix for x64?
getEmitter()->emitIns_R_R(INS_mov, emitTypeSize(TYP_I_IMPL), REG_PC, REG_R12);
Would it be better to use bx r12 to execute the tail call?
I agree with you. I'll change it.
@hseok-oh What's next for this PR?
@BruceForstall I'll make separate PRs to enable fast tail call, and close this PR.
Branch updated from c587ac6 to dcd6e2d.
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test
Branch updated from dcd6e2d to 9f23b52.
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test
Branch updated from 9f23b52 to 03679e8.
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test
Branch updated from 03679e8 to c7cd0b4.
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
Looks like there are still issues (from the x86_arm_altjit run): https://ci.dot.net/job/dotnet_coreclr/job/master/job/x86_arm_altjit_checked_windows_nt_prtest/22/
@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress
Branch updated from 9d05aa1 to 2c62e49.
@dotnet/arm32-contrib @BruceForstall @jashook
Thank you for the work. Please update the comment above fgCanFastTailCall to include what is supported on arm32 and what is not.
Please see https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp#L7145 for examples.
Also, the change does not use the existing logic that counts regArgs and then makes a decision at the end of the function based on the callee vs. caller argument counts. Instead, it returns early, before reaching that code path. In my opinion, this makes an already cumbersome function harder to review and edit in the future, since it does not fully follow the existing paradigm.
argAlign = roundUp(argAlign, TARGET_POINTER_SIZE) / TARGET_POINTER_SIZE;
// We don't care float register because we will not use fast tailcall
It seems worth tracking fastTailCall support for callees with floating-point arguments. Is there an issue tracking adding support for this later?
In addition, what is the rationale for not implementing it now?
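The argAlign line quoted above converts an alignment in bytes into a number of pointer-sized slots. Below is a small sketch of that arithmetic, assuming TARGET_POINTER_SIZE is 4 on ARM32 and that roundUp performs the usual power-of-two rounding. RoundUp, AlignmentInSlots, and kTargetPointerSize are illustrative stand-ins, not the JIT's own helpers.

```cpp
#include <cassert>

// Assumption: pointer size in bytes on ARM32 (the JIT's TARGET_POINTER_SIZE).
constexpr unsigned kTargetPointerSize = 4;

// Round `value` up to the next multiple of `alignment` (a power of two).
// Illustrative stand-in for the JIT's roundUp helper, not its actual source.
constexpr unsigned RoundUp(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Convert a byte alignment into a count of pointer-sized slots, mirroring
// argAlign = roundUp(argAlign, TARGET_POINTER_SIZE) / TARGET_POINTER_SIZE;
unsigned AlignmentInSlots(unsigned argAlignBytes)
{
    // The rounding trick above is only valid for power-of-two alignments.
    assert(argAlignBytes != 0 && (argAlignBytes & (argAlignBytes - 1)) == 0);
    return RoundUp(argAlignBytes, kTargetPointerSize) / kTargetPointerSize;
}
```

Under these assumptions, an 8-byte-aligned argument such as a double or a long yields 2 slots, while anything aligned to 4 bytes or less yields 1.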
// We don't care float register because we will not use fast tailcall
// for callee method using float register
Nit: Do not track floating point registers for arm32. It is NYI.
if (size > 1)
{
    // hasTwoSlotSizedStruct will determine if the struct value can be passed multiple slot.
Nit: there are two spaces between "passed" and "multiple".
#ifdef _TARGET_ARM_
if (varTypeIsFloating(argx))
{
    return false;
Please add logging to this return using reportFastTailCall decision.
Also, adding an NYI or a comment here explaining why this returns false would be nice.
// fastTailCall. This is an implementation limitation
// where the callee only is checked for non enregisterable structs.
// It is tracked with https://github.com/dotnet/coreclr/issues/12644.
hasMultiByteStackArgs = true;
This seems to use hasMultiByteStackArgs in an undesired way: this struct can be passed in registers; there are just more callee arguments, so we have to spill. Also, I am not a big fan of the original hasMultiByteStackArgs code path, and I would like to avoid building on it.
I think the preferred way of dealing with the case of calleeArgRegCount >= MAX_REG_ARG && hasTwoSlotSizedStruct is here: https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp#L7524.
}
unsigned size = genTypeStSz(argx->gtType);

varTypeIsFloating(argx) ? calleeFloatArgRegCount += size : calleeArgRegCount += size;
It seems better to just write this as:
calleeArgRegCount += size;
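A minimal sketch of the counting the reviewer suggests: because floating-point arguments already cause an early return on ARM32, a single integer counter (calleeArgRegCount += size) is enough. The sketch assumes MAX_REG_ARG is 4 (r0-r3), measures sizes in 4-byte slots, ignores the even-register alignment of 8-byte-aligned types, and treats every multi-slot argument as a struct that may be split between r3 and the stack, i.e. the split-struct case the PR description says is not supported. kMaxRegArg, RegCountResult, and CountCalleeArgRegs are hypothetical names.

```cpp
#include <vector>

// Assumption: ARM32 passes integer and struct arguments in r0-r3.
constexpr unsigned kMaxRegArg = 4;

// Hypothetical result type for this sketch.
struct RegCountResult
{
    unsigned calleeArgRegCount; // integer argument registers consumed
    bool     hasSplitArg;       // an argument straddles r3 and the stack
};

// Counts integer registers for the callee's arguments, given each
// argument's size in 4-byte slots. Floating-point arguments are assumed to
// have been rejected already, so no separate float counter is kept.
RegCountResult CountCalleeArgRegs(const std::vector<unsigned>& argSizesInSlots)
{
    RegCountResult result{0, false};
    for (unsigned size : argSizesInSlots)
    {
        if (result.calleeArgRegCount >= kMaxRegArg)
        {
            break; // registers exhausted: remaining arguments go on the stack
        }
        if (result.calleeArgRegCount + size > kMaxRegArg)
        {
            // Part of this struct would go in registers and the rest on the
            // stack: a split argument, which this PR does not fast-tail-call.
            result.hasSplitArg       = true;
            result.calleeArgRegCount = kMaxRegArg;
            break;
        }
        result.calleeArgRegCount += size; // replaces the varTypeIsFloating ternary
    }
    return result;
}
```

For example, sizes {1, 1, 1, 2} fill r0-r2 and then split the two-slot struct across r3 and the stack, while {2, 2, 1} fills r0-r3 exactly and passes the last argument entirely on the stack.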
@jashook Sorry for the late feedback. I'll fix it soon.
@jashook I've updated it. PTAL
@dotnet-bot test Ubuntu x64 Innerloop Formatting
Sorry for taking a while to get to this. Thank you for the changes.
@BruceForstall PTAL
I'm not completely familiar with the fast tail call implementation, but the changes look good to me.
@dotnet-bot test this please
@dotnet-bot test this please
@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
The armlb failure is known.
@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress
It looks like many test failures need to be investigated. In a brief survey, I saw many access violations in the JitStress runs and VM asserts in the TailcallStress modes.
Test failures need to be investigated/fixed.
Add comment in fgCanFastTailCall for ARM32
Branch updated from 5d6ebf5 to afcdc72.
@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
Waiting until #16039 is resolved before reviewing this again.
@hseok-oh Can you fix the conflict so we can trigger retesting? @alpencolt Are you taking this over?
@BruceForstall I'll take this one too.
@BruceForstall I've rebased the PR but cannot push to GitHub. It looks like I don't have sufficient permissions, or maybe I'm doing something wrong?
@alpencolt You should just push to your fork and create a new PR. Then close this one (with a link/reference to your new one).