Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[RyuJIT/ARM32] Enabling fast tail call feature #14056

Closed
wants to merge 2 commits into from

Conversation

hseok-oh
Copy link

@hseok-oh hseok-oh commented Sep 19, 2017

  • Not use fast tail call when callee use floating point register argument (difficult to calculate stack size)
  • Not use fast tail call when callee use split struct argument
  • Fix importer to compare return type when we check tail call
  • Fix codegen bug: ARM32 not support INS_br (use INS_bx)

@hseok-oh
Copy link
Author

hseok-oh commented Sep 19, 2017

#13897 needs to merge first. Without that PR, we need more complicate calculations for arm32 in lvaInitTypeRef() to calculate caller's argument stack & register size.

cc/ @dotnet/arm32-contrib

@hseok-oh
Copy link
Author

related issue: #13828

Now it can handle when

  • callee don't have struct argument
  • callee don't have floating point register argument

@jashook jashook self-requested a review September 19, 2017 15:47
@jashook
Copy link

jashook commented Sep 19, 2017

fyi @dotnet/jit-contrib

// Fast tail call.
// Call target = REG_R12.
// Do we need a special encoding for stack walker like rex.w prefix for x64?
getEmitter()->emitIns_R_R(INS_mov, emitTypeSize(TYP_I_IMPL), REG_PC, REG_R12);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use bx r12 to execute the tail call?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I'll change it.

@jashook jashook added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 27, 2017
@BruceForstall
Copy link
Member

@hseok-oh What's next for this PR?

@hseok-oh
Copy link
Author

@BruceForstall l'll make separate PRs to enable fast tail call, and close this PR.

@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch from c587ac6 to dcd6e2d Compare October 23, 2017 05:19
@hseok-oh hseok-oh changed the title [RyuJIT/ARM32] [WIP] [DO NOT MERGE] Enabling fast tail call feature [RyuJIT/ARM32] Enabling fast tail call feature Oct 23, 2017
@hseok-oh
Copy link
Author

@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch from dcd6e2d to 9f23b52 Compare October 23, 2017 09:32
@hseok-oh
Copy link
Author

@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@hseok-oh hseok-oh changed the title [RyuJIT/ARM32] Enabling fast tail call feature [RyuJIT/ARM32] [WIP] Enabling fast tail call feature Oct 23, 2017
@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch from 9f23b52 to 03679e8 Compare October 30, 2017 08:49
@hseok-oh
Copy link
Author

@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch from 03679e8 to c7cd0b4 Compare November 1, 2017 02:00
@hseok-oh
Copy link
Author

hseok-oh commented Nov 1, 2017

@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@hseok-oh
Copy link
Author

hseok-oh commented Nov 2, 2017

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test
@dotnet-bot test Windows_NT x64 Innerloop Formatting
@dotnet-bot test Ubuntu x64 Innerloop Formatting

@hseok-oh hseok-oh changed the title [RyuJIT/ARM32] [WIP] Enabling fast tail call feature [RyuJIT/ARM32] Enabling fast tail call feature Nov 2, 2017
@BruceForstall BruceForstall removed the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 2, 2017
@BruceForstall
Copy link
Member

Looks like there are still issues (from the x86_arm_altjit run): https://ci.dot.net/job/dotnet_coreclr/job/master/job/x86_arm_altjit_checked_windows_nt_prtest/22/

Assertion failed 'argOffsetOut <= argOffsetMax' in 'Test:A(int,struct,struct):struct' (IL size 58)

@BruceForstall
Copy link
Member

@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress

@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch 2 times, most recently from 9d05aa1 to 2c62e49 Compare November 3, 2017 08:47
@hseok-oh
Copy link
Author

hseok-oh commented Nov 8, 2017

@dotnet/arm32-contrib @BruceForstall @jashook
I think it's time to enable fast tail call on Linux/ARM32. PTAL.

  • Not use fast tail call when callee use floating point register argument (difficult to calculate stack size)
  • Not use fast tail call when callee use split struct argument or struct passed by stack
  • Fix importer to compare return type when we check tail call
  • Fix codegen bug: ARM32 not support INS_br (use INS_bx)

Copy link

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work. Please update the comment above fgCanFastTailCall to include what is supported on arm32 and what is not.

Please see https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp#L7145 for examples.

Also the change does not make use of the existing logic to count regArgs and then make a decision at the end of the function based on the count of callee vs caller arguments. Instead it relies on returning earlier before hitting that code path. In my opinion this will make this already cumbersome function harder to review/edit in the future as it does not follow entirely the old paradigm.


argAlign = roundUp(argAlign, TARGET_POINTER_SIZE) / TARGET_POINTER_SIZE;

// We don't care float register because we will not use fast tailcall
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it is worth tracking fastTailCall support for callee functions with floating point. Is there an issue to track adding support for this later?

In addition what is the rational for not implementing it now?

argAlign = roundUp(argAlign, TARGET_POINTER_SIZE) / TARGET_POINTER_SIZE;

// We don't care float register because we will not use fast tailcall
// for callee method using float register
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do not track floating point registers for arm32. It is NYI.


if (size > 1)
{
// hasTwoSlotSizedStruct will determine if the struct value can be passed multiple slot.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: two spaces in between passed and multiple

#ifdef _TARGET_ARM_
if (varTypeIsFloating(argx))
{
return false;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add logging to this return using reportFastTailCall decision.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also adding some sort of NYI or comment here explaining why this returns false would be nice.

// fastTailCall. This is an implementation limitation
// where the callee only is checked for non enregisterable structs.
// It is tracked with https://github.com/dotnet/coreclr/issues/12644.
hasMultiByteStackArgs = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be using hasMultByteStackArgs in a undesired way as this struct can be passed in registers, there are just more callee arguments and we have to spill. Plus, I am not a big fan of the original hasMultiByteStackArgs code path I would like to avoid building on it.

I think the preferred way of dealing with the case of calleeArgRegCount >= MAX_REG_ARG && hasTwoSlotSizedStruct is here: https://github.com/dotnet/coreclr/blob/master/src/jit/morph.cpp#L7524.

}
unsigned size = genTypeStSz(argx->gtType);

varTypeIsFloating(argx) ? calleeFloatArgRegCount += size : calleeArgRegCount += size;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems better to just have this as:

calleeArgRegCount += size

@hseok-oh
Copy link
Author

@jashook Sorry for late feedback. I'll fix it soon.

@hseok-oh
Copy link
Author

@jashook I've updated.

  • Add comment in fgCanFastTailCall: condition to allow fast tail call on ARM
  • Change return and report position to end of function

PTAL

@hseok-oh
Copy link
Author

hseok-oh commented Dec 3, 2017

@dotnet-bot test Ubuntu x64 Innerloop Formatting

Copy link

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking a little to get to this thank you for the changes.

@jashook
Copy link

jashook commented Dec 14, 2017

@BruceForstall ptal

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely familiar with the fast tail call implementation, but the changes look good to me.

@jashook
Copy link

jashook commented Dec 14, 2017

@dotnet-bot test this please

@BruceForstall
Copy link
Member

@dotnet-bot test this please

@BruceForstall
Copy link
Member

@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked jitstress2 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked tailcallstress Build and Test
@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@BruceForstall
Copy link
Member

armlb failure is known.

@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress

@BruceForstall
Copy link
Member

Looks like a lot of test failures need to be investigated. In a short survey, I saw a lot of access violations in the JitStress runs, and VM asserts in the TailcallStress modes.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test failures need to be investigated/fixed.

- Not use fasttailcall when callee use floating point register argument
(difficult to calculate stack size)
- Not use fasttailcall when callee use split struct argument
- Fix importer to compare return type when we check tail call
- Fix codegen bug: ARM32 not support INS_br (use INS_bx)
Add comment in fgCanFastTailCall for ARM32
@hseok-oh hseok-oh force-pushed the ryujit/fasttailcall branch from 5d6ebf5 to afcdc72 Compare January 8, 2018 11:35
@hseok-oh
Copy link
Author

hseok-oh commented Jan 8, 2018

@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked jitstress2 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked tailcallstress Build and Test
@dotnet-bot test Windows_NT x86_arm_altjit Checked tailcallstress
@dotnet-bot test Windows_NT x86_arm_altjit Checked Build and Test

@BruceForstall
Copy link
Member

@dotnet-bot test Windows_NT arm Cross Checked jitstress1 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked jitstress2 Build and Test
@dotnet-bot test Windows_NT arm Cross Checked tailcallstress Build and Test

@BruceForstall
Copy link
Member

Waiting until #16039 is resolved before reviewing this again.

@BruceForstall
Copy link
Member

@hseok-oh Can you fix the conflict so we can trigger retesting?

@alpencolt Are you taking this over?

@alpencolt
Copy link

@BruceForstall I'll take this one too.

@alpencolt
Copy link

@BruceForstall I've rebased PR but cannot push to GitHub:

$ git push origin  ryujit/fasttailcall
Username for 'https://github.com': alpencolt
Password for 'https://alpencolt@github.com': 
remote: Permission to dotnet/coreclr.git denied to alpencolt.
fatal: unable to access 'https://github.com/dotnet/coreclr.git/': The requested URL returned error: 403

It looks there is not enough permission or may be I do something wrong?
If I will push to my repo the new PR will will be created. What would you recommend?

@BruceForstall
Copy link
Member

@alpencolt You should just push to your fork and create a new PR. Then close this one (with a link/reference to your new one).

@alpencolt
Copy link

#16419

@jkotas jkotas closed this Feb 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants