Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Implement genProfilingEnterCallback genProfilingLeaveCallback on Arm64 #26460

Merged
merged 19 commits into from
Sep 20, 2019
Merged

Implement genProfilingEnterCallback genProfilingLeaveCallback on Arm64 #26460

merged 19 commits into from
Sep 20, 2019

Conversation

echesakov
Copy link

@echesakov echesakov commented Aug 31, 2019

Fixes https://github.com/dotnet/coreclr/issues/19368

The following conventions for genProfilingEnterCallback and genProfilingLeaveCallback will be used on arm64:

  • x10 is used to pass functionIDOrClientID to a helper (REG_PROFILER_ENTER_ARG_FUNC_ID or REG_PROFILER_LEAVE_ARG_FUNC_ID)
  • x11 is used to pass callerSp to a helper (REG_PROFILER_ENTER_ARG_CALLER_SP or REG_PROFILER_LEAVE_ARG_CALLER_SP). It's quite tricky to figure out the callerSp on arm64 since we have 5 or 6 different types of the stack frame and fp is not guaranteed to point at the top of the stack frame (as on x64 or arm).
  • x12 is used for call target (i.e. an address of a helper function or jump stub) (REG_DEFAULT_HELPER_CALL_TARGET)
    contains call target (i.e. address of the helper function or jump stub)
  • During the call the following volatile registers are NOT preserved: x9-x17, v16-v31
  • During the call to CORINFO_HELP_PROF_FCN_ENTER all the argument registers (x0-x7, x8, v0-v7) will contain valid values of the arguments and must be preserved during the call. This helps to avoid pre-spilling of the argument register as we do on arm32.

@echesakov echesakov changed the title Implement profiling enter leave callbacks arm64 Implement genProfilingEnterCallback genProfilingLeaveCallback on Arm64 Aug 31, 2019
@echesakov
Copy link
Author

echesakov commented Aug 31, 2019

The following tests are failing:

  • JIT/Directed/VectorABI/VectorMgdMgd_ro/VectorMgdMgd_ro.sh
  • JIT/Directed/VectorABI/VectorMgdMgdStatic_r/VectorMgdMgdStatic_r.sh
  • JIT/Directed/VectorABI/VectorMgdMgdArray_r/VectorMgdMgdArray_r.sh
  • JIT/Directed/VectorABI/VectorMgdMgdArray_ro/VectorMgdMgdArray_ro.sh
  • JIT/Directed/VectorABI/VectorMgdMgd_r/VectorMgdMgd_r.sh
  • JIT/Directed/VectorABI/VectorMgdMgdStatic_ro/VectorMgdMgdStatic_ro.sh

due to an assertion in

coreclr/src/jit/gentree.cpp

Lines 14565 to 14569 in d951f5f

if (!ok)
{
gtDispTree(val);
assert(!"Incompatible types for gtNewTempAssign");
}

Update: when JIT generates profiler hooks:

  1. it creates a "designated" return basic block;
  2. makes all the existing return basic blocks jumping to the "designated" return basic block;
  3. merge the old return basic block result values by inserting assignment to a new temp at the end of these blocks.;
  4. return the new temp value at end of the "designated" return basic block.

During the call to gtNewTempAssign the types of the temp and the expression containing return values are checked for compatibility. In our case, the expression had GT_LCL_FLD of TYP_SIMD16 that was obtained during impFixupStructReturnType on GT_LCL_VAL of TYP_STRUCT where this struct contains one field of Vector128.

The similar assertion happens in synchronous methods during fgCreateMonitorTree and I created a repro - #26518.

One more occurrence of the issue happens (https://github.com/dotnet/coreclr/issues/26491) on linux-arm with enabled profiler hooks for the following methods

.method public static int32 byrefsubi4(class ctest& V_1, int32 V_2)
{
ldarg 0
ldarg 1
sub
ret
}

@echesakov
Copy link
Author

@davmason @noahfalk Do you have any suggestions regarding what ABI should be chosen for CORINFO_HELP_PROF_FCN_ENTER, CORINFO_HELP_PROF_FCN_LEAVE, CORINFO_HELP_PROF_FCN_TAILCALL helpers on Arm64?

I proposed and implemented the calling convention as described in #26460 (comment) since I could not find any documentation that would prescribe what ABI should be used.

The implementation passes Pri1 tests except the few tests as mentioned in the previous post. That issue will be fixed separately since it reproduces not only during profiling hooks code generation.

cc @dotnet/jit-contrib

@davmason
Copy link
Member

davmason commented Sep 5, 2019

@echesakovMSFT What you propose seems reasonable, although I am far from an expert on arm64. I think it is important that we pick a calling convention and document it, as long as it is reasonably performant and allows us to do both fast and slow path ELT the exact ABI is flexible.

@noahfalk
Copy link
Member

noahfalk commented Sep 6, 2019

I agree with @davmason, as long as the ABI is something the JIT can generate reasonable efficiently and we've documented it, the exact choice seems less critical. I am not remotely an ARM64 expert but what you described above seems reasonable to me. If it helps there is an old PR that I never finished that was trying to make some headway documenting some of our pre-existing ELT ABI work: #19560. I don't recall where I got the ARM64 part of that doc so don't feel like you need to give it any weight.

I do strongly recommend that we document the ABI in the clr-abi doc. I know we haven't been good at that in the past but it shouldn't be hard to avoid making the hole any deeper while the info is fresh : )

@echesakov
Copy link
Author

I will still need to implement Windows/arm64 helpers but I suspect they would look almost like the Linux ones

@davmason Can you please take a look at the profiler related changes?


gcInfo.gcMarkRegSetNpt(RBM_PROFILER_LEAVE_ARG_FUNC_ID);

if (compiler->lvaDoneFrameLayout == Compiler::FINAL_FRAME_LAYOUT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this just be an assert? We shouldn't ever call this function before final frame layout, should we?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true for genProfilingEnterCallback - I am not sure about genProfilingLeaveCallback

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with Bruce we think that the pattern

if (compiler->lvaDoneFrameLayout == Compiler::FINAL_FRAME_LAYOUT)
{
}
else
{
}

in genProfilingLeaveCallback was used because of legacy backend.

It looks that we can remove the else-logic and just do an assertion. I will follow up in another PR and remove the logic on x64 and arm64.

src/vm/arm/asmhelpers.S Show resolved Hide resolved
Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the changes, and I didn't see any issues. But I am far from an expert in arm64 so I'm not sure I would be able to spot anything.

@echesakov echesakov marked this pull request as ready for review September 17, 2019 23:05
@echesakov
Copy link
Author

/azp run coreclr-outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@echesakov
Copy link
Author

More changes:

  1. Fixed an issue in codegenarm.cpp where helper argument in genProfilingLeaveCallback was ignored
  2. Refactored the code in genProfilingLeaveCallback in codegenarm.cpp to use genTransferRegState
  3. Removed some unused macro in target.h
  4. As suggested removed the non-FINAL_FRAME_LAYOUT for arm64 - we don't need an assert since lvaToCallerSPRelativeOffset has one

@echesakov
Copy link
Author

I have update the first post to indicate what ABI is chosen for the ELT hooks.

@echesakov
Copy link
Author

I am going to follow-up on the @noahfalk's PR #19560 and update the information for Arm32 and Arm64

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RyuJIT/arm64] Implement ELT hooks
5 participants