Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix ARM/ARM64 hijacking in tail calls #16039

Merged
merged 7 commits into from
Feb 14, 2018

Conversation

janvorli
Copy link
Member

This change fixes an issue that can happen when a function that has tail
calls is hijacked. There are two potential issues:

  1. When a function that tail calls another one is hijacked, the LR may be
    stored at a different location in the stack frame of the tail call
    target.
    So just by performing tail call, the hijacked location becomes invalid and
    unhijacking would corrupt stack by writing to that location.

  2. There is a small window after the caller pops LR from the stack in its
    epilog and before the tail called function pushes LR in its prolog when
    the hijacked return address would not be not on the stack and so we would
    not be able to unhijack.

The fix is to prevent hijacking of functions that contain tail calls.

This change fixes an issue that can happen when a function that has tail
calls is hijacked. There are two potential issues:

1. When a function that tail calls another one is hijacked, the LR may be
stored at a different location in the stack frame of the tail call
target.
So just by performing tail call, the hijacked location becomes invalid and
unhijacking would corrupt stack by writing to that location.

2. There is a small window after the caller pops LR from the stack in its
epilog and before the tail called function pushes LR in its prolog when
the hijacked return address would not be not on the stack and so we would
not be able to unhijack.

The fix is to prevent hijacking of functions that contain tail calls.
@janvorli
Copy link
Member Author

CC: @davidwrighton

GenTreePtr tailCall = nullptr;
bool fastTailCallsOnly = true;
bool tailCallsConvertibleToLoopOnly = false;
if (!hasTailCalls && block->endsWithTailCall(this, fastTailCallsOnly, tailCallsConvertibleToLoopOnly, &tailCall))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this arm/arm64 specific so that it does not bloat the size of GCInfo on x64?

src/inc/gcinfo.h Outdated
@@ -34,7 +34,7 @@ const unsigned this_OFFSET_FLAG = 0x2; // the offset is "this"
// The current GCInfo Version
//-----------------------------------------------------------------------------

#define GCINFO_VERSION 2
#define GCINFO_VERSION 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you bump the GCInfo version format you have to also bump the version of R2R format. Is this version bump really necessary since this fix is only for platforms that have not shipped yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version bump was not clear to me. I have thought that I need to bump it since there can already be some crossgen-ed 3rd party code that would use the old version. Is that not a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can already be some crossgen-ed 3rd party

That's correct for x86/x64. This fix is for ARM/ARM64 that do not have this problem.

GC_INFO_WANTS_REPORT_ONLY_LEAF is useless flag. It just exists for compatibility with JIT64 convention on .NET Framework x64. I would overload this flag for ARM/ARM64 to track the tailcall presence.

@@ -442,6 +442,7 @@ struct InfoHdrSmall {
unsigned char genericsContext : 1;//4 [1] function reports a generics context parameter is present
unsigned char genericsContextIsMethodDesc : 1;//4[2]
unsigned char returnKind : 2; // 4 [4] Available GcInfo v2 onwards, previously undefined
unsigned char hasTailCalls : 1; // 4 [5]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is x86 specific. I do not think we need it here.

gcPrintf("Wants Report Only Leaf: %u\n", hdrdecoder.WantsReportOnlyLeaf());

#else // _TARGET_AMD64_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be #elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) to match the rest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the elif only in sources that are compiled for x86.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only if we do not ever expand the set of target architectures...

@@ -6109,6 +6109,9 @@ struct ExecutionState
IJitManager *m_pJitManager;
METHODTOKEN m_MethodToken;
BOOL m_IsInterruptible; // is this code interruptible?
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
BOOL m_HasTailCalls; // does this code perform tail calls?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not need to be cached in this structure here. It is computed and used within single method - we can just call it directly right where it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

m_WantsReportOnlyLeaf = ((headerFlags & GC_INFO_WANTS_REPORT_ONLY_LEAF) != 0);
#else // _TARGET_AMD64_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#elif ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above - this source is not compiled for x86, so I've used just the else

@@ -36,7 +36,7 @@ set_property(TARGET legacynonjit APPEND_STRING PROPERTY LINK_DEPENDS ${JIT_EXPOR

set(RYUJIT_LINK_LIBRARIES
utilcodestaticnohost
gcinfo
gcinfo_arm
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BruceForstall this is a bug in the legacynonjit CMakeLists.txt. I've hit it when making my changes that are ARM / ARM64 specific and was getting missing symbols due to the fact that x86 version of the gcinfo was being linked in.

@@ -180,6 +180,9 @@ CodeGen::CodeGen(Compiler* theCompiler) : CodeGenInterface(theCompiler)
/* Assume that we not fully interruptible */

genInterruptible = false;
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the JIT, only, but not the VM, you can replace all occurrences of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) with defined(_TARGET_ARMARCH_).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -520,6 +527,9 @@ class GcInfoDecoder
bool HasMethodTableGenericsInstContext();
bool GetIsVarArg();
bool WantsReportOnlyLeaf();
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put the bool WantsReportOnlyLeaf(); above this also under #ifdef

@@ -18969,6 +18969,18 @@ void Compiler::fgSetBlockOrder()
// JIT64 does.
genInterruptible = true;
}

#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong location to check for this, although it probably works in most cases.

For the "hasTailCalls" bool, you only care if a function has any tailcalls at all, after it has been completely generated. If possible, this means we should delay the checking for tailcall existence until as late as possible, to prevent the case where we initially think we'll generate a tailcall, then later we change our mind due to some kind of optimization.

Also, we should clearly define what "hasTailCalls" means. E.g., the "normal" prolog/epilog sequence (for arm32) is "push {...,lr} ... pop {..., pc}", but we can also generate "push {..., lr} ... pop {..., lr}; bx lr". Are both variations ok?

And what about the JMP IL instruction? That is very similar to tailcall, in that we branch to the target function address, and (I think) LR remains live as the return address of the called target function.

I think you might want to put code in the _TARGET_ARMARCH_ version of genFnEpilog() (about line 9574 of codegencommon.cpp), in particular, within the if (jmpEpilog) clause:

if (jmpEpilog)
{
    hasTailCalls = true;
    ...

Note that this code is run for both JMP and fast tail call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BruceForstall for the detailed review. When unhijack happens when the target thread is suspended at the bx lr (which is the only way to return on arm64, since there is no pop pc) and the function that was hijacked is the current function, it just restores the original return address to a location on the stack that's at an address that's less than the SP. It has no effect and causes no harm. When the target thread is resumed, it returns to the OnHijackTripThread. It pushes LR to stack and then calls the OnHijackWorker. This function places the original return address to the place where the OnHijackTripThread stored the LR and marks the thread as not hijacked. Then it does its hijacking related job and finally returns to the OnHijackTripThread that in turn returns to the original address that was hijacked. So it fixes the fact that the unhijack modified a wrong location.
The tail call case is different. We end up entering the tail call target with LR set to the OnHijackTripThread. Now there are several cases that can happen w.r.t. where the unhijack happens:

  1. The target thread is executing the tail call target and it hasn't allocated the stack frame yet. The unhijack has no effect, since it restores the original return address to a location where it will either get overwritten by the current function's frame or to a location that's at lower address than the lowest address of the current function's frame.
  2. The target thread is executing the tail call target and it has already allocated the stack frame. This has three sub-cases:
    a. The tail call caller was storing the LR at a location that's inside of the current function's frame at an address different from the location where the tail call target stores the LR. In this case, unhijack corrupts the stack frame.
    b. The tail call caller was storing the LR at a location that's at lower address than the lowest address of the current function's frame. In this case, the unhijack has no effect.
    c. The tail call caller was storing the LR at a location that's at the same address address as the address where the LR is stored in the current function's frame. In this case, the unhijack actually works correctly.
  3. The target thread is executing a function that was transitively called by the tail call target. Then we are in a situation similar to (2), but it would also cause corruption if the original location of the hijacked return address was inside one of the frames of the call chain from the tail call target and the current function. It could even cause more subtle effect in case that location by a chance matched a location of LR stored by another function in the call chain. Then the unhijack would cause the current function to return to original return address of the tail call caller instead of the real caller of the current function.

Now all of the cases described above where I've said that the unhijack would have no effect would actually "heal" the issue by returning from the tail call target to the OnHijackTripThread that would then return to the original return address that was saved.
But only if another hijack didn't happen before it returned from the tail call target. If another hijack happened, then the original return address would be lost, overwritten by the new one. If the hijack happened in the tail call target, the thread would end up running the OnHijackTripThread in an infinite loop, since it would store the OnHijackTripThread as the original return address. If the hijack happened in some function transitively called by the tail call target, then the tail call target would still return to the OnHijackTripThread, but that one would then return to the caller of the function that was hijacked, likely crashing, hanging or doing other interesting stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. I guess this never happened on ARM because the JIT hasn't implemented fast tail call optimization (although #13828 has been waiting, to enable it). On ARM64, I guess we just haven't noticed it (or tested it well). Looks like test JIT/Methodical/tailcall_v4/hijacking/hijacking.sh is disabled for Linux in testsFailingOnArm64.txt and for Windows in tests\arm64\tests.lst, despite #4122 claiming the original failure was fixed.

I'm guessing your case 2c is the common case, which would greatly reduce the incidence of this bug. I presume cases of multiple hijacks, where we would "lose" the original address, are also rare, since once we hijack we wait for a while for it to kick in, and presumably most code returns via the hijacked address in that time.

As a matter of principle, at least, it seems like the unhijack code should not be writing to the stack (to restore the stack-based LR to its original value) at any address below the current SP. Presumably the VM has the context, and can compare the hijack address (m_ppvHJRetAddrPtr) against SP. It could notice that the hijack won't work in this case (such as the "pop {lr}; ; bx lr" case).

The hijack code saves the MethodDesc* of the function when the hijack is installed. Presumably that could be used in the unhijack case. E.g., assume the hijack happened in a tailcall caller, and now we are in a nested call tree within the tailcall target. If we go to unhijack, we could walk the stack looking for a function that contains the hijacked return address (m_pvHJRetAddr). If we reach this address, and the frame one lower on the stack (call it the "unhijack frame") isn't the saved MethodDesc*, then we have a mismatch. We could use the stack walker to tell us where in the "unhijack frame" LR is stored, and unhijack to that location instead of the saved location. So, for A --call--> B --tailcall--> C --call--> D --call--> E, with hijack in B, and unhijack in E: the stack frame looks like: A C D E (since the stack frame for B disappears). If we walk the stack, when we get to A, the saved return address will be in the code for function A. But we never found the saved MethodDesc* corresponding to B. So, we know B is gone. We could figure out where the return address is stored in the stack frame for C, and set it to the return address in A. Presumably, the saved LR in the stack frame for C must be OnHijackTripThread. (Actually, can we walk the stack if the saved LR is "incorrect"?)

Perhaps this is too complex (or there are holes in my logic).

bool GcInfoDecoder::WantsReportOnlyLeaf()
{
// Only AMD64 with JIT64 can return false here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you just want to put the whole function under #ifdef _TARGET_AMD64_?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have intentionally kept it on the GCInfoDecoder and just always return true for non-amd64 platforms. The places in the code that call this method would need to have ifdefs around parts of conditions otherwise and it seemed less readable.

Move the hasTailCall detection into the CodeGen::genFnEpilog and also
let it cover the JMP IL instruction case, since its behavior is
equivalent w.r.t. the hijacking.
@janvorli
Copy link
Member Author

janvorli commented Feb 1, 2018

@BruceForstall I've moved the hasTailCalls detection as per your suggestion and also let it cover the JMP IL instruction case. I've kept the hasTailCalls name since the JMP behavior is equivalent to a tail call where the same parameters as the caller has are passed to the target function.

Instead of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_), use
defined(_TARGET_ARMARCH_).
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to capture all the details of this issue (or, specifically, the ABI requirements surrounding tailcall and hijacking), from the conversations here, in the CLR ABI document, maybe with a new "Hijacking" section here: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#general-unwindframe-layout. Note that hijacking is briefly mentioned there, specifically that LR is always saved to the stack.

@@ -27,7 +27,8 @@
hasPSPSymStackSlot,
hasGenericsInstContextStackSlot,
hasStackBaseregister,
wantsReportOnlyLeaf,
wantsReportOnlyLeaf (AMD64 JIT64 use only),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say "AMD64 use only", not just JIT64. Yes, if we didn't have to support JIT64 anymore, we could remove it, but per the ABI doc, "(JIT32 and RyuJIT set it, JIT64 doesn't)".

@janvorli
Copy link
Member Author

@BruceForstall, I'm back from my vacation, so I'd like to respond to few of your ideas here:

It would be useful to capture all the details of this issue (or, specifically, the ABI requirements surrounding tailcall and hijacking), from the conversations here, in the CLR ABI document

Good point, I'll do that as a separate change.

I guess this never happened on ARM because the JIT hasn't implemented fast tail call optimization (although #13828 has been waiting, to enable it). On ARM64, I guess we just haven't noticed it (or tested it well). Looks like test JIT/Methodical/tailcall_v4/hijacking/hijacking.sh is disabled for Linux in testsFailingOnArm64.txt and for Windows in tests\arm64\tests.lst, despite #4122 claiming the original failure was fixed.

Actually, I've hit the issue when running the JIT/Methodical/tailcall_v4/hijacking test on Windows ARM64 and that tests hits it pretty reliably in 50..75% of runs. The reason is that this test is basically a tight loop of two functions that tail call each other. The issue surfaced as the case when we loose the original hijacked address and get into infinite loop in that test.

As a matter of principle, at least, it seems like the unhijack code should not be writing to the stack (to restore the stack-based LR to its original value) at any address below the current SP. Presumably the VM has the context, and can compare the hijack address (m_ppvHJRetAddrPtr) against SP. It could notice that the hijack won't work in this case (such as the "pop {lr}; ; bx lr" case).

We can check that, but it would not help since unhijack is not allowed to fail. Unlike hijacking, we cannot resume the target thread and try later.

The hijack code saves the MethodDesc* of the function when the hijack is installed. Presumably that could be used in the unhijack case. E.g., assume the hijack happened in a tailcall caller, and now we are in a nested call tree within the tailcall target. If we go to unhijack, we could walk the stack looking for a function that contains the hijacked return address (m_pvHJRetAddr). If we reach this address, and the frame one lower on the stack (call it the "unhijack frame") isn't the saved MethodDesc*, then we have a mismatch. We could use the stack walker to tell us where in the "unhijack frame" LR is stored, and unhijack to that location instead of the saved location.

I have tried similar solution in my previous attempt to fix the issue that I've abandoned(#16011). See the comment https://github.com/dotnet/coreclr/pull/16011/files#r163866342 that describes the reasons why it was abandoned.

The test JIT/Methodical/tailcall_v4/hijacking should be passing now on
ARM64.
@janvorli
Copy link
Member Author

I have added enabling the JIT/Methodical/tailcall_v4/hijacking test for ARM64, since it should be passing with my fix.

@janvorli
Copy link
Member Author

@dotnet-bot test Ubuntu arm64 Cross Checked normal Build and Test

@janvorli
Copy link
Member Author

@dotnet-bot test Windows_NT arm64 Cross Checked forcerelocs Build and Test

@janvorli janvorli merged commit 4fb9851 into dotnet:master Feb 14, 2018
@@ -27,7 +27,8 @@
hasPSPSymStackSlot,
hasGenericsInstContextStackSlot,
hasStackBaseregister,
wantsReportOnlyLeaf,
wantsReportOnlyLeaf (AMD64 use only),
hasTailCalls (ARM/ARM64 only)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma is missed.

@janvorli janvorli deleted the fix-arm-tailcall-hijacking branch February 14, 2018 19:34
bool m_WantsReportOnlyLeaf;
#elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
bool m_HasTailCalls;
#endif // _TARGET_AMD64_
Copy link

@sandreenko sandreenko Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong comment for #endif, it should be #endif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) according to https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#75-commenting-ifdefs, but it is not jit sources, so maybe we do not care.

and defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) == _TARGET_ARMARCH_

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the coding convention doc actually doesn't cover the #elif pattern and in coreclr runtime, we use the pattern I've used (the comment on #endif matching the #if and not the #elif comment)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants