Fix ARM/ARM64 hijacking in tail calls #16039

janvorli · 2018-01-26T19:54:29Z

This change fixes an issue that can happen when a function that has tail
calls is hijacked. There are two potential issues:

When a function that tail calls another one is hijacked, the LR may be
stored at a different location in the stack frame of the tail call
target.
So just by performing tail call, the hijacked location becomes invalid and
unhijacking would corrupt stack by writing to that location.
There is a small window after the caller pops LR from the stack in its
epilog and before the tail called function pushes LR in its prolog when
the hijacked return address would not be not on the stack and so we would
not be able to unhijack.

The fix is to prevent hijacking of functions that contain tail calls.

This change fixes an issue that can happen when a function that has tail calls is hijacked. There are two potential issues: 1. When a function that tail calls another one is hijacked, the LR may be stored at a different location in the stack frame of the tail call target. So just by performing tail call, the hijacked location becomes invalid and unhijacking would corrupt stack by writing to that location. 2. There is a small window after the caller pops LR from the stack in its epilog and before the tail called function pushes LR in its prolog when the hijacked return address would not be not on the stack and so we would not be able to unhijack. The fix is to prevent hijacking of functions that contain tail calls.

janvorli · 2018-01-26T19:54:48Z

CC: @davidwrighton

jkotas · 2018-01-26T20:05:59Z

src/jit/flowgraph.cpp

+        GenTreePtr tailCall                 = nullptr;
+        bool fastTailCallsOnly              = true;
+        bool tailCallsConvertibleToLoopOnly = false;        
+        if (!hasTailCalls && block->endsWithTailCall(this, fastTailCallsOnly, tailCallsConvertibleToLoopOnly, &tailCall))


Should we make this arm/arm64 specific so that it does not bloat the size of GCInfo on x64?

jkotas · 2018-01-26T20:08:32Z

src/inc/gcinfo.h

@@ -34,7 +34,7 @@ const unsigned   this_OFFSET_FLAG  = 0x2;  // the offset is "this"
 // The current GCInfo Version
 //-----------------------------------------------------------------------------

-#define GCINFO_VERSION 2
+#define GCINFO_VERSION 3


If you bump the GCInfo version format you have to also bump the version of R2R format. Is this version bump really necessary since this fix is only for platforms that have not shipped yet?

The version bump was not clear to me. I have thought that I need to bump it since there can already be some crossgen-ed 3rd party code that would use the old version. Is that not a problem?

there can already be some crossgen-ed 3rd party

That's correct for x86/x64. This fix is for ARM/ARM64 that do not have this problem.

GC_INFO_WANTS_REPORT_ONLY_LEAF is useless flag. It just exists for compatibility with JIT64 convention on .NET Framework x64. I would overload this flag for ARM/ARM64 to track the tailcall presence.

jkotas · 2018-01-26T20:10:53Z

src/inc/gcinfotypes.h

@@ -442,6 +442,7 @@ struct InfoHdrSmall {
    unsigned char  genericsContext : 1;//4 [1]      function reports a generics context parameter is present
    unsigned char  genericsContextIsMethodDesc : 1;//4[2]
    unsigned char  returnKind : 2; // 4 [4]  Available GcInfo v2 onwards, previously undefined 
+    unsigned char  hasTailCalls : 1; // 4 [5]


This is x86 specific. I do not think we need it here.

jkotas · 2018-01-29T13:49:44Z

src/gcdump/gcdumpnonx86.cpp

    gcPrintf("Wants Report Only Leaf: %u\n", hdrdecoder.WantsReportOnlyLeaf());
-
+#else // _TARGET_AMD64_


This should be #elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) to match the rest.

I used the elif only in sources that are compiled for x86.

This works only if we do not ever expand the set of target architectures...

jkotas · 2018-01-29T14:00:14Z

src/vm/threadsuspend.cpp

@@ -6109,6 +6109,9 @@ struct ExecutionState
    IJitManager    *m_pJitManager;
    METHODTOKEN     m_MethodToken;
    BOOL            m_IsInterruptible;  // is this code interruptible?
+#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
+    BOOL            m_HasTailCalls;     // does this code perform tail calls?


This does not need to be cached in this structure here. It is computed and used within single method - we can just call it directly right where it is needed.

Makes sense

jkotas · 2018-01-29T14:06:21Z

src/vm/gcinfodecoder.cpp

    m_WantsReportOnlyLeaf      = ((headerFlags & GC_INFO_WANTS_REPORT_ONLY_LEAF) != 0);
+#else // _TARGET_AMD64_


The same as above - this source is not compiled for x86, so I've used just the else

janvorli · 2018-01-29T14:30:14Z

src/jit/legacynonjit/CMakeLists.txt

@@ -36,7 +36,7 @@ set_property(TARGET legacynonjit APPEND_STRING PROPERTY LINK_DEPENDS ${JIT_EXPOR

 set(RYUJIT_LINK_LIBRARIES
   utilcodestaticnohost
-   gcinfo
+   gcinfo_arm


@BruceForstall this is a bug in the legacynonjit CMakeLists.txt. I've hit it when making my changes that are ARM / ARM64 specific and was getting missing symbols due to the fact that x86 version of the gcinfo was being linked in.

BruceForstall · 2018-01-30T22:14:48Z

src/jit/codegencommon.cpp

@@ -180,6 +180,9 @@ CodeGen::CodeGen(Compiler* theCompiler) : CodeGenInterface(theCompiler)
    /* Assume that we not fully interruptible */

    genInterruptible = false;
+#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)


In the JIT, only, but not the VM, you can replace all occurrences of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) with defined(_TARGET_ARMARCH_).

BruceForstall · 2018-01-30T22:18:21Z

src/inc/gcinfodecoder.h

@@ -520,6 +527,9 @@ class GcInfoDecoder
    bool    HasMethodTableGenericsInstContext();
    bool    GetIsVarArg();
    bool    WantsReportOnlyLeaf();
+#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)


You should put the bool WantsReportOnlyLeaf(); above this also under #ifdef

BruceForstall · 2018-01-30T22:40:17Z

src/jit/flowgraph.cpp

@@ -18969,6 +18969,18 @@ void Compiler::fgSetBlockOrder()
            // JIT64 does.
            genInterruptible = true;
        }
+
+#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)


I think this is the wrong location to check for this, although it probably works in most cases.

For the "hasTailCalls" bool, you only care if a function has any tailcalls at all, after it has been completely generated. If possible, this means we should delay the checking for tailcall existence until as late as possible, to prevent the case where we initially think we'll generate a tailcall, then later we change our mind due to some kind of optimization.

Also, we should clearly define what "hasTailCalls" means. E.g., the "normal" prolog/epilog sequence (for arm32) is "push {...,lr} ... pop {..., pc}", but we can also generate "push {..., lr} ... pop {..., lr}; bx lr". Are both variations ok?

And what about the JMP IL instruction? That is very similar to tailcall, in that we branch to the target function address, and (I think) LR remains live as the return address of the called target function.

I think you might want to put code in the _TARGET_ARMARCH_ version of genFnEpilog() (about line 9574 of codegencommon.cpp), in particular, within the if (jmpEpilog) clause:

if (jmpEpilog) { hasTailCalls = true; ...

Note that this code is run for both JMP and fast tail call.

Thanks @BruceForstall for the detailed review. When unhijack happens when the target thread is suspended at the bx lr (which is the only way to return on arm64, since there is no pop pc) and the function that was hijacked is the current function, it just restores the original return address to a location on the stack that's at an address that's less than the SP. It has no effect and causes no harm. When the target thread is resumed, it returns to the OnHijackTripThread. It pushes LR to stack and then calls the OnHijackWorker. This function places the original return address to the place where the OnHijackTripThread stored the LR and marks the thread as not hijacked. Then it does its hijacking related job and finally returns to the OnHijackTripThread that in turn returns to the original address that was hijacked. So it fixes the fact that the unhijack modified a wrong location.
The tail call case is different. We end up entering the tail call target with LR set to the OnHijackTripThread. Now there are several cases that can happen w.r.t. where the unhijack happens:

The target thread is executing the tail call target and it hasn't allocated the stack frame yet. The unhijack has no effect, since it restores the original return address to a location where it will either get overwritten by the current function's frame or to a location that's at lower address than the lowest address of the current function's frame.

The target thread is executing the tail call target and it has already allocated the stack frame. This has three sub-cases:
a. The tail call caller was storing the LR at a location that's inside of the current function's frame at an address different from the location where the tail call target stores the LR. In this case, unhijack corrupts the stack frame.
b. The tail call caller was storing the LR at a location that's at lower address than the lowest address of the current function's frame. In this case, the unhijack has no effect.
c. The tail call caller was storing the LR at a location that's at the same address address as the address where the LR is stored in the current function's frame. In this case, the unhijack actually works correctly.

The target thread is executing a function that was transitively called by the tail call target. Then we are in a situation similar to (2), but it would also cause corruption if the original location of the hijacked return address was inside one of the frames of the call chain from the tail call target and the current function. It could even cause more subtle effect in case that location by a chance matched a location of LR stored by another function in the call chain. Then the unhijack would cause the current function to return to original return address of the tail call caller instead of the real caller of the current function.

Now all of the cases described above where I've said that the unhijack would have no effect would actually "heal" the issue by returning from the tail call target to the OnHijackTripThread that would then return to the original return address that was saved.
But only if another hijack didn't happen before it returned from the tail call target. If another hijack happened, then the original return address would be lost, overwritten by the new one. If the hijack happened in the tail call target, the thread would end up running the OnHijackTripThread in an infinite loop, since it would store the OnHijackTripThread as the original return address. If the hijack happened in some function transitively called by the tail call target, then the tail call target would still return to the OnHijackTripThread, but that one would then return to the caller of the function that was hijacked, likely crashing, hanging or doing other interesting stuff.

Thanks for the detailed explanation. I guess this never happened on ARM because the JIT hasn't implemented fast tail call optimization (although #13828 has been waiting, to enable it). On ARM64, I guess we just haven't noticed it (or tested it well). Looks like test JIT/Methodical/tailcall_v4/hijacking/hijacking.sh is disabled for Linux in testsFailingOnArm64.txt and for Windows in tests\arm64\tests.lst, despite #4122 claiming the original failure was fixed.

I'm guessing your case 2c is the common case, which would greatly reduce the incidence of this bug. I presume cases of multiple hijacks, where we would "lose" the original address, are also rare, since once we hijack we wait for a while for it to kick in, and presumably most code returns via the hijacked address in that time.

As a matter of principle, at least, it seems like the unhijack code should not be writing to the stack (to restore the stack-based LR to its original value) at any address below the current SP. Presumably the VM has the context, and can compare the hijack address (m_ppvHJRetAddrPtr) against SP. It could notice that the hijack won't work in this case (such as the "pop {lr}; ; bx lr" case).

The hijack code saves the MethodDesc* of the function when the hijack is installed. Presumably that could be used in the unhijack case. E.g., assume the hijack happened in a tailcall caller, and now we are in a nested call tree within the tailcall target. If we go to unhijack, we could walk the stack looking for a function that contains the hijacked return address (m_pvHJRetAddr). If we reach this address, and the frame one lower on the stack (call it the "unhijack frame") isn't the saved MethodDesc*, then we have a mismatch. We could use the stack walker to tell us where in the "unhijack frame" LR is stored, and unhijack to that location instead of the saved location. So, for A --call--> B --tailcall--> C --call--> D --call--> E, with hijack in B, and unhijack in E: the stack frame looks like: A C D E (since the stack frame for B disappears). If we walk the stack, when we get to A, the saved return address will be in the code for function A. But we never found the saved MethodDesc* corresponding to B. So, we know B is gone. We could figure out where the return address is stored in the stack frame for C, and set it to the return address in A. Presumably, the saved LR in the stack frame for C must be OnHijackTripThread. (Actually, can we walk the stack if the saved LR is "incorrect"?)

Perhaps this is too complex (or there are holes in my logic).

BruceForstall · 2018-01-30T22:44:06Z

src/vm/gcinfodecoder.cpp

 bool GcInfoDecoder::WantsReportOnlyLeaf()
 {
+    // Only AMD64 with JIT64 can return false here.


Don't you just want to put the whole function under #ifdef _TARGET_AMD64_?

I have intentionally kept it on the GCInfoDecoder and just always return true for non-amd64 platforms. The places in the code that call this method would need to have ifdefs around parts of conditions otherwise and it seemed less readable.

Move the hasTailCall detection into the CodeGen::genFnEpilog and also let it cover the JMP IL instruction case, since its behavior is equivalent w.r.t. the hijacking.

janvorli · 2018-02-01T13:59:52Z

@BruceForstall I've moved the hasTailCalls detection as per your suggestion and also let it cover the JMP IL instruction case. I've kept the hasTailCalls name since the JMP behavior is equivalent to a tail call where the same parameters as the caller has are passed to the target function.

Instead of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_), use defined(_TARGET_ARMARCH_).

BruceForstall

It would be useful to capture all the details of this issue (or, specifically, the ABI requirements surrounding tailcall and hijacking), from the conversations here, in the CLR ABI document, maybe with a new "Hijacking" section here: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#general-unwindframe-layout. Note that hijacking is briefly mentioned there, specifically that LR is always saved to the stack.

BruceForstall · 2018-02-03T02:14:26Z

src/inc/gcinfoencoder.h

@@ -27,7 +27,8 @@
                hasPSPSymStackSlot,
                hasGenericsInstContextStackSlot, 
                hasStackBaseregister,
-                wantsReportOnlyLeaf,
+                wantsReportOnlyLeaf (AMD64 JIT64 use only),


I would say "AMD64 use only", not just JIT64. Yes, if we didn't have to support JIT64 anymore, we could remove it, but per the ABI doc, "(JIT32 and RyuJIT set it, JIT64 doesn't)".

janvorli · 2018-02-13T20:56:30Z

@BruceForstall, I'm back from my vacation, so I'd like to respond to few of your ideas here:

It would be useful to capture all the details of this issue (or, specifically, the ABI requirements surrounding tailcall and hijacking), from the conversations here, in the CLR ABI document

Good point, I'll do that as a separate change.

I guess this never happened on ARM because the JIT hasn't implemented fast tail call optimization (although #13828 has been waiting, to enable it). On ARM64, I guess we just haven't noticed it (or tested it well). Looks like test JIT/Methodical/tailcall_v4/hijacking/hijacking.sh is disabled for Linux in testsFailingOnArm64.txt and for Windows in tests\arm64\tests.lst, despite #4122 claiming the original failure was fixed.

Actually, I've hit the issue when running the JIT/Methodical/tailcall_v4/hijacking test on Windows ARM64 and that tests hits it pretty reliably in 50..75% of runs. The reason is that this test is basically a tight loop of two functions that tail call each other. The issue surfaced as the case when we loose the original hijacked address and get into infinite loop in that test.

As a matter of principle, at least, it seems like the unhijack code should not be writing to the stack (to restore the stack-based LR to its original value) at any address below the current SP. Presumably the VM has the context, and can compare the hijack address (m_ppvHJRetAddrPtr) against SP. It could notice that the hijack won't work in this case (such as the "pop {lr}; ; bx lr" case).

We can check that, but it would not help since unhijack is not allowed to fail. Unlike hijacking, we cannot resume the target thread and try later.

The hijack code saves the MethodDesc* of the function when the hijack is installed. Presumably that could be used in the unhijack case. E.g., assume the hijack happened in a tailcall caller, and now we are in a nested call tree within the tailcall target. If we go to unhijack, we could walk the stack looking for a function that contains the hijacked return address (m_pvHJRetAddr). If we reach this address, and the frame one lower on the stack (call it the "unhijack frame") isn't the saved MethodDesc*, then we have a mismatch. We could use the stack walker to tell us where in the "unhijack frame" LR is stored, and unhijack to that location instead of the saved location.

I have tried similar solution in my previous attempt to fix the issue that I've abandoned(#16011). See the comment https://github.com/dotnet/coreclr/pull/16011/files#r163866342 that describes the reasons why it was abandoned.

The test JIT/Methodical/tailcall_v4/hijacking should be passing now on ARM64.

janvorli · 2018-02-13T21:00:45Z

I have added enabling the JIT/Methodical/tailcall_v4/hijacking test for ARM64, since it should be passing with my fix.

janvorli · 2018-02-14T00:00:53Z

@dotnet-bot test Ubuntu arm64 Cross Checked normal Build and Test

janvorli · 2018-02-14T00:02:07Z

@dotnet-bot test Windows_NT arm64 Cross Checked forcerelocs Build and Test

sandreenko · 2018-02-14T19:31:52Z

src/inc/gcinfoencoder.h

@@ -27,7 +27,8 @@
                hasPSPSymStackSlot,
                hasGenericsInstContextStackSlot, 
                hasStackBaseregister,
-                wantsReportOnlyLeaf,
+                wantsReportOnlyLeaf (AMD64 use only),
+                hasTailCalls (ARM/ARM64 only)


comma is missed.

sandreenko · 2018-02-14T19:35:36Z

src/inc/gcinfoencoder.h

    bool   m_WantsReportOnlyLeaf;
+#elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
+    bool   m_HasTailCalls;
+#endif // _TARGET_AMD64_


wrong comment for #endif, it should be #endif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) according to https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#75-commenting-ifdefs, but it is not jit sources, so maybe we do not care.

and defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) == _TARGET_ARMARCH_

It seems that the coding convention doc actually doesn't cover the #elif pattern and in coreclr runtime, we use the pattern I've used (the comment on #endif matching the #if and not the #elif comment)

janvorli added area-VM arch-arm64 arch-arm32 labels Jan 26, 2018

janvorli added this to the Future milestone Jan 26, 2018

janvorli self-assigned this Jan 26, 2018

janvorli requested review from jkotas and BruceForstall January 26, 2018 19:54

jkotas requested a review from swaroop-sridhar January 26, 2018 20:03

jkotas reviewed Jan 26, 2018

View reviewed changes

Reflect PR feedback

91642b7

jkotas reviewed Jan 29, 2018

View reviewed changes

Fix formatting issues and reflect PR feedback

176abd4

janvorli commented Jan 29, 2018

View reviewed changes

Reflect more PR feedback

9fe28cc

jkotas approved these changes Jan 29, 2018

View reviewed changes

swaroop-sridhar approved these changes Jan 29, 2018

View reviewed changes

BruceForstall suggested changes Jan 30, 2018

View reviewed changes

Move the hasTailCall detection as per PR feedback

77f5aba

Move the hasTailCall detection into the CodeGen::genFnEpilog and also let it cover the JMP IL instruction case, since its behavior is equivalent w.r.t. the hijacking.

Use _TARGET_ARMARCH_ in JIT

8b91fcb

Instead of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_), use defined(_TARGET_ARMARCH_).

BruceForstall approved these changes Feb 3, 2018

View reviewed changes

BruceForstall mentioned this pull request Feb 13, 2018

[RyuJIT/ARM32] Enabling fast tail call feature #14056

Closed

Enable the tailcall hijacking test for ARM64

91b5c6d

The test JIT/Methodical/tailcall_v4/hijacking should be passing now on ARM64.

janvorli merged commit 4fb9851 into dotnet:master Feb 14, 2018

sandreenko reviewed Feb 14, 2018

View reviewed changes

janvorli deleted the fix-arm-tailcall-hijacking branch February 14, 2018 19:34

sandreenko reviewed Feb 14, 2018

View reviewed changes

janvorli mentioned this pull request Feb 11, 2022

AddRemoveFromDifferentThreads<string>.ConcurrentStack benchmark hangs on ARM64 dotnet/runtime#64980

Closed

tomeksowi mentioned this pull request Nov 4, 2024

[RISC-V][LoongArch64] Encode missing tailcall in GC info dotnet/runtime#109512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ARM/ARM64 hijacking in tail calls #16039

Fix ARM/ARM64 hijacking in tail calls #16039

janvorli commented Jan 26, 2018

janvorli commented Jan 26, 2018

jkotas Jan 26, 2018

jkotas Jan 26, 2018

janvorli Jan 26, 2018

jkotas Jan 26, 2018

jkotas Jan 26, 2018

jkotas Jan 29, 2018

janvorli Jan 29, 2018

jkotas Jan 29, 2018

jkotas Jan 29, 2018

janvorli Jan 29, 2018

jkotas Jan 29, 2018

janvorli Jan 29, 2018

janvorli Jan 29, 2018

BruceForstall Jan 30, 2018

janvorli Feb 1, 2018

BruceForstall Jan 30, 2018

BruceForstall Jan 30, 2018

janvorli Jan 31, 2018

BruceForstall Feb 3, 2018

BruceForstall Jan 30, 2018

janvorli Feb 1, 2018

janvorli commented Feb 1, 2018

BruceForstall left a comment

BruceForstall Feb 3, 2018

janvorli commented Feb 13, 2018

janvorli commented Feb 13, 2018

janvorli commented Feb 14, 2018

janvorli commented Feb 14, 2018

sandreenko Feb 14, 2018

sandreenko Feb 14, 2018 •

edited

Loading

janvorli Feb 14, 2018

		gcPrintf("Wants Report Only Leaf: %u\n", hdrdecoder.WantsReportOnlyLeaf());

		#else // _TARGET_AMD64_

		m_WantsReportOnlyLeaf = ((headerFlags & GC_INFO_WANTS_REPORT_ONLY_LEAF) != 0);
		#else // _TARGET_AMD64_

Fix ARM/ARM64 hijacking in tail calls #16039

Fix ARM/ARM64 hijacking in tail calls #16039

Conversation

janvorli commented Jan 26, 2018

janvorli commented Jan 26, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

janvorli commented Feb 1, 2018

BruceForstall left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

janvorli commented Feb 13, 2018

janvorli commented Feb 13, 2018

janvorli commented Feb 14, 2018

janvorli commented Feb 14, 2018

Choose a reason for hiding this comment

sandreenko Feb 14, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sandreenko Feb 14, 2018 •

edited

Loading