-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix ARM/ARM64 hijacking in tail calls #16039
Conversation
This change fixes an issue that can happen when a function that has tail calls is hijacked. There are two potential issues: 1. When a function that tail calls another one is hijacked, the LR may be stored at a different location in the stack frame of the tail call target. So just by performing tail call, the hijacked location becomes invalid and unhijacking would corrupt stack by writing to that location. 2. There is a small window after the caller pops LR from the stack in its epilog and before the tail called function pushes LR in its prolog when the hijacked return address would not be not on the stack and so we would not be able to unhijack. The fix is to prevent hijacking of functions that contain tail calls.
CC: @davidwrighton |
src/jit/flowgraph.cpp
Outdated
GenTreePtr tailCall = nullptr; | ||
bool fastTailCallsOnly = true; | ||
bool tailCallsConvertibleToLoopOnly = false; | ||
if (!hasTailCalls && block->endsWithTailCall(this, fastTailCallsOnly, tailCallsConvertibleToLoopOnly, &tailCall)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this arm/arm64 specific so that it does not bloat the size of GCInfo on x64?
src/inc/gcinfo.h
Outdated
@@ -34,7 +34,7 @@ const unsigned this_OFFSET_FLAG = 0x2; // the offset is "this" | |||
// The current GCInfo Version | |||
//----------------------------------------------------------------------------- | |||
|
|||
#define GCINFO_VERSION 2 | |||
#define GCINFO_VERSION 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you bump the GCInfo version format you have to also bump the version of R2R format. Is this version bump really necessary since this fix is only for platforms that have not shipped yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version bump was not clear to me. I have thought that I need to bump it since there can already be some crossgen-ed 3rd party code that would use the old version. Is that not a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there can already be some crossgen-ed 3rd party
That's correct for x86/x64. This fix is for ARM/ARM64 that do not have this problem.
GC_INFO_WANTS_REPORT_ONLY_LEAF
is useless flag. It just exists for compatibility with JIT64 convention on .NET Framework x64. I would overload this flag for ARM/ARM64 to track the tailcall presence.
src/inc/gcinfotypes.h
Outdated
@@ -442,6 +442,7 @@ struct InfoHdrSmall { | |||
unsigned char genericsContext : 1;//4 [1] function reports a generics context parameter is present | |||
unsigned char genericsContextIsMethodDesc : 1;//4[2] | |||
unsigned char returnKind : 2; // 4 [4] Available GcInfo v2 onwards, previously undefined | |||
unsigned char hasTailCalls : 1; // 4 [5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is x86 specific. I do not think we need it here.
src/gcdump/gcdumpnonx86.cpp
Outdated
gcPrintf("Wants Report Only Leaf: %u\n", hdrdecoder.WantsReportOnlyLeaf()); | ||
|
||
#else // _TARGET_AMD64_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be #elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
to match the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the elif only in sources that are compiled for x86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works only if we do not ever expand the set of target architectures...
src/vm/threadsuspend.cpp
Outdated
@@ -6109,6 +6109,9 @@ struct ExecutionState | |||
IJitManager *m_pJitManager; | |||
METHODTOKEN m_MethodToken; | |||
BOOL m_IsInterruptible; // is this code interruptible? | |||
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) | |||
BOOL m_HasTailCalls; // does this code perform tail calls? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be cached in this structure here. It is computed and used within single method - we can just call it directly right where it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
src/vm/gcinfodecoder.cpp
Outdated
m_WantsReportOnlyLeaf = ((headerFlags & GC_INFO_WANTS_REPORT_ONLY_LEAF) != 0); | ||
#else // _TARGET_AMD64_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#elif
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above - this source is not compiled for x86, so I've used just the else
@@ -36,7 +36,7 @@ set_property(TARGET legacynonjit APPEND_STRING PROPERTY LINK_DEPENDS ${JIT_EXPOR | |||
|
|||
set(RYUJIT_LINK_LIBRARIES | |||
utilcodestaticnohost | |||
gcinfo | |||
gcinfo_arm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BruceForstall this is a bug in the legacynonjit CMakeLists.txt. I've hit it when making my changes that are ARM / ARM64 specific and was getting missing symbols due to the fact that x86 version of the gcinfo was being linked in.
src/jit/codegencommon.cpp
Outdated
@@ -180,6 +180,9 @@ CodeGen::CodeGen(Compiler* theCompiler) : CodeGenInterface(theCompiler) | |||
/* Assume that we not fully interruptible */ | |||
|
|||
genInterruptible = false; | |||
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the JIT, only, but not the VM, you can replace all occurrences of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
with defined(_TARGET_ARMARCH_)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -520,6 +527,9 @@ class GcInfoDecoder | |||
bool HasMethodTableGenericsInstContext(); | |||
bool GetIsVarArg(); | |||
bool WantsReportOnlyLeaf(); | |||
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should put the bool WantsReportOnlyLeaf();
above this also under #ifdef
src/jit/flowgraph.cpp
Outdated
@@ -18969,6 +18969,18 @@ void Compiler::fgSetBlockOrder() | |||
// JIT64 does. | |||
genInterruptible = true; | |||
} | |||
|
|||
#if defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong location to check for this, although it probably works in most cases.
For the "hasTailCalls" bool, you only care if a function has any tailcalls at all, after it has been completely generated. If possible, this means we should delay the checking for tailcall existence until as late as possible, to prevent the case where we initially think we'll generate a tailcall, then later we change our mind due to some kind of optimization.
Also, we should clearly define what "hasTailCalls" means. E.g., the "normal" prolog/epilog sequence (for arm32) is "push {...,lr} ... pop {..., pc}", but we can also generate "push {..., lr} ... pop {..., lr}; bx lr". Are both variations ok?
And what about the JMP IL instruction? That is very similar to tailcall, in that we branch to the target function address, and (I think) LR remains live as the return address of the called target function.
I think you might want to put code in the _TARGET_ARMARCH_
version of genFnEpilog()
(about line 9574 of codegencommon.cpp), in particular, within the if (jmpEpilog)
clause:
if (jmpEpilog)
{
hasTailCalls = true;
...
Note that this code is run for both JMP and fast tail call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BruceForstall for the detailed review. When unhijack happens when the target thread is suspended at the bx lr
(which is the only way to return on arm64, since there is no pop pc
) and the function that was hijacked is the current function, it just restores the original return address to a location on the stack that's at an address that's less than the SP. It has no effect and causes no harm. When the target thread is resumed, it returns to the OnHijackTripThread. It pushes LR to stack and then calls the OnHijackWorker. This function places the original return address to the place where the OnHijackTripThread stored the LR and marks the thread as not hijacked. Then it does its hijacking related job and finally returns to the OnHijackTripThread that in turn returns to the original address that was hijacked. So it fixes the fact that the unhijack modified a wrong location.
The tail call case is different. We end up entering the tail call target with LR set to the OnHijackTripThread. Now there are several cases that can happen w.r.t. where the unhijack happens:
- The target thread is executing the tail call target and it hasn't allocated the stack frame yet. The unhijack has no effect, since it restores the original return address to a location where it will either get overwritten by the current function's frame or to a location that's at lower address than the lowest address of the current function's frame.
- The target thread is executing the tail call target and it has already allocated the stack frame. This has three sub-cases:
a. The tail call caller was storing the LR at a location that's inside of the current function's frame at an address different from the location where the tail call target stores the LR. In this case, unhijack corrupts the stack frame.
b. The tail call caller was storing the LR at a location that's at lower address than the lowest address of the current function's frame. In this case, the unhijack has no effect.
c. The tail call caller was storing the LR at a location that's at the same address address as the address where the LR is stored in the current function's frame. In this case, the unhijack actually works correctly. - The target thread is executing a function that was transitively called by the tail call target. Then we are in a situation similar to (2), but it would also cause corruption if the original location of the hijacked return address was inside one of the frames of the call chain from the tail call target and the current function. It could even cause more subtle effect in case that location by a chance matched a location of LR stored by another function in the call chain. Then the unhijack would cause the current function to return to original return address of the tail call caller instead of the real caller of the current function.
Now all of the cases described above where I've said that the unhijack would have no effect would actually "heal" the issue by returning from the tail call target to the OnHijackTripThread that would then return to the original return address that was saved.
But only if another hijack didn't happen before it returned from the tail call target. If another hijack happened, then the original return address would be lost, overwritten by the new one. If the hijack happened in the tail call target, the thread would end up running the OnHijackTripThread in an infinite loop, since it would store the OnHijackTripThread as the original return address. If the hijack happened in some function transitively called by the tail call target, then the tail call target would still return to the OnHijackTripThread, but that one would then return to the caller of the function that was hijacked, likely crashing, hanging or doing other interesting stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation. I guess this never happened on ARM because the JIT hasn't implemented fast tail call optimization (although #13828 has been waiting, to enable it). On ARM64, I guess we just haven't noticed it (or tested it well). Looks like test JIT/Methodical/tailcall_v4/hijacking/hijacking.sh is disabled for Linux in testsFailingOnArm64.txt and for Windows in tests\arm64\tests.lst, despite #4122 claiming the original failure was fixed.
I'm guessing your case 2c is the common case, which would greatly reduce the incidence of this bug. I presume cases of multiple hijacks, where we would "lose" the original address, are also rare, since once we hijack we wait for a while for it to kick in, and presumably most code returns via the hijacked address in that time.
As a matter of principle, at least, it seems like the unhijack code should not be writing to the stack (to restore the stack-based LR to its original value) at any address below the current SP. Presumably the VM has the context, and can compare the hijack address (m_ppvHJRetAddrPtr) against SP. It could notice that the hijack won't work in this case (such as the "pop {lr}; ; bx lr" case).
The hijack code saves the MethodDesc*
of the function when the hijack is installed. Presumably that could be used in the unhijack case. E.g., assume the hijack happened in a tailcall caller, and now we are in a nested call tree within the tailcall target. If we go to unhijack, we could walk the stack looking for a function that contains the hijacked return address (m_pvHJRetAddr). If we reach this address, and the frame one lower on the stack (call it the "unhijack frame") isn't the saved MethodDesc*
, then we have a mismatch. We could use the stack walker to tell us where in the "unhijack frame" LR is stored, and unhijack to that location instead of the saved location. So, for A --call--> B --tailcall--> C --call--> D --call--> E
, with hijack in B, and unhijack in E: the stack frame looks like: A C D E
(since the stack frame for B disappears). If we walk the stack, when we get to A, the saved return address will be in the code for function A. But we never found the saved MethodDesc*
corresponding to B. So, we know B is gone. We could figure out where the return address is stored in the stack frame for C, and set it to the return address in A. Presumably, the saved LR in the stack frame for C must be OnHijackTripThread. (Actually, can we walk the stack if the saved LR is "incorrect"?)
Perhaps this is too complex (or there are holes in my logic).
bool GcInfoDecoder::WantsReportOnlyLeaf() | ||
{ | ||
// Only AMD64 with JIT64 can return false here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you just want to put the whole function under #ifdef _TARGET_AMD64_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have intentionally kept it on the GCInfoDecoder and just always return true for non-amd64 platforms. The places in the code that call this method would need to have ifdefs around parts of conditions otherwise and it seemed less readable.
Move the hasTailCall detection into the CodeGen::genFnEpilog and also let it cover the JMP IL instruction case, since its behavior is equivalent w.r.t. the hijacking.
@BruceForstall I've moved the hasTailCalls detection as per your suggestion and also let it cover the JMP IL instruction case. I've kept the hasTailCalls name since the JMP behavior is equivalent to a tail call where the same parameters as the caller has are passed to the target function. |
Instead of defined(_TARGET_ARM_) || defined(_TARGET_ARM64_), use defined(_TARGET_ARMARCH_).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to capture all the details of this issue (or, specifically, the ABI requirements surrounding tailcall and hijacking), from the conversations here, in the CLR ABI document, maybe with a new "Hijacking" section here: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#general-unwindframe-layout. Note that hijacking is briefly mentioned there, specifically that LR is always saved to the stack.
src/inc/gcinfoencoder.h
Outdated
@@ -27,7 +27,8 @@ | |||
hasPSPSymStackSlot, | |||
hasGenericsInstContextStackSlot, | |||
hasStackBaseregister, | |||
wantsReportOnlyLeaf, | |||
wantsReportOnlyLeaf (AMD64 JIT64 use only), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say "AMD64 use only", not just JIT64. Yes, if we didn't have to support JIT64 anymore, we could remove it, but per the ABI doc, "(JIT32 and RyuJIT set it, JIT64 doesn't)".
@BruceForstall, I'm back from my vacation, so I'd like to respond to few of your ideas here:
Good point, I'll do that as a separate change.
Actually, I've hit the issue when running the JIT/Methodical/tailcall_v4/hijacking test on Windows ARM64 and that tests hits it pretty reliably in 50..75% of runs. The reason is that this test is basically a tight loop of two functions that tail call each other. The issue surfaced as the case when we loose the original hijacked address and get into infinite loop in that test.
We can check that, but it would not help since unhijack is not allowed to fail. Unlike hijacking, we cannot resume the target thread and try later.
I have tried similar solution in my previous attempt to fix the issue that I've abandoned(#16011). See the comment https://github.com/dotnet/coreclr/pull/16011/files#r163866342 that describes the reasons why it was abandoned. |
The test JIT/Methodical/tailcall_v4/hijacking should be passing now on ARM64.
I have added enabling the |
@dotnet-bot test Ubuntu arm64 Cross Checked normal Build and Test |
@dotnet-bot test Windows_NT arm64 Cross Checked forcerelocs Build and Test |
@@ -27,7 +27,8 @@ | |||
hasPSPSymStackSlot, | |||
hasGenericsInstContextStackSlot, | |||
hasStackBaseregister, | |||
wantsReportOnlyLeaf, | |||
wantsReportOnlyLeaf (AMD64 use only), | |||
hasTailCalls (ARM/ARM64 only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comma is missed.
bool m_WantsReportOnlyLeaf; | ||
#elif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_) | ||
bool m_HasTailCalls; | ||
#endif // _TARGET_AMD64_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong comment for #endif
, it should be #endif defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
according to https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#75-commenting-ifdefs, but it is not jit sources, so maybe we do not care.
and defined(_TARGET_ARM_) || defined(_TARGET_ARM64_)
== _TARGET_ARMARCH_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the coding convention doc actually doesn't cover the #elif pattern and in coreclr runtime, we use the pattern I've used (the comment on #endif matching the #if and not the #elif comment)
This change fixes an issue that can happen when a function that has tail
calls is hijacked. There are two potential issues:
When a function that tail calls another one is hijacked, the LR may be
stored at a different location in the stack frame of the tail call
target.
So just by performing tail call, the hijacked location becomes invalid and
unhijacking would corrupt stack by writing to that location.
There is a small window after the caller pops LR from the stack in its
epilog and before the tail called function pushes LR in its prolog when
the hijacked return address would not be not on the stack and so we would
not be able to unhijack.
The fix is to prevent hijacking of functions that contain tail calls.