-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stack walking for new EH #99624
Fix stack walking for new EH #99624
Conversation
Tagging subscribers to this area: @mangod9 |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
There is an edge case when the StackFrameIterator was incorrectly setting the fShouldParentFrameUseUnwindTargetPCforGCReporting, which resulted in a 100% reproducible GC hole in the baseservices\exceptions\unittests\ThrowInFinallyNestedInTry with GC stress C and tiered compilation off. The fix is to delete code that is already present in an "if" branch above the change and that should really be executed only when the funclet parent is the caller of the actual handler frame.
975dedc
to
0d71a2f
Compare
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
581cdcd
to
fcbfa2c
Compare
The fix for the last issue needs to be a bit involved, it seems - it is hitting an assert I've added there, so there seem to be a case when a skipped frame isn't one that needs register updates or it doesn't point to the managed frame. I need to check what is such case. |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
2858a9b
to
850a0c4
Compare
InlinedCallFrame needs to be excluded from updating the context pointers, as it restores PC / SP to a different location than the callsite of the pinvoke and it doesn't update the context pointers anyways.
850a0c4
to
49a2b8b
Compare
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Only two GC stress legs have failures due to timeout in a test The arm32 happens even without the new EH enabled - see #99410 |
Windows x64 improvements: dotnet/perf-autofiling-issues#31934 |
There are three issues caused by the new exception handling implementation causing GC holes. These were discovered by the CI GC stress tests and mostly occur only with GC stress 3 and tiered compilation disabled.
There is an edge case when the StackFrameIterator was incorrectly setting the
fShouldParentFrameUseUnwindTargetPCforGCReporting, which resulted in a 100% reproducible GC hole in the
baseservices\exceptions\unittests\ThrowInFinallyNestedInTry with GC stress C and tiered compilation off.
The fix is to delete code that is already present in an "if" branch above the change and that should really be executed only when the funclet parent is the caller of the actual handler frame.
When GC occurs when a finally funclet is on the stack and then next GC happens when we are in the managed code of the second pass of the same exception handling, it introduces a GC hole. The latter GC still needs to report some slots in the parent of the finallys so that next GC scan when there is a finally again needs to find those alive. The current code works when the parent of the finally is the same as the parent of the actual exception handling catch funclet, but in other cases, the stack walk was not reporting the slots. It cannot just report all slots in the parent, as some of them would already be dead. Only slots that the funclets would report should be reported.
The fix is to remember the PC, frame pointer and flags of the last GC reporting of a finally funclet and use that to report slots based off the frame pointer when no funclet of the same exception handling is found during a future GC.
Last issue is related to the usage of SEH to pass through native frames when stack walking hits CallDescrWorkerInternal frame. The SEH stack unwinding doesn't update and doesn't even pass out the context pointers, so if a non-volatile register stored in a native frame contains a reference to an object that got relocated, a GC hole occurs. The Interop/Interop test was hitting that problem on Windows.
The fix is to use context pointers from a skipped explicit frame to update context pointers in the current REGDISPLAY. When the SEH hits a managed frame in the 2nd pass, we restart exception handling on the context of the managed frame, so the explicit frame that was on the transition boundary between that managed frame and the just unwound native frames contains the context pointers we need.