Fix return address hijacking in the stack probing loop #28119
When return address hijacking occurs while the target thread is running in the stack probing loop of a method with a large frame, the unwinder cannot unwind to the caller frame correctly. As a result, the modified return address is written to the wrong stack slot, corrupting locals of the method being executed. This change fixes the problem by not attempting to hijack a method that is running in its prolog.
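To make the failure mode concrete, here is a toy model of a wrongly located return address slot. It is only a sketch: `ToyThread`, `toyUnwinderReturnSlot`, `toyHijack`, `makeThreadInProbeLoop` and `kHijackStub` are hypothetical names, and the assumption that the unwinder treats the current stack pointer as the return address slot is made purely for illustration. The description above does not say which slot the real unwinder computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical address of the hijack stub; not a real runtime value.
constexpr std::uint64_t kHijackStub = 0xDEADBEEF;

// Toy stack: index 0 is the lowest address, the stack grows toward index 0.
struct ToyThread {
    std::vector<std::uint64_t> stack;
    std::size_t sp;          // index the stack pointer currently refers to
    std::size_t returnSlot;  // index that really holds the return address
};

// ASSUMPTION for illustration: the toy unwinder believes the stack pointer
// has not moved since method entry, so it reports SP as the return slot.
std::size_t toyUnwinderReturnSlot(const ToyThread& t) { return t.sp; }

// Patch whatever slot the toy unwinder reports with the stub address.
void toyHijack(ToyThread& t) {
    const std::size_t slot = toyUnwinderReturnSlot(t);
    assert(slot < t.stack.size());
    t.stack[slot] = kHijackStub;
}

// Build a thread whose probing loop has already moved SP down by probedSlots.
ToyThread makeThreadInProbeLoop(std::size_t probedSlots) {
    ToyThread t{};
    t.stack.assign(16, 0);
    t.returnSlot = 12;
    t.stack[t.returnSlot] = 0x1000;      // genuine return address
    t.sp = t.returnSlot - probedSlots;   // SP after the probes done so far
    return t;
}
```

In this model, a thread suspended at method entry (`probedSlots == 0`) gets its real return address patched, while a thread suspended partway through the probing loop gets a slot inside its own future frame overwritten and keeps its original return address.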
Approved. Please get a cr and we can take for servicing this week.
There is some missing package problem unrelated to this change:
I wonder why the 1.0.1-prerelease-00005 version disappeared. @echesakovMSFT do you know if we have somehow disabled the previous source of the package after your change that created the new version 00006?
@janvorli Hmm, I don't know. We shouldn't delete them, since these packages are used by 5.0 and 3.1.
I can only find said preview 5 package in the dotnet5-transport feed. My guess is it lived in the dotnet-core myget feed, which was shut down earlier today/yesterday. Not sure it would be safe to add a feed like Microsoft.NETCore.CoreDisTools @ dotnet5-transport to a 3.1 release branch. (cc: @mmitche, @garath, @clairernovotny)
Looks like that's correct. The dotnet-core feed from dotnet.myget.org is now available as a sleet feed. That should work instead. If it makes sense and @mmitche agrees, I could manually publish it to the dotnet3.1-transport feed.
/azp run coreclr-ci
Azure Pipelines successfully started running 1 pipeline(s).
This change is not a port, as .NET 5.0 uses a different method for stack
probing. Porting that method would require too large a change, so this is
a targeted fix.
Customer impact
Customer (RavenDB) database servers processing customer data are crashing on a regular basis (a few times a week) with SIGSEGV.
There is no workaround for the problem.
See dotnet/runtime#42885 for more details.
Regression?
No
Testing
The customer tested a custom libcoreclr.so, based on 3.1.8 and containing this fix, on their production servers for about a month. The intermittent crashes they had been seeing on a regular basis disappeared, and their systems worked flawlessly.
Risk
Low. The change just prevents return address hijacking from happening in the prolog of methods with large frames. A hijacking attempt is not required to succeed; there are already other cases in which we cannot hijack a method, such as when it is running in an epilog or when the thread is executing native code.
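The decision described above can be sketched as a simple classification of where the suspended thread is executing. This is a minimal illustration, not the runtime's code: `ExecState`, `ToyMethodInfo`, `classify` and `canHijack` are hypothetical names, and modeling the prolog and epilog as plain offset ranges is an assumption.

```cpp
#include <cstdint>

// Where the suspended thread's instruction pointer was found (toy model).
enum class ExecState { ManagedBody, ManagedProlog, ManagedEpilog, NativeCode };

// Hypothetical per-method layout; offsets are in bytes from method start.
struct ToyMethodInfo {
    std::uint32_t prologSize;   // prolog occupies [0, prologSize)
    std::uint32_t epilogStart;  // epilog occupies [epilogStart, end of method)
};

// Classify an instruction pointer offset. A null method stands in for
// "the thread is not running managed code".
ExecState classify(const ToyMethodInfo* method, std::uint32_t ipOffset) {
    if (method == nullptr) return ExecState::NativeCode;
    if (ipOffset < method->prologSize) return ExecState::ManagedProlog;
    if (ipOffset >= method->epilogStart) return ExecState::ManagedEpilog;
    return ExecState::ManagedBody;
}

// Hijacking is attempted only in the method body. Every other state is
// skipped, which is acceptable because an attempt is not required to succeed.
bool canHijack(ExecState state) { return state == ExecState::ManagedBody; }
```

With this shape, adding the prolog case is one more state that returns false from `canHijack`, alongside the epilog and native code cases that were already skipped.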