-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[x86/Linux] Fix crash when GC triggered inside funclets #10188
Conversation
Current patch makes those two TCs pass, I just copied the added code from same code of USE_GC_INFO_DECODER on. Here are a list that I'm not sure about.
|
src/vm/eetwain.cpp
Outdated
@@ -4249,6 +4249,18 @@ bool EECodeManager::EnumGcRefs( PREGDISPLAY pContext, | |||
|
|||
#endif // _DEBUG | |||
|
|||
#ifdef WIN64EXCEPTIONS // funclets | |||
// | |||
// If we're in a funclet, we do not want to report the incoming varargs. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is skipping way more than just the incoming varargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes while Process the untracked frame variable table. Would funclets need to do something with these untracked variables? I thought they are handled by main function frame.
By the way does untracked frame variable mean ref-type local variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wateret I believe the only difference between untracked and tracked variables is that the untracked are assumed to be alive in the whole function while the tracked ones liveness is tracked precisely (by instruction address ranges). And only locals containing references or byrefs are reported. @BruceForstall please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli that's correct
In the early days, CLR was written just for x86 and the GC info encoding was very x86 specific. When we have started porting to other platforms, portable GcInfoDecoder was introduced but x86 was never switched to it because of the GcInfoDecoder would not be as efficient on x86 and it is work to switch it. |
@jkotas I updated the commit to skip untracked frame variables for funclets. Still not sure if this is a right way though. (Indentation is not modified for diff readability) |
When scanning the stack roots it skips scanning untracked frame variables for funclets
The delta looks good to me now. Are you doing any GC stress testing on Linux x86? It is the best way to verify whether the GC reporting works correctly. |
@jkotas I once tried, but there were too many stack alignment issues. I'm currently working on fixing them. |
@jkotas The current patch only skips |
Agree. This looks like a problem. |
@jkotas Fixed that and the indentation. |
@wateret, @jkotas so is the current state of the PR correct? Looking at what we do for the target architectures where we use GCInfoDecoder, we only skip reporting untracked variables for filter funclets there. The comment in that code says:
But this code skips reporting untracked variables for all funclets. |
No, the non-exceptional finallys are called via a call on the other architectures as well. The reporting here needs to be in sync with what the JIT does. The best way to verify whether it is in sync is to run GC stress. |
@jkotas @janvorli @wateret I found that this PR is necessary to run 'Hello, World' example under GCStress=3. Without this, CLR hits the following assert failure:
|
@janvorli Oh I see. That means we shouldn't skip untracked variables. However wouldn't we double report 2nd pass funclets as well as filters? I thought as long as we have a funclet frame, we have main function's frame upper in stack. |
No matter it double reports roots or not, it is strange that it crashes here. This is tested without this PR.
Crashed when (gcCount = 2) |
@wateret could you please additionally set trEnumGCRefs to true (and also keep the dspPtr set to true as you did now) and get me the full GC log both with and without your fix via gist (https://gist.github.com/)? |
@janvorli Here is the log and backtrace you mentioned. https://gist.github.com/wateret/43c18a5006ff52a9621b81c2c10caaf3 |
@janvorli In the log I put up on Gist, is this double-relocation for one GC? https://gist.github.com/wateret/43c18a5006ff52a9621b81c2c10caaf3#file-dump-L3853 (funclet)
It looks like relocation happened twice in GC count 1. Am I right? The crash occurs when promoting |
@janvorli It seems like there is a way not to double-report for those that use GCInfoDecoder. On arm32/Linux, I get the below message in main function. The funclet reports but main function does not.
|
@wateret so it seems that an oposite approach to the one you have used in this PR is used there. // In order to make ARM more x86-like we only ever report the leaf frame
// of any given function. We accomplish this by having the stackwalker
// pass a flag whenever walking the frame of a method where it has
// previously visited a child funclet
if (WantsReportOnlyLeaf() && (inputFlags & ParentOfFuncletStackFrame))
{
LOG((LF_GCROOTS, LL_INFO100000, "Not reporting this frame because it was already reported via another funclet.\n"));
return true;
} The WantsReportOnlyLeaf() is a bit set by the GC encoder on request from the JIT if The ParentOfFuncletStackFrame in the input flags is passed by the stack walker to the EECodeManager::EnumGcRefs when it finds a frame that's a parent of a funclet. So I think we should do the same thing for x86 Linux. I'm actually not sure why the WantsReportOnlyLeaf() check is necessary, since if the stack walker reports that the current frame is a parent of a funclet, then there must be a funclet in the function. So I would suggest that instead of your fix, we would add the following to the beginning of EECodeManager::EnumGcRefs for x86 Unix: if (inputFlags & ParentOfFuncletStackFrame)
{
LOG((LF_GCROOTS, LL_INFO100000, "Not reporting this frame because it was already reported via another funclet.\n"));
return true;
} |
@janvorli Thank you, I update the code. For the sake of lifetime and arg reg table, GCInfoDecoder's approach(funclet does report, main function does not) looks more suitable. |
@dotnet-bot test OSX10.12 x64 Checked Build and Test please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
@wateret I assume the change fixed the problem and it can be merged, right? |
@janvorli Yes, I found no regression on my local unit test(pri0). |
@dotnet-bot test Ubuntu x64 Checked Build and Test please |
@janvorli Could we merge this please? |
@wateret I am sorry for the delay. |
cc: @jkotas @janvorli @parjong
Related Issue : #10187