This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[x86/Linux] Fix crash when GC triggered inside funclets #10188

Merged · 3 commits · Apr 4, 2017


@wateret (Member) commented Mar 15, 2017

cc: @jkotas @janvorli @parjong

Related issue: #10187

@wateret (Member, Author) commented Mar 15, 2017

The current patch makes those two test cases pass. I just copied the added code from the equivalent code path used when USE_GC_INFO_DECODER is on.

Here is a list of things I'm not sure about:

  • Currently funclets have nothing on their stack, so there is nothing there that can be a GC root. Even so, we still have REG_EXCEPTION_OBJECT (eax), and I'm not sure whether it is handled.
  • I'm not sure whether the added code is in the right place in EnumGcRefs().
  • Why doesn't x86/Linux use GcInfoDecoder?

@@ -4249,6 +4249,18 @@ bool EECodeManager::EnumGcRefs( PREGDISPLAY pContext,

#endif // _DEBUG

#ifdef WIN64EXCEPTIONS // funclets
//
// If we're in a funclet, we do not want to report the incoming varargs. This is
Review comment (Member):

This is skipping way more than just the incoming varargs

Reply (Member, Author):

It crashes while processing the untracked frame variable table (the "Process the untracked frame variable table" section). Do funclets need to do anything with these untracked variables? I thought they were handled by the main function's frame.

By the way, does "untracked frame variable" mean ref-type local variables?

Reply (Member):

@wateret I believe the only difference between untracked and tracked variables is that untracked ones are assumed to be alive throughout the whole function, while the liveness of tracked ones is tracked precisely (by instruction address ranges). Only locals containing references or byrefs are reported. @BruceForstall please correct me if I am wrong.
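To make the tracked/untracked distinction concrete, here is a minimal sketch under a simplified model. The structs and `liveSlots` are hypothetical and do not reflect the real x86 GC info encoding; tracked lifetimes are assumed to be half-open code-offset ranges. Untracked slots are reported at every code offset, tracked slots only inside their live range.

```cpp
#include <vector>

// Hypothetical, simplified slot descriptors (not the real x86 GC info layout).
struct UntrackedSlot {
    int ebpOffset;  // frame slot, e.g. -0x1C for [EBP-1CH]
};

struct TrackedSlot {
    int ebpOffset;
    unsigned beginOffset;  // first code offset where the slot holds a live ref
    unsigned endOffset;    // first code offset where it is dead again
};

// Returns the frame slots that would be reported as GC roots at codeOffset.
std::vector<int> liveSlots(const std::vector<UntrackedSlot>& untracked,
                           const std::vector<TrackedSlot>& tracked,
                           unsigned codeOffset) {
    std::vector<int> result;
    // Untracked slots are assumed live for the whole method body.
    for (const UntrackedSlot& s : untracked)
        result.push_back(s.ebpOffset);
    // Tracked slots are reported only inside their live range [begin, end).
    for (const TrackedSlot& s : tracked)
        if (codeOffset >= s.beginOffset && codeOffset < s.endOffset)
            result.push_back(s.ebpOffset);
    return result;
}
```

With one untracked slot at `[EBP-1CH]` and one tracked slot at `[EBP-20H]` live in `[10, 20)`, offset 15 reports both slots and offset 25 reports only the untracked one.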

Reply (Member):

@janvorli that's correct

@jkotas (Member) commented Mar 15, 2017

> Why does not X86/Linux use GcInfoDecoder?

In the early days, the CLR was written just for x86, and the GC info encoding was very x86-specific. When we started porting to other platforms, the portable GcInfoDecoder was introduced, but x86 was never switched to it because GcInfoDecoder would not be as efficient on x86 and switching would take work.

@wateret (Member, Author) commented Mar 20, 2017

@jkotas I updated the commit to skip untracked frame variables for funclets. I'm still not sure whether this is the right way, though. (Indentation is left unmodified for diff readability.)

Commit: When scanning the stack roots, it skips scanning untracked frame variables for funclets
@jkotas (Member) commented Mar 20, 2017

The delta looks good to me now. Are you doing any GC stress testing on Linux x86? It is the best way to verify whether the GC reporting works correctly.

@parjong commented Mar 21, 2017

@jkotas I tried once, but there were too many stack alignment issues. I'm currently working on fixing them.

@wateret (Member, Author) commented Mar 21, 2017

@jkotas The current patch only skips the "Process the untracked frame variable table" part, but it still runs the "Process the frame variable lifetime table" part for funclets. I'm not sure we need that for funclets.
I also thought this is a bit strange, because we need (at least) to advance the table pointer past the untracked table so that we can correctly process the variable lifetime table.
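The point about moving past the untracked table can be illustrated with a variable-length table. The encoding below (LEB128-style varints, a count followed by that many entries) is an assumption for illustration only, not the actual x86 GC info format. What it shows is that when entries have variable size, not reporting them still requires decoding them; otherwise the next read lands inside the untracked table.

```cpp
#include <cstdint>

// Hypothetical encoding: unsigned LEB128 varints (7 payload bits per byte,
// high bit set on every byte except the last). No bounds checking.
static std::uint32_t readVarint(const std::uint8_t*& p) {
    std::uint32_t value = 0;
    unsigned shift = 0;
    std::uint8_t byte;
    do {
        byte = *p++;
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    return value;
}

// Assumed layout: [untrackedCount][untracked entries...]
//                 [lifetimeCount][lifetime entries...]
// Even when the untracked slots are not reported, every entry is decoded
// so the returned pointer is the start of the lifetime table.
static const std::uint8_t* skipUntrackedTable(const std::uint8_t* p) {
    std::uint32_t count = readVarint(p);
    for (std::uint32_t i = 0; i < count; ++i)
        (void)readVarint(p);  // decode and discard one entry
    return p;
}
```

For the bytes `{0x02, 0x1C, 0x80, 0x01, 0x01, 0x05}` (two untracked entries, the second two bytes long, then a lifetime table with one entry), the skip returns `table + 4`. Skipping only the count byte would leave the pointer at `table + 1`, and the next read would return `0x1C` as the lifetime-table count.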

@jkotas (Member) commented Mar 21, 2017

> I also thought this is a bit strange because we need to (at least) move table by skipping untracked table

Agree. This looks like a problem.

@wateret (Member, Author) commented Mar 22, 2017

@jkotas Fixed that and the indentation.

@janvorli (Member) commented:

@wateret, @jkotas so is the current state of the PR correct? Looking at what we do for the target architectures where we use GCInfoDecoder, we only skip reporting untracked variables for filter funclets there. The comment in that code says:

        // Filters are the only funclet that run during the 1st pass, and must have
        // both the leaf and the parent frame reported.  In order to avoid double
        // reporting of the untracked variables, do not report them for the filter.

But this code skips reporting untracked variables for all funclets.
Thinking about it, it seems that the same rule the comment mentions should also apply to non-exceptionally invoked finallys (both their frame and all the frames up to the containing function's frame are alive on the stack), which would cover the issue this PR is fixing.
@jkotas Are the non-exceptionally called finallys always inlined on the other architectures, so that they don't require special treatment, or am I missing something?
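A toy model of the double-reporting concern. It assumes, as the x86 logs in this thread show for `Class1:foo`, that a funclet frame and its parent frame scan the same `[EBP-xx]` untracked slots. `FrameKind` and both rule functions are hypothetical. Under the filter-only rule, a finally funclet called non-exceptionally, whose parent frame is also live on the stack, reports the parent's untracked locals twice; skipping them for every funclet, as this PR currently does, reports them once.

```cpp
#include <vector>

// Hypothetical classification of a frame being scanned during one GC.
enum class FrameKind { Parent, FilterFunclet, FinallyFunclet };

// Rule from the quoted GcInfoDecoder comment: skip untracked locals only
// when scanning a filter funclet.
bool reportsUntrackedFilterRule(FrameKind k) {
    return k != FrameKind::FilterFunclet;
}

// Rule in the current state of this PR: skip them for every funclet.
bool reportsUntrackedSkipAllFunclets(FrameKind k) {
    return k == FrameKind::Parent;
}

// Counts how often the parent's shared untracked locals are reported when
// every frame in `stack` is scanned.
int untrackedReportCount(const std::vector<FrameKind>& stack,
                         bool (*reportsUntracked)(FrameKind)) {
    int count = 0;
    for (FrameKind k : stack)
        if (reportsUntracked(k))
            ++count;
    return count;
}
```

With `{FinallyFunclet, Parent}` on the stack, the filter-only rule yields 2 reports and the skip-all rule yields 1.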

@jkotas (Member) commented Mar 27, 2017

> @jkotas are the non-exceptionally called finallys always inlined on the other architectures

No, the non-exceptional finallys are invoked through a call on the other architectures as well.

The reporting here needs to be in sync with what the JIT does. The best way to verify whether it is in sync is to run GC stress.

BruceForstall self-requested a review on March 27, 2017 16:42
@parjong commented Mar 28, 2017

@jkotas @janvorli @wateret I found that this PR is necessary to run the 'Hello, World' example under GCStress=3. Without it, the CLR hits the following assertion failure:

TID 7f35: STACKWALK    TransitionFrame::UpdateRegDisplay(ip:F25D141F, sp:FFFFC3B0)
TID 7f35: STACKWALK: [002] CONSIDER: FRAMELESS: PC= f25d141f  SP= ffffc3b0  method=IL_STUB_PInvoke [funclet]
TID 7f35: STACKWALK: Found Non-Filter funclet @ SP: FFFFC3B0, m_crawl.pFunc = F6338D34; FuncletParentCallerSP: FFFFC430
TID 7f35: STACKWALK: [002] CALLBACK: FRAMELESS: PC= f25d141f  SP= ffffc3b0  method=IL_STUB_PInvoke [funclet]
TID 7f35: Scanning Frameless method F6338D34M EIP = F25D141F &EIP = FFFFC3AC
TID 7f35: Scanning Frame for method DomainBoundILStubClass:IL_STUB_PInvoke

Assert failure(PID 32565 [0x00007f35], Thread: 32565 [0x7f35]): !CREATE_CHECK_STRING(pMT && pMT->Validate())
    File: /home/parjong/projects/dotnet/coreclr/src/vm/object.cpp Line: 1719
    Image: /home/parjong/projects/dotnet/dotnet-overlay/Linux.x86.Debug.Debug/corerun

@wateret (Member, Author) commented Mar 29, 2017

> Looking at what we do for the target architectures where we use GCInfoDecoder, we only skip reporting untracked variables for filter funclets there.

@janvorli Oh, I see. That means we shouldn't skip untracked variables. However, wouldn't we double-report for second-pass funclets as well as for filters? I thought that whenever we have a funclet frame, the main function's frame is further up the stack.

@wateret (Member, Author) commented Mar 29, 2017

Whether or not it double-reports roots, it is strange that it crashes here.
Here is the last part of the output with GC logs. The backtrace is in #10187.

This was tested without this PR.

TID 0008:     IGCHeap::Promote: Promote GC Root *FFFFC9F0 = F2E1757C MT = F63B81BCT
TID 0008: STACKWALK: [007] CONSIDER: FRAMELESS: PC= f4f6c379  SP= ffffca10  method=WriteLine
TID 0008: STACKWALK: [007] CALLBACK: FRAMELESS: PC= f4f6c379  SP= ffffca10  method=WriteLine
TID 0008: Scanning Frameless method F255E434M EIP = F4F6C379 &EIP = FFFFC93C
TID 0008: Scanning Frame for method System.IO.SyncTextWriter:WriteLine
TID 0008:     IGCHeap::Promote: Promote GC Root *FFFFCA1C = F2E17654 MT = F255E590T
TID 0008: STACKWALK: [008] CONSIDER: FRAMELESS: PC= f4f6c338  SP= ffffca30  method=WriteLine
TID 0008: STACKWALK: [008] CALLBACK: FRAMELESS: PC= f4f6c338  SP= ffffca30  method=WriteLine
TID 0008: Scanning Frameless method F63B5ED8M EIP = F4F6C338 &EIP = FFFFC93C
TID 0008: Scanning Frame for method System.Console:WriteLine
TID 0008: STACKWALK: [009] CONSIDER: FRAMELESS: PC= f4f6ca1f  SP= ffffca40  method=foo
TID 0008: STACKWALK: [009] CALLBACK: FRAMELESS: PC= f4f6ca1f  SP= ffffca40  method=foo
TID 0008: Scanning Frameless method F63B25A8M EIP = F4F6CA1F &EIP = FFFFC93C
TID 0008: Scanning Frame for method Class1:foo
    Frame  local at [EBP-0CH]:     Untracked  local at [EBP-1CH]:
TID 0008:     IGCHeap::Promote: Promote GC Root *FFFFCA9C = F31991E0 MT = F6348384T
    Untracked  local at [EBP-20H]:
    Untracked  local at [EBP-24H]:
    Untracked  local at [EBP-28H]:
    Untracked  local at [EBP-2CH]:
    Untracked  local at [EBP-30H]:
    Untracked  local at [EBP-34H]:
    Untracked  local at [EBP-38H]:
    Untracked  local at [EBP-3CH]:
    Untracked  local at [EBP-40H]:
    Untracked  local at [EBP-44H]:
    Untracked  local at [EBP-48H]:
    Untracked  local at [EBP-4CH]:
    Untracked  local at [EBP-50H]:
    Untracked  local at [EBP-54H]:
    Untracked  local at [EBP-58H]:
    Untracked  local at [EBP-5CH]:
    Untracked  local at [EBP-60H]:
    Untracked  local at [EBP-64H]:
    Untracked  local at [EBP-68H]:
    Untracked  local at [EBP-6CH]:
    Untracked  local at [EBP-70H]:
TID 0008:     IGCHeap::Promote: Promote GC Root *FFFFCA48 = F2E17750 MT = F59E3238T
    Untracked  local at [EBP-74H]:
TID 0008:     IGCHeap::Promote: Promote GC Root *FFFFCA44 = F31991E0 MT = F6348384T
TID 0008: STACKWALK: [00a] CONSIDER: FRAMELESS: PC= f4f6ce1a  SP= ffffcac0  method=foo [funclet]
TID 0008: STACKWALK: Found Non-Filter funclet @ SP: FFFFCAC0, m_crawl.pFunc = F63B25A8; FuncletParentCallerSP: FFFFCB50
TID 0008: STACKWALK: [00a] CALLBACK: FRAMELESS: PC= f4f6ce1a  SP= ffffcac0  method=foo [funclet]
TID 0008: Scanning Frameless method F63B25A8M EIP = F4F6CE1A &EIP = FFFFC93C
TID 0008: Scanning Frame for method Class1:foo
    Untracked  local at [EBP-1CH]:

Program received signal SIGSEGV, Segmentation fault.
0xf6d5873a in MethodTable::GetFlag (this=0x61006c,
    flag=MethodTable::enum_flag_HasComponentSize)
    at /home/hanjoung/ws/dotnet/coreclr/src/vm/methodtable.h:3997
3997            return m_dwFlags & flag;

It crashed when gcCount = 2.

@janvorli (Member) commented:

@wateret could you please additionally set trEnumGCRefs to true (and also keep dspPtr set to true as you did now) and send me the full GC log, both with and without your fix, via gist (https://gist.github.com/)?
Also, in the failure case, could you please add the stack trace at the time of failure? I would like to see the parameter values in the call stack in relation to the GC log.

@wateret (Member, Author) commented Mar 29, 2017

@janvorli Here is the log and backtrace you mentioned. https://gist.github.com/wateret/43c18a5006ff52a9621b81c2c10caaf3

@wateret (Member, Author) commented Mar 30, 2017

@janvorli In the log I posted on Gist, is this a double relocation within one GC?

https://gist.github.com/wateret/43c18a5006ff52a9621b81c2c10caaf3#file-dump-L3853 (funclet)
https://gist.github.com/wateret/43c18a5006ff52a9621b81c2c10caaf3#file-dump-L3889 (main function)

3853 TID 0008:     GC Root FFFFCB1C RELOCATED F2E35E9C -> F2E177CC  MT = F6348384T
3889 TID 0008:     GC Root FFFFCB1C RELOCATED F2E177CC -> F2E094FC  MT = 006E0068T

It looks like the relocation happened twice in GC count 1. Am I right? The crash occurs when promoting *FFFFCB1C in GC count 2.
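Why a double report corrupts the slot: a compacting GC updates each reported slot by looking up where the referenced object moved. The sketch below replays the two log lines above with a hypothetical relocation map. The second map entry models some other object that lived at `F2E177CC` before compaction; that is an assumption about the heap state, not something the log shows directly.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Addr = std::uint32_t;
using StackSlots = std::map<Addr, Addr>;     // slot address -> slot contents
using RelocationMap = std::map<Addr, Addr>;  // old object address -> new address

// Updates every reported slot in place. A slot listed twice in
// reportedSlots is relocated twice: the second lookup treats the already
// moved address as an old address.
void relocateReportedSlots(StackSlots& stack,
                           const std::vector<Addr>& reportedSlots,
                           const RelocationMap& moved) {
    for (Addr slot : reportedSlots) {
        Addr& ref = stack.at(slot);
        auto it = moved.find(ref);
        if (it != moved.end())
            ref = it->second;
    }
}
```

Reported once, slot `FFFFCB1C` ends at `F2E177CC`; reported twice, it ends at `F2E094FC`, which would explain the bogus `MT = 006E0068` in the second log line and the later crash when the slot is promoted.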

@wateret (Member, Author) commented Mar 30, 2017

@janvorli It seems that the targets using GCInfoDecoder have a way to avoid double reporting.
https://github.com/dotnet/coreclr/blob/release/1.0.0-rc2/src/vm/gcinfodecoder.cpp#L574-L582

On arm32/Linux, I get the message below in the main function. The funclet reports, but the main function does not.

TID 5798: Not reporting this frame because it was already reported via another funclet.

@janvorli (Member) commented:

@wateret so it seems that the opposite of the approach you used in this PR is used there.
If you look at the , you'll see this code at the beginning:

    // In order to make ARM more x86-like we only ever report the leaf frame
    // of any given function. We accomplish this by having the stackwalker
    // pass a flag whenever walking the frame of a method where it has
    // previously visited a child funclet
    if (WantsReportOnlyLeaf() && (inputFlags & ParentOfFuncletStackFrame))
    {
        LOG((LF_GCROOTS, LL_INFO100000, "Not reporting this frame because it was already reported via another funclet.\n"));
        return true;
    }

WantsReportOnlyLeaf() reflects a bit set by the GC encoder at the JIT's request when compiler->ehAnyFunclets() returns true, which means it is set for all functions that contain any funclets.

The ParentOfFuncletStackFrame flag in the input flags is passed by the stack walker to EECodeManager::EnumGcRefs when it finds a frame that is the parent of a funclet.

So I think we should do the same thing for x86 Linux. I'm actually not sure why the WantsReportOnlyLeaf() check is necessary, since if the stack walker reports that the current frame is the parent of a funclet, there must be a funclet in the function.

So instead of your fix, I would suggest adding the following to the beginning of EECodeManager::EnumGcRefs for x86 Unix:

    if (inputFlags & ParentOfFuncletStackFrame)
    {
        LOG((LF_GCROOTS, LL_INFO100000, "Not reporting this frame because it was already reported via another funclet.\n"));
        return true;
    }
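A minimal sketch of the suggested early-out, assuming a simplified stack walk that visits frames from the leaf toward the root and flags a method frame as a funclet parent once one of that method's funclets has been visited. `kParentOfFuncletStackFrame`, `Frame`, `enumGcRefsSketch`, and `walkStack` are hypothetical stand-ins, not the CoreCLR definitions. The effect is that each method's roots are reported exactly once, from the funclet frame, matching the "funclet reports, main function does not" behavior seen on arm32/Linux.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the real flag; the value is illustrative only.
constexpr unsigned kParentOfFuncletStackFrame = 0x1;

struct Frame {
    int methodId;    // which method this frame belongs to
    bool isFunclet;  // true for a funclet frame of that method
};

// Mirrors the suggested early-out: a parent frame whose funclet was already
// scanned reports nothing. "Reporting" here just records the method id.
bool enumGcRefsSketch(const Frame& f, unsigned inputFlags, std::vector<int>& reported) {
    if (inputFlags & kParentOfFuncletStackFrame)
        return true;  // already reported via the funclet
    reported.push_back(f.methodId);
    return true;
}

// Walks frames leaf to root, flagging the parent frame of any method whose
// funclet has already been visited.
std::vector<int> walkStack(const std::vector<Frame>& leafToRoot) {
    std::vector<int> reported;
    std::vector<int> methodsWithVisitedFunclet;
    for (const Frame& f : leafToRoot) {
        unsigned flags = 0;
        if (f.isFunclet) {
            methodsWithVisitedFunclet.push_back(f.methodId);
        } else if (std::find(methodsWithVisitedFunclet.begin(),
                             methodsWithVisitedFunclet.end(),
                             f.methodId) != methodsWithVisitedFunclet.end()) {
            flags |= kParentOfFuncletStackFrame;
        }
        enumGcRefsSketch(f, flags, reported);
    }
    return reported;
}
```

For a stack of a callee (method 1), a funclet of method 2, and method 2's main body, the walk reports methods 1 and 2 once each.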

@wateret (Member, Author) commented Mar 31, 2017

@janvorli Thank you, I updated the code. Considering the lifetime table and the argument register table, GCInfoDecoder's approach (the funclet reports, the main function does not) looks more suitable.

@janvorli (Member) commented Apr 1, 2017

@dotnet-bot test OSX10.12 x64 Checked Build and Test please

@janvorli (Member) left a review comment:

LGTM, thank you!

@janvorli (Member) commented Apr 1, 2017

@wateret I assume the change fixed the problem and it can be merged, right?

@wateret (Member, Author) commented Apr 3, 2017

@janvorli Yes, I found no regressions in my local unit tests (Pri0).

wateret changed the title from "[WIP] [x86/Linux] Fix crash when GC triggered inside funclets" to "[x86/Linux] Fix crash when GC triggered inside funclets" on Apr 3, 2017
@wateret (Member, Author) commented Apr 3, 2017

@dotnet-bot test Ubuntu x64 Checked Build and Test please

@wateret (Member, Author) commented Apr 4, 2017

@janvorli Could we merge this please?

janvorli merged commit 8525460 into dotnet:master on Apr 4, 2017
@janvorli (Member) commented Apr 4, 2017

@wateret I am sorry for the delay.

karelz modified the milestone: 2.0.0 (Aug 28, 2017)
wateret deleted the wip/x86-gc branch on October 13, 2017 07:25