Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashreport for stack overflow #57428

Merged
merged 4 commits into from
Aug 17, 2021
Merged

Fix crashreport for stack overflow #57428

merged 4 commits into from
Aug 17, 2021

Conversation

mikem8361
Copy link
Member

Issue: #57032

The stack overflow managed exception wasn't been set/written out.

Changed the "unmanaged_frames" value to "stack_frames" because it includes both native and managed frames.

Aggregated the repeated stack frames adding a "repeated"/repeated_frames value/array for the time of times the sequence was repeated.

Looks like this:

{
  "repeated" : "0x1542",
  "repeated_frames" : [
   {
    "is_managed" : "true",
    "module_address" : "0x10e402000",
    "stack_pointer" : "0x7000045ce020",
    "native_address" : "0x11b9ee8c9",
    "native_offset" : "0x29",
    "token" : "0x600006f",
    "il_offset" : "0x0",
    "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo2()",
    "timestamp" : "0xa8561820",
    "sizeofimage" : "0xe000",
    "filename" : "Macson.Client.dll",
    "guid" : "49cd869a682942bc95c0c34ca206c61d"
   },
   {
    "is_managed" : "true",
    "module_address" : "0x10e402000",
    "stack_pointer" : "0x7000045ce040",
    "native_address" : "0x11b9ee879",
    "native_offset" : "0x29",
    "token" : "0x6000070",
    "il_offset" : "0x0",
    "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo1()",
    ...
   },
  ]
}

@ghost
Copy link

ghost commented Aug 15, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Issue: #57032

The stack overflow managed exception wasn't been set/written out.

Changed the "unmanaged_frames" value to "stack_frames" because it includes both native and managed frames.

Aggregated the repeated stack frames adding a "repeated"/repeated_frames value/array for the time of times the sequence was repeated.

Looks like this:

{
  "repeated" : "0x1542",
  "repeated_frames" : [
   {
    "is_managed" : "true",
    "module_address" : "0x10e402000",
    "stack_pointer" : "0x7000045ce020",
    "native_address" : "0x11b9ee8c9",
    "native_offset" : "0x29",
    "token" : "0x600006f",
    "il_offset" : "0x0",
    "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo2()",
    "timestamp" : "0xa8561820",
    "sizeofimage" : "0xe000",
    "filename" : "Macson.Client.dll",
    "guid" : "49cd869a682942bc95c0c34ca206c61d"
   },
   {
    "is_managed" : "true",
    "module_address" : "0x10e402000",
    "stack_pointer" : "0x7000045ce040",
    "native_address" : "0x11b9ee879",
    "native_offset" : "0x29",
    "token" : "0x6000070",
    "il_offset" : "0x0",
    "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo1()",
    ...
   },
  ]
}
Author: mikem8361
Assignees: mikem8361
Labels:

area-Diagnostics-coreclr

Milestone: -

@mikem8361
Copy link
Member Author

/cc: @kdubau

Issue: dotnet#57032

The stack overflow managed exception wasn't been set/written out.

Changed the "unmanaged_frames" value to "stack_frames" because it includes both native and managed frames.

Aggregated the repeated stack frames adding a "repeated"/repeated_frames value/array for the time of times the sequence was repeated.

Looks like this:

    {
      "repeated" : "0x1542",
      "repeated_frames" : [
       {
        "is_managed" : "true",
        "module_address" : "0x10e402000",
        "stack_pointer" : "0x7000045ce020",
        "native_address" : "0x11b9ee8c9",
        "native_offset" : "0x29",
        "token" : "0x600006f",
        "il_offset" : "0x0",
        "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo2()",
        "timestamp" : "0xa8561820",
        "sizeofimage" : "0xe000",
        "filename" : "Macson.Client.dll",
        "guid" : "49cd869a682942bc95c0c34ca206c61d"
       },
       {
        "is_managed" : "true",
        "module_address" : "0x10e402000",
        "stack_pointer" : "0x7000045ce040",
        "native_address" : "0x11b9ee879",
        "native_offset" : "0x29",
        "token" : "0x6000070",
        "il_offset" : "0x0",
        "method_name" : "Macson.Client.Diagnostics.CrashyClass.Foo1()",
        ...
       },
      ]
    }
docs/design/coreclr/botr/xplat-minidump-generation.md Outdated Show resolved Hide resolved
src/coreclr/debug/createdump/main.cpp Show resolved Hide resolved
src/coreclr/debug/createdump/threadinfo.h Outdated Show resolved Hide resolved
src/coreclr/debug/createdump/threadinfo.h Outdated Show resolved Hide resolved
src/coreclr/debug/createdump/threadinfo.h Outdated Show resolved Hide resolved
src/coreclr/debug/createdump/crashreportwriter.cpp Outdated Show resolved Hide resolved
src/coreclr/debug/createdump/threadinfo.h Outdated Show resolved Hide resolved
{
// Aggregated the repeated stack frames only for stack overflow exceptions
if (m_exceptionHResult == STACK_OVERFLOW_EXCEPTION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will still fail to detect native stack overflows I believe (or stack overflows with the top frames being native). Do our partners care much about this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way that I know to reliably see this is if the fault is a segv within a page of the stack limit the pal gives you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it won't detect native stack overflow unless the native overflow gets turned into a managed stack overflow exception. I'll have to check the exception code.

// Check for repeats through all the stack frames so far until we find one
if (m_beginRepeat == m_frames.end())
{
for (std::set<StackFrame>::iterator iterator = m_frames.begin(); iterator != m_frames.end(); ++iterator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the const recommendations apply here too.

Can you please also run it through this one? This one doesn't do the shortening when writing to console.

using System.Runtime.CompilerServices;

Console.WriteLine("Hello, World!");
await SillyMethod(1);

async Task SillyMethod(int depth)
{
    if (depth > 3000) A();
    await SillyMethod(depth + 1);

    return;
}

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void A() => B();

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void B() => C();

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void C() => D();

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void D() => E();

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void E() => F();

[MethodImplAttribute (MethodImplOptions.NoInlining)]
void F() {};

Do you think we should consider capping the number of frames even if we don't find any sequence to avoid unmanageable json files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above test generates the repeated frames correctly.

     "repeated" : "0x7fca",
      "repeated_frames" : [
       {
        "is_managed" : "true",
        "module_address" : "0x109d25000",
        "stack_pointer" : "0x70000a28b010",
        "native_address" : "0x117e3b1fe",
        "native_offset" : "0x1e",
        "token" : "0x6000007",
        "il_offset" : "0x6",
        "method_name" : "ConsoleApp2.Program.SOF()",
        "timestamp" : "0xe3e86a72",
        "sizeofimage" : "0x8000",
        "filename" : "stackover2.dll",
        "guid" : "45550742ab2a4934b9df93f507a166d6"
       }
      ]
     },

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about the number of frames cap. I assume you mean some limit on the threadinfo.cpp line 301 for that is searching for the first match? The problem is picking the number. Is a 1000 enough frames? This code is typically just searching through the initial native frames on the top of stack which shouldn't really be that many.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to the state machine in that case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE the cap: Say you are in this case of stack overflow for weird reasons. At most you should be trying to write a manageable number of frames into the report. After that it's not particularly useful and just causes more problems than helps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should cap the number of repeated frames once we have a begin stack frame if that is what your are suggesting. I'm suggesting limiting the number of times the loop at line 301 is searching for the begin repeat. If we hit that limit, it will just add it like a normal frame, but it unless I set some flag, it will iterate through all the frames again (to the limit) on the next frame added.

src/coreclr/debug/createdump/threadinfo.cpp Show resolved Hide resolved
@Anipik
Copy link
Contributor

Anipik commented Aug 17, 2021

@hoyosjs can you take a final look here ?

@Anipik
Copy link
Contributor

Anipik commented Aug 17, 2021

@mikem8361 Is there any more work here that needs to be done or can we go ahead and merge this one ?

@Anipik
Copy link
Contributor

Anipik commented Aug 17, 2021

The failure in the previous attempt in the ongoing leg is #57452

@mikem8361
Copy link
Member Author

mikem8361 commented Aug 17, 2021 via email

@Anipik Anipik merged commit f81e82d into dotnet:main Aug 17, 2021
@mikem8361 mikem8361 deleted the macosreport branch August 17, 2021 20:33
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix stack overflow problems in createdump crash report generation
3 participants