Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM: Include generic instantiations in stack frames #96440

Closed
wants to merge 33 commits into from

Conversation

hez2010
Copy link
Contributor

@hez2010 hez2010 commented Jan 3, 2024

We currently don't have generic instantiations in the stack frames, which sometimes makes it hard to locate and identify an exception by only seeing the stack trace without actually debugging the program.

Adding generic instantiations in the stack frames to resolve this issue.

Test code:

using System.Diagnostics;

new Foo<int, string>().Bar<float, string>();

class Foo<T, U>
{
    public void Bar<V, W>()
    {
        Console.WriteLine(new StackTrace());
    }
}

Before:

   at Foo`2[T,U].Bar[V,W]()
   at Program.<Main>$(String[] args)

After:

   at Foo`2[System.Int32,System.String].Bar[System.Single,System.String]()
   at Program.<Main>$(String[] args)

This can also help const generics (#89730) if we add the support for it in the future, as displaying the const value of const type arguments in the stack trace could be really helpful while diagnosing an exception.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 3, 2024
@ghost
Copy link

ghost commented Jan 3, 2024

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

We currently don't have generic instantiations in the stack frames, which sometimes makes it hard to locate an exception by only seeing the stack trace without actually debugging the program.

Adding generic instantiations in the stack frames to resolve this issue.

Test code:

using System.Diagnostics;

new Foo<int, string>().Bar<float, string>();

class Foo<T, U>
{
    public void Bar<V, W>()
    {
        Console.WriteLine(new StackTrace());
    }
}

Before:

   at Foo`2[T,U].Bar[V,W]()
   at Program.<Main>$(String[] args)

After:

   at Foo`2[[System.Int32, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Bar[Single,__Canon]()
   at Program.<Main>$(String[] args)

For a shared generic argument, it returns the canonical type as is because we didn't encode the exact type info in the crawl stack frame. But still better then nothing.

Author: hez2010
Assignees: -
Labels:

area-System.Diagnostics, community-contribution

Milestone: -

@hez2010 hez2010 force-pushed the feature/generic-stacktrace branch from b411c6f to 869f439 Compare January 4, 2024 03:32
@hez2010 hez2010 force-pushed the feature/generic-stacktrace branch from aa027dd to 2048e41 Compare January 4, 2024 04:20
@hez2010
Copy link
Contributor Author

hez2010 commented Feb 25, 2024

Now it's ready for review.

@hez2010 hez2010 marked this pull request as ready for review February 25, 2024 08:21
@hez2010
Copy link
Contributor Author

hez2010 commented Mar 14, 2024

I start to think that reporting System.__Canon is fine because we are already doing this for fatal internal clr errors.
Example:

   Fatal error. Internal CLR error. (0x80131506)
 -   at System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Resize(Int32, Boolean)
 -   at System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].TryInsert(System.__Canon, System.__Canon, System.Collections.Generic.InsertionBehavior)
 at MyCustomDictionaryExtensions.MergeDictionary(System.Collections.Generic.Dictionary`2<System.String,System.String>, System.Collections.Generic.Dictionary`2<System.String,System.String>, Boolean)

@MichalStrehovsky
Copy link
Member

is fine because we are already doing this for fatal internal clr errors.

Fatal CLR errors capture details that are important for the developers of the runtime because they're runtime bugs. They are not expected to be hit/consumed/processed by users. Mono has entirely different outputs for fatal runtime errors with even more confusing details. Stacks in fatal errors serve different purposes for different audiences.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user facing behavior seemed fine to me though I still want to nail down what happens for optimized shared generic code frames.

In terms of the correctness and performance of the type system generics and reflection code I'm guessing @davidwrighton or @jkotas is probably a better person than I to look at that part.

@@ -40,27 +46,27 @@ public void StackTraceTest()
}
}

[MethodImpl(MethodImplOptions.NoInlining)]
[MethodImpl(MethodImplOptions.NoOptimization | MethodImplOptions.NoInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to weaken the existing stackwalking test by only validating it against unoptimized code. You could create some alternate frames that are unoptimized and call those instead if the switch is set.

(The same thing applies in the other tests)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should explicitly have a test that demonstrates the behavior of a stack trace that occurs in optimized shared generic code where the shared generics context is no longer in scope. Even if it doesn't have exact type naming we still want to ensure it degrades gracefully rather than doing anything bad (like crashing).

Was there ever an agreed upon behavior for what text gets shown for the generic parameters in an optimized shared generic frame? I believe the text "System.__Canon" has already leaked out in a few other places such as the error handling text @hez2010 mentioned or it has shown up in VS debugger callstacks for many years when debugging optimized generic code:
image

I feel like the cat is already out of the bag in terms of the revealing that type name so using it here is aligned with precedent elsewhere. I'd also be fine reverting to use the type parameter name if you think its worth trying to get the cat back into the bag @jkotas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not leak __Canon through public APIs. We have always treated leaking __Canon through public APIs as bug.

I agree with #96440 (comment) .

@@ -116,8 +116,16 @@ public static IEnumerable<object[]> ToString_TestData()
yield return new object[] { new StackFrame(), "MoveNext at offset {offset} in file:line:column {fileName}:{lineNumber}:{column}" + Environment.NewLine };
yield return new object[] { new StackFrame("FileName", 1, 2), "MoveNext at offset {offset} in file:line:column FileName:1:2" + Environment.NewLine };
yield return new object[] { new StackFrame(int.MaxValue), "<null>" + Environment.NewLine };
yield return new object[] { GenericMethod<string>(), "GenericMethod<T> at offset {offset} in file:line:column {fileName}:{lineNumber}:{column}" + Environment.NewLine };
yield return new object[] { GenericMethod<string, int>(), "GenericMethod<T,U> at offset {offset} in file:line:column {fileName}:{lineNumber}:{column}" + Environment.NewLine };
if (AppContext.TryGetSwitch("Switch.System.Diagnostics.StackTrace.ShowGenericInstantiations", out var showGenericInstantiations) && showGenericInstantiations)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this code that checks for the switch, but is there something which is setting the switch? I want to ensure the test is explicitly validating the behavior both with the switch enabled and disabled.

fProfilerRequiresGenericsContextForEnterLeave = TRUE;
}
END_PROFILER_CALLBACK();
methInfo->options = CorInfoOptions(methInfo->options|CORINFO_GENERICS_CTXT_KEEP_ALIVE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS - Do you know if the JIT is currently preserving the generic context when optimizations are disabled somehow? I expected this would be something we already do but looking around at how CORINFO_GENERICS_CTXT_KEEP_ALIVE is used now that didn't seem to be the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not.

Generally speaking the generic context is only kept alive if we see the CORINFO_GENERICS_CTXT_KEEP_ALIVE flag from the jithost, or if there is a runtime lookup that uses the context.

src/coreclr/vm/jitinterface.cpp Show resolved Hide resolved
@jkotas
Copy link
Member

jkotas commented Mar 15, 2024

the correctness and performance of the type system generics and reflection code I'm guessing @davidwrighton or @jkotas is probably a better person than I to look at that part.

Reliability of this opt-in switch is my top concern. This change takes a dependency on generic context being tracked correctly in new places. If we want do productize this switch, we need to have a test strategy for how we ensure that it is reliable.

@hez2010
Copy link
Contributor Author

hez2010 commented Apr 6, 2024

Seems that something around EH has changed after my last main branch merging resulting in GC hole issue happening again.
We need to fetch the inst under cooperative gc mode after fetching the generic context immediately, otherwise the GC may move the generic context and results in AV.

@hez2010
Copy link
Contributor Author

hez2010 commented Apr 6, 2024

While quite interesting that the AV only happens on Windows x86. Investigating.

@hez2010
Copy link
Contributor Author

hez2010 commented Apr 8, 2024

I was trying to load the exact MethodDesc while initializing the crawl frame, but loading it can possibly trigger a throw or a GC as it may need to create a new MethodDesc, so it's not the right place to load the exact MethodDesc.

@jkotas Do you have any suggestion on this?

Comment on lines +318 to +323
InstantiatedMethodDesc *pInstMD = InstantiatedMethodDesc::FindLoadedInstantiatedMethodDesc(th.GetMethodTable(), pConstructedFunc->GetMemberDef(), Instantiation(), false);

if (pInstMD != NULL)
{
pConstructedFunc = pInstMD;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should really be

pConstructedFunc = InstantiatedMethodDesc::FindOrCreateExactClassMethod(th.GetMethodTable(), pConstructedFunc);

but this would possibly throw or trigger a GC.

@jkotas
Copy link
Member

jkotas commented Apr 8, 2024

@jkotas Do you have any suggestion on this?

Avoid changing behavior of regular stackwalk - it would introduce both reliability bugs and performance regressions. Any extra work should be done only during the stackwalk that produces the diagnostic stacktrace and only when the special mode is enabled.

This goes back to my comment #96440 (comment) . I will leave it to @dotnet/area-system-diagnostics area owners on they would like to approach it.

In the current shape, the feature added by this PR addresses very niche scenario (it only helps when running debug build outside debugger, and one has to opt-in). The cost to productize it just may not be worth it.

@noahfalk
Copy link
Member

Hi @hez2010, I feel bad saying this after the PR has been open for so long but I don't think this one is going to work out. I was investigating and talking a bit with Jan and Andy - the problem looks deeper than just testing. In order for this approach to work the generic context needs to be reliable always and that isn't a guarantee the JIT currently makes in its tracking info. Even if great testing found every case where it doesn't currently work, its a cost both now and in the future for the entire JIT codebase to enforce stronger rules around that tracking. This feature appears too niche to make that tradeoff worthwhile. Since we didn't see any clear path to overcoming the reliability issue I think our best option is to close this PR. Sorry to be the bearer of bad news.

cc @jkotas @AndyAyersMS

@hez2010
Copy link
Contributor Author

hez2010 commented Apr 16, 2024

Okay. I'm closing the PR.
Thanks for all the advice and review!

@hez2010 hez2010 closed this Apr 16, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants