Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mono]: Add dotnet-gcdump support. #88634

Merged
merged 10 commits into from
Jul 16, 2023

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Jul 10, 2023

Add support for dotnet-gcdump on Mono.

The bulk of the change in ep-rt-mono.c is splitting the EventPipe provider implementation of Microsoft-DotNETRuntimeMonoProfiler and Microsoft-Windows-DotNETRuntime into separate source files.

New GC events consumed by dotnet-gcdump has been added to the new ep-rt-mono-runtime-provider.c source file.

PR disable experimental EventPipe mono profiler provider by default to preserve resources. It is possible to re-enable it by setting the following env variable, MONO_DIAGNOSTICS=--diagnostic-mono-profiler=enable

dotnet-gcdump depends on event order and that events gets emitted to EventPipe session after GCStop, or dotnet-gcdump will block on read, timeout and closing session that in turn will triggering flush of TraceEvent event cache. This is less of a problem on CoreCLR since it emits many more GC events compared to Mono's implementation (just emitting the needed events used by dotnet-gcdump), but underlying issues is that there is no way to make sure an event emitted into EventPipe will be seen by client, even if its getting flushed into IPC transport on runtime side. This happens because in nettrace v4 only first event after a sequence point will be directly delivered to client through TraceEvent library, rest of events will be cached until next sequence point is hit and then sorted delivered to TraceEvent client, but if there are no more events emitted into the session, then client will block on read, even if there are events in the TraceEvent client side cache.

If we are going to support start/stop event like expected by dotnet-gcdump more deterministically we would need to emit sequence points into EventPipe sessions acting like "barriers" to make sure a specific event gets delivered all the way to the client, pass TraceEvent cache. Until that happens, there is a small risk that tools like dotnet-gcdump won't see stop/end events triggering session timeout. If that happens the gcdump will still be complete, but user will need to wait for dotnet-gcdump to timeout (default 30 seconds) before writing file to disk.

@ghost ghost assigned lateralusX Jul 10, 2023
@lateralusX lateralusX changed the title WIP: [Mono] Add gcdump support. WIP: [Mono] Add dotnet-gcdump support. Jul 10, 2023
Switch to calculate live keywords and heap dump trigger.
Make Mono profiler provider disabled by default (experimental provider).
Issue introduced by old commit,
dotnet@136dfcd

Case where string len is 0 would fail the utf8 -> utf16 conversion
that in case would fail emitting any EventPipe event doing the conversion.

Mono only issue since CoreClr uses different code in generated
eventpipehepler source file.

.net7 backport candidate.
@lateralusX lateralusX marked this pull request as ready for review July 13, 2023 14:43
Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I primarily concentrated on the new heap dump code and assumed that move of the code that moved between files didn't change

@lambdageek
Copy link
Member

I guess my one comment is to be careful about assuming that read/write will write everything you expect in a single call. Usually we do a loop

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except read/write

@lateralusX
Copy link
Member Author

LGTM, except read/write

Thanks, will fix, I overlooked it since its async safe, but could fail for other reasons that needs to re-write or re-read until all data is written or read. We have similar loops elsewhere so I will just adopt one of existing implementations.

@lateralusX
Copy link
Member Author

lateralusX commented Jul 13, 2023

@davmason, will try to get this merged by tomorrow, if you want to take a look before that it would be great. The majority of this is Mono only changes, but there is a small extension to session type, a new field has been added, indicating that the streaming thread has started (none streaming sessions will set it just before returning from ep_session_start_streaming). We need it on Mono side to make sure we give sessions a decent chance to start the streaming thread, before we fire the gc start event and perform a STW as part of foreground GC (suspending all managed threads, including the streaming threads on Mono). Unfortunately tools like dotnet-gcdump depends on this event being delivered (and not cached in buffer manager nor in TraceEvent library cache) within 5 seconds (normally not a problem), but if we get to STW before streaming thread gets a chance to start, start will hang in buffer manager until complete GC is done and we start to write out the other GC events.

@lateralusX lateralusX requested a review from vargaz as a code owner July 14, 2023 15:04
@lateralusX
Copy link
Member Author

Test failures are either known issues or failing on other PR's as well.

@lateralusX lateralusX merged commit f0d1e53 into dotnet:main Jul 16, 2023
lateralusX added a commit to lateralusX/runtime that referenced this pull request Jul 28, 2023
Subtle changes like:
dotnet@2f2aaae

Knocked out a couple of rundown events on Mono that was detected and
fixed in dotnet#88634.

It would been ideal to catch this issue as part of CI, but turns out
that our existing rundownvalidation test only tests a couple of rundown
events, and not any of the module/assembly events that was affected by
above change.

This commit adds validation that the following additional rundown events
are included in the rundown event stream:

RuntimeStart
MethodDCStopInit
MethodDCStopComplete
LoaderModuleDCStop
LoaderDomainModuleDCStop
LoadAssemblyDCStop
LoadAppDomainDCStop
lateralusX added a commit that referenced this pull request Aug 2, 2023
Subtle changes like:
2f2aaae

Knocked out a couple of rundown events on Mono that was detected and
fixed in #88634.

It would been ideal to catch this issue as part of CI, but turns out
that our existing rundownvalidation test only tests a couple of rundown
events, and not any of the module/assembly events that was affected by
above change.

This commit adds validation that the following additional rundown events
are included in the rundown event stream:

RuntimeStart
MethodDCStopInit
MethodDCStopComplete
LoaderModuleDCStop
LoaderDomainModuleDCStop
LoadAssemblyDCStop
LoadAppDomainDCStop
@ghost ghost locked as resolved and limited conversation to collaborators Aug 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants