Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 2 issues when triggering a garbage collection via EventPipe #102612

Closed
wants to merge 2 commits into from

Conversation

chrisnas
Copy link
Contributor

@chrisnas chrisnas commented May 23, 2024

Fix the following issues:

I've tested with dotnet-gcstats (for the client sequence number + number of GCs) and dotnet-fullgc (to trigger a GC with a custom client sequence number)

@noahfalk / @brianrob: I'm not sure to understand why the enabled/disabled state was based on the number of session being > 0 because this number is always >= 1 when the functions are called.

- dotnet#99487: avoid double GC
- dotnet#102572: handle client sequence number
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 23, 2024
@tommcdon
Copy link
Member

@noahfalk @brianrob ptal

@brianrob
Copy link
Member

Thanks @chrisnas! One question - can you also test this with ETW? I suspect that the ETW runtime is going to set FilterData differently. I'll also have to defer to @noahfalk on the EventPipe specific file.

@davmason davmason self-requested a review June 18, 2024 20:45
@chrisnas
Copy link
Contributor Author

Thanks @chrisnas! One question - can you also test this with ETW? I suspect that the ETW runtime is going to set FilterData differently.

I did not try @brianrob...
Do you know the command line I should use with perfview?

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to take a PR to fix this, but we need to modify the approach. Suggestions inline. Thanks!

@@ -2403,18 +2403,30 @@ VOID EtwCallbackCommon(
// Special check for the runtime provider's ManagedHeapCollectKeyword. Profilers
// flick this to force a full GC.
if (g_fEEStarted && !g_fEEShutDown && bIsPublicTraceHandle &&
((MatchAnyKeyword & CLR_MANAGEDHEAPCOLLECT_KEYWORD) != 0))
#if !defined(HOST_UNIX)
(ControlCode == EVENT_CONTROL_CODE_ENABLE_PROVIDER) && // only when the provider is enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unforetunately checking the control code for ENABLE_PROVIDER is only going to give you a partial fix. ETW convention (which EventPipe also follows) is that if there is at least one session still active for a given provider then disable still invokes the callback with ControlCode=ENABLE_PROVIDER.

I think there are two things that would potentially work:

  1. For ETW we could bring back filtering for the CAPTURE_STATE+ENABLE control code. It looks like c16fd29 removed all the filtering on GC heap snapshots so after that they would have been triggered by any enable, disable or capture state. We could narrow it back down to what it was originally: enable and capture state only. Enable on ETW is still broader than ideal, but it preserves back compat and ETW profilers can always use CAPTURE_STATE instead if they want to do it precisely.

  2. For EventPipe, you can distinguish a session disable from a session enable based on whether the sourceId parameter is NULL (Enable=not-null,Disable=null). That parameter isn't currently exposed in this CallbackCommon function but it is present in EventPipeEtwCallbackDotNETRuntime so you could change the signature on this common method to pass it through. Something like:

enum SessionChange
{
    EventPipeSessionEnable,
    EventPipeSessionDisable,
    EtwSessionChangeUnknown
}

EventPipeEtwCallbackDotNETRuntime(sourceId, ...)
{
  SessionChange change = sourceId != null ? EventPipeSessionEnable : EventPipeSessionDisable;
  EtwCallbackCommon(change, ...);
}

EtwCallback(...)
{
   EtwCallbackCommon(EtwSessionChangeUnknown, ...);
}

if ((FilterData != NULL) &&
(FilterData->Type == 1) &&
(FilterData->Size == sizeof(l64ClientSequenceNumber)))
if ((FilterData != NULL) && (FilterData->Type == 0) && (FilterData->Size > 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to define a new convention to pass this data which works for EventPipe (thats fine), but breaks the previous convention that was used by ETW profilers. I'm fine to add a new EventPipe-friendly convention but the old one needs to keep working.

clientSequenceNumberAsString += strlen(clientSequenceNumberAsString) + 1;

// extract the client sequence number from the string
int result = _snscanf_s(clientSequenceNumberAsString, FilterData->Size - (strlen((char*)FilterData->Ptr) + 1), "%I64u", &l64ClientSequenceNumber);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to create a new convention based on keys and values I think it would be good to handle potentially multiple key-value pairs being specified at the same time of which 0,1, or multiple of those keys might have the matching key name. For example we could define the key name as 'GCSeqNumber' and then the parsing code would need to determine to be able to handle inputs like these:

Foo\0Bar\0GCSeqNumber\025\0  -> 25
Foo\0Bar\0Baz\025\0 -> not present, default to 0
Foo\0GCSeqNumber\0Baz\025\0 -> not present, default to 0
Baz\025\0GCSeqNumber\0NotNumber\0 -> invalid number text, default to 0
GCSeqNumber\025\0Baz\0Foo\0GCSeqNumber\019\0 -> 25 (first one wins?)

@@ -673,6 +673,7 @@ disable_helper (EventPipeSessionID id)

while (ep_provider_callback_data_queue_try_dequeue (provider_callback_data_queue, &provider_callback_data)) {
ep_rt_prepare_provider_invoke_callback (&provider_callback_data);
provider_callback_data.enabled = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not modify enabled here. EventPipe is following a convention established by ETW. Modifying when we emit the PROVIDER_ENABLED control code makes ETW/EventPipe behave differently which makes it more difficult for anyone to write logic in a provider callback that works correctly for both.

@tommcdon
Copy link
Member

Hi @chrisnas! Do you plan on moving forward with this PR in the short term, or would it be OK if we moved this to "draft" mode while development is still in progress?

@tommcdon
Copy link
Member

Hi @chrisnas! I am moving this PR to "draft" mode - please let us know if you plan on addressing feedback in this PR.

@tommcdon tommcdon marked this pull request as draft August 12, 2024 20:35
@chrisnas
Copy link
Contributor Author

Hi @chrisnas! I am moving this PR to "draft" mode - please let us know if you plan on addressing feedback in this PR.

Sorry for the delay @tommcdon!
It's good to move it to draft because it seems that I underestimate the impact of the change for ETW.
When @noahfalk will have time, I will need to validate what should be done (especially defining the convention of exchanges between a tool and the runtime)

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Tracing-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants