Support getting EventPipe events via ICorProfile callbacks #11750

noahfalk · 2019-01-04T04:18:01Z

Historically we had an unnecessary divide between ETW events that were available asynchronously out-of-proc and ICorProfile API callbacks that were available synchronously in-proc. Some events were available via one source, some via the other, and some via both. Although it is technically possible to create profilers that register for both streams of events it is needlessly complex and it is impossible to make the async ETW events become synchronous again. To mitigate these issues we want to make all of the ETW events available synchronously via an ICorProfile callback. And because we are cross platform now, ETW events are better abstracted as EventPipe events so that this will work properly on non-Windows platforms too.

We already had a past effort at doing this in PR dotnet/coreclr#19157 and a little bit of discussion of it in #5638. The dotnet/coreclr#19157 attempt handled receiving the EventPipe events fine, but it didn't include a mechanism to enable the events. To use it, a profiler would need to configure the set of desired events via an out-of-band mechanism such as environment variables or managed EventListener APIs. Although this would work for some scenarios, I expected it would be onerous in the common case and it would be awkward to migrate from that form of API to one where ICorProfile also had direct control over the set of events it was receiving.

Work to do:

Add multi-session support to EventPipe so that the ICorProfile client's event settings need not interfere with other EventPipe clients. Create the Concept of Multiple EventPipe Sessions #9388, Implement Session-based Filtering for EventSource #11435
Add EventPipe event control APIs to ICorProfile so that the profiler can select which events it wants to receive
Update/complete any work necessary in the EventPipe listening APIs implemented in PR Add support for writing EventPipeEvents to ProfAPI coreclr#19157

noahfalk · 2019-01-04T04:19:59Z

cc @mjsabby @vancem @tommcdon @jorive @sywhang @discostu105

mjsabby · 2019-05-30T05:16:34Z

The first work item in this I believe is complete.

noahfalk · 2019-05-30T20:40:16Z

yep : )

davmason · 2020-01-31T20:55:18Z

@mjsabby (or anyone else interested) I have thought about this recently, and there is the questions of whether the EventPipe callback should be synchronous or asynchronous. The current EventPipe setup is that events are fired asynchronously. They are written to a buffer by the thread that writes the event, and then we have a separate thread that waits for either the buffer to get full or a maximum timeout to be reached, then bulk writes the event to the stream/file/etc.

The path of least resistance when listening via profiler would be to have the events also fired asynchronously to the profiler. However, then you would lose the ability to block on an event to pause the runtime. I.e. you couldn't block on the GC started event or JIT started event to delay the beginning of the GC/jitting.

If the profiler EventPipe events are fired synchronously it would introduce complexity to the profiler. You would have to be concerned with blocking too long, or risk losing events. It also means that it would be fairly trivial to introduce perf regressions to the profiled app unintentionally. Both of these are concerns with the profiler APIs, but since we think carefully about every addition to the profiler APIs it is easy to offer guidance on best practices. Once you have the ability to listen to random events there certainly will be call sequences we haven't reasoned about and have to figure out as we go.

My thinking is that profilers want the ability to block the runtime from completing and are willing to pay the extra complexity to achieve this. I.e., as long as the processing logic isn't extremely cumbersome they would desire to be able to block the runtime strategically.

Let me know about thoughts/concerns.

noahfalk · 2020-02-01T01:05:53Z

or risk losing events

My assumption is that if we are doing synchronous delivery then we wouldn't buffer, and without the buffer we'd never lose events. Blocking would still be a risk though.

mjsabby · 2020-02-01T05:08:01Z

I also prefer synchronous events because the value for us is to react in real time. Further buffering or not blocking is left to the profiler.

Profilers already have to be written in sophisticated ways or the profilee is adversely impacted.

mjsabby · 2020-02-04T15:27:50Z

@davmason @noahfalk this is something we’d really like for 5.0. Given that we have a reasonable amount of time before 5.0 ships can we have this be a feature we’re committing too?

noahfalk · 2020-02-07T01:49:10Z

I'd need to chat with David to see what work remains here and how much he thinks it would cost. I haven't seen him in the office last couple days so just waiting for that.

Can you describe what you'd do with the API if it was available in 5.0?

mjsabby · 2020-02-07T02:37:03Z

If we had this it would unify our eventing story that is fragmented between ETW/EventPipe and Profiler APIs. This would allow us to also react with other APIs in our app that need real time feedback.

davmason · 2020-02-08T00:56:34Z

@mjsabby it is on the backlog for 5.0 but not currently being worked on. My intent is to have it in for 5.0, but I don't think I can promise that it will for sure be there. There's always the risk of me being pulled to other high priority projects.

mjsabby · 2020-02-20T05:10:25Z

@davmason is it possible to do this in steps? Maybe we can add the ability to also write to the profiler api and the control side of things can be added later? Like I did in dotnet/coreclr#19157

@noahfalk your thoughts?

I'm just trying to see how it can land in 5.0 because it missed, 2.2, 3.0 and 3.1 .. and if we miss 5.0 it'll be yet another year.

mjsabby · 2020-02-20T05:13:34Z

I also meant to add given that we a diagnostic control library that maybe adding an api to enable the specific events using profiler api might not be as pressing it was previously, so that requirement as a whole could go away.

In any case, I'll revive my PR on this new repo and see where it goes.

davmason · 2020-02-20T21:54:45Z

@mjsabby since the change to EventPipe where each listener has its own session, it wouldn't really make sense to enable listening without being able to specify what events you want to receive. I don't think it makes sense to use the diagnostics library for that. The API to specify what events should be enabled is one of the more straightforward parts of this work.

There are two parts I'm concerned about taking longer. The first is that synchronously delivering events means we have a new, profiler only, path in EventPipe that might be a constant source of bugs.

The second is that the EventPipe events are delivered encoded, so the profiler would need a way to decode the payload. I believe all of our decoding logic is in managed code, so we would have to either provide a decoding API or a sample profiler that does the decoding that other profiler implementers could use for guidance.

I think the runtime work could come as a first part, then the decoding could happen later if we had to do it in parts.

noahfalk · 2020-02-21T01:42:50Z

Apologies, I meant to respond earlier and got distracted.

it wouldn't really make sense to enable listening without being able to specify what events you want to receive

+1

The first is that synchronously delivering events means we have a new, profiler only, path in EventPipe that might be a constant source of bugs

FWIW I think it would be worthwhile long-term to have the option of creating an EventPipe session that delivers its events synchronously. The profiler might be the first user of such functionality but I could imagine out-of-proc tools wanting to do it too.

so we would have to either provide a decoding API or a sample profiler that does the decoding that other profiler implementers could use for guidance.

I think it is legitimate for the runtime to provide an API that spits out bytes in an encoded format as long as we have a specification for the format (such as the already specified nettrace format). I agree having a sample or a decoding API would make it easier and potentially more efficient, but if Mukul was motivated to write a separate decoder then the overall solution would still satisfy the goal.

In any case, I'll revive my PR on this new repo and see where it goes

Did we ever get regression tests completed for the last set of profiler features you added Mukul? : ) If not I think we should ensure that the previous features have tests before we taking on new work. I know its not sexy, but its the only way to keep ourselves honest and test debt in check.

If we had this it would unify our eventing story that is fragmented between ETW/EventPipe and Profiler APIs. This would allow us to also react with other APIs in our app that need real time feedback.

Is there a story that we could attach to this that frames it as business impact rather than implementation impact? Right now I'd be telling managers the benefit is "Bing gets to make improvements to their profiler codebase and they have the opportunity for unspecified new profiler functionality in the future". A more compelling message might be "Bing will implement new profiler features X, Y, Z on top of this and that lets them half their MTTR on a hard set of production issues" or "This feature lets Bing do improved performance evaluation technique X that is required to identify their next set of potential latency improvements".

I'm just trying to see how it can land in 5.0 because it missed, 2.2, 3.0 and 3.1 .. and if we miss 5.0 it'll be yet another year.

Yeah I feel guilty suggesting it years ago and then having it keep lingering on without getting completed : ( In earlier releases we discovered we were missing some of the foundational stuff like multiple session support and that got addressed in 3.0 so progress was made. Now I think it is just a matter of priority to keep it moving forward. Either we find a good reason that the work should be high on the runtime list for 5.0 or we work with you to get it as another contribution.

mjsabby · 2020-02-21T02:31:20Z

I think it's fair to require me to do the test work for the previous profiler features before taking up this new feature :)

I will formulate how exactly it'll help us beyond improvements to the codebase.

Hopefully I get to do it well before 5.0 stops taking checkins!

noahfalk · 2020-02-21T08:49:31Z

Hopefully I get to do it well before 5.0 stops taking checkins!

Last I heard we should be free to add things through May and then the bar might start rising in June. In 3.0 the exact dates/bar kept being adjusted so I can't give a hard date, but this should be the right ballpark.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

davmason mentioned this issue May 26, 2020

Add the ability for profilers to listen to EventPipe events #37002

Merged

davmason closed this as completed in #37002 Jun 27, 2020

ghost locked as resolved and limited conversation to collaborators Dec 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support getting EventPipe events via ICorProfile callbacks #11750

Support getting EventPipe events via ICorProfile callbacks #11750

noahfalk commented Jan 4, 2019

noahfalk commented Jan 4, 2019 •

edited

Loading

mjsabby commented May 30, 2019

noahfalk commented May 30, 2019

davmason commented Jan 31, 2020

noahfalk commented Feb 1, 2020

mjsabby commented Feb 1, 2020

mjsabby commented Feb 4, 2020

noahfalk commented Feb 7, 2020

mjsabby commented Feb 7, 2020

davmason commented Feb 8, 2020

mjsabby commented Feb 20, 2020

mjsabby commented Feb 20, 2020

davmason commented Feb 20, 2020

noahfalk commented Feb 21, 2020

mjsabby commented Feb 21, 2020

noahfalk commented Feb 21, 2020

Support getting EventPipe events via ICorProfile callbacks #11750

Support getting EventPipe events via ICorProfile callbacks #11750

Comments

noahfalk commented Jan 4, 2019

noahfalk commented Jan 4, 2019 • edited Loading

mjsabby commented May 30, 2019

noahfalk commented May 30, 2019

davmason commented Jan 31, 2020

noahfalk commented Feb 1, 2020

mjsabby commented Feb 1, 2020

mjsabby commented Feb 4, 2020

noahfalk commented Feb 7, 2020

mjsabby commented Feb 7, 2020

davmason commented Feb 8, 2020

mjsabby commented Feb 20, 2020

mjsabby commented Feb 20, 2020

davmason commented Feb 20, 2020

noahfalk commented Feb 21, 2020

mjsabby commented Feb 21, 2020

noahfalk commented Feb 21, 2020

noahfalk commented Jan 4, 2019 •

edited

Loading