-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profiling: How to get GC Metrics in-process #5638
Comments
Btw: I'm not sure if it's better to ask such questions on StackOverflow or on Github. Please let me if I should use SO instead. |
cc @Maoni0 |
If you look deeper in some of the categories you've mentioned, there's more info. perf counters is an exception as we made a decision to not invest more in perf counters when we did background GC in 4.0. So you get whatever support it offered before (which means not much support for background GC) when it's enabled. COR_GC_STATS works off of the perf counter info so is also not available if perf counter is not enabled. For both profiling and ETW, you can certainly get more info - For profiling, ICorProfilerCallback2::GarbageCollectionStarted/Finished tells you exactly when a GC starts and finishes and with a bunch of other info - the generation it collects, generation bounds and etc. For ETW, it's a low level mechanism so by design it does not give you statistics (unless of course you are firing stats info in your events). That's why we have tools that build on top of this mechanism to give you stats. Take a look at TraceEvent and PerfView which let you capture/interpret the events (in managed code) and look at stats. Let me know if these still do not solve your problems. |
Although ICorProfilerCallback2::GarbageCollectionStarted/Finished would provide the information I'm looking for, I know for a fact that it causes too much overhead. It needs to have a certain profiler capability enabled (COR_PRF_MONITOR_GC). Just by enabling it, Concurrent GC is disabled for example. Since I'm looking for a low-overhead solution, this is not a (good) option for me. Too bad, COR_GC_STATS does not fit, it looked so good :). Thanks for your info by the way. Also, I did not know that perfcounters don't provide good values when background GC is active, so thanks for that as well. TraceEvent looks like a nice managed library for certain ETW event sources (also CLR event's I am looking for). Maybe we can extract what we need from there. Does it use private logger sessions? Does it apply to all other limitations of ETW (mentioned in my initial post)? We did some fair research on ETW, and there is an in-memory mode (real-time mode), where events can be processed from memory and do not need to go to a file (which is great). But unfortunately such real-time mode session can only be made system-wide and there is a limitation of 64 sessions per system (what if I want to monitor more than 64 .net processes?). Unfortunately, that mode is not combineable with private logger sessions which on the other hand does not require special permissions, and where you can make isolated sessions for each process, and there is no 64 session limit. I'm starting to get the feeling that ETW was not exactly made for my requirements :). |
We have some tentative plans to wire all current ETW events back into managed code so that they look like and EventSource. The idea is that events like GCStats would be available to you in-proc. However this does not help you today. As you have discovered, ETW does not really work well for 'self-monitoring' scenarios. Another possibility is that it is also reasonable for use to have a profiler API that gets called any time an ETW event happens. This does not exist, but is reasonably straightforward addition. With this you could use the profiler API to get at the information you want in-proc. Both of these options requires new code to be written in coreclr, but that is what pull requests are about. If you want to pursue either of these there should be some discussion about the details of the design first. |
I have been advocating that we should have a low overhead GC profiling mode (but haven't succeeded :-) apparently this hasn't been an urgent thing to have for most of our customers who use the profiling API) - for some people it's enough to know some stats that are very cheap for the runtime to provide (eg, they don't need the root reference, or moved references and etc). And profiling has the very nice attribute of being real time. It shouldn't be much work to allow an additional mode that gives you a subset of things GC profiling currently offers (you don't have to disable concurrent GC - it was just work that hasn't been done to enable it). If you want, you are certainly welcome to open an issue for it to discuss if someone (could be yourself :)) wants work on it. |
All in all, it seems like there are not perfect options which will help me short-term. I guess we'll need to go with ETW for now (with it's limitations). Looking forward, the ideas you mentioned are definitively interesting to explore for future versions.
That certainly sounds interesting. My profiler has native+managed parts, so I could also consume stuff managed. Though, I have a slight preference for consuming such an API natively, because pure native threads are not subject to GC suspensions themself, so they would be (a little bit) more deterministic.
Sounds cool. In order to control overhead, some up-front filtering mechanism would be needed I guess. Given such a fine-grained filtering mechanism, this approach would be very flexible (and thus useful for my specific use-case). Also (not knowing the internals of ETW) my naive mind asks: where do the buffers live and how large will they be? Would this still go through Windows Kernel?
We've actually used those detailed infos that the Profiling API provides (root references, moved references, ...), but for a very different use-case. In that use-case we would only enable COR_PRF_MONITOR_GC for a short period of time, then disable it again (which actually works since some .NET version - thanks for that :)). But the use-case I am aiming for right now, is to know only the very basic metrics (gen x collection count, duration and as well size of all heaps). I just want to report these metrics periodically (every X seconds), that's it. So, some API which just gives me a struct with those values would be sufficient. One question by the way: If any of those features discussed here would be implemented for CoreCLR, would it flow back to Full CLR and be released with some future version? Is it basically the same code with different ifdef definitions or is there a very separate repository for it? |
Whether a feature flows back to full clr is determined case by case - it's all based on how much value it can bring to full clr customers and how much work it takes to flow back. |
I wanted to bump this and ask whether anything had changed on this topic? Specifically, is there any way to get GC stats from within the process? My environment is .NET Core running on Linux. I'm aware of My current plan is to have a side care process running a profiling session and reporting various metrics (using perf and LTTng) about the monitored process. Is there an easier way that I missed to go about doing this? Being used to simply querying Performance counters on Windows, I really wish there was. |
@lmeenhardt, using perf and/or LTTng is a reasonable way to go about this, especially if you want very detailed information. This has been done before by others as well. There are APIs to get the collection count as well as the amount of memory that has been allocated on the current thread, but if you want data per-GC then LTTng is likely your best bet. Note that if you don't care about collecting CPU and/or blocked time stacks then you just need LTTng. It can give you all of the runtime events as well as kernel events without stacks. Also, as @vancem noted above, we are moving towards making all of the runtime events, including GC events, available in-process through EventListeners. There has been much work done on the internals of the runtime and logging infrastructure to make this possible, but we are not there yet. You can read some about the plans at https://github.com/dotnet/designs/blob/master/accepted/cross-platform-performance-monitoring.md. |
@mleenhardt did your plan worked? (side care process with LTTng) Or maybe you found an easier solution? I am facing same challenge now. Looking for ways to get GC "perf counters" on Linux. |
Haven't gotten around working on this yet but unless there is now something better, that's still the plan. |
The work I described above around exposing runtime data through EventListener is in-progress and so hopefully by the time you're ready it will be as well. I don't have an ETA yet though. |
I have a need for a similar feature as @mleenhardt - I will need to collect different GC stats in-proc in a cross platform manner (so a side-car LTTng process won't quite cut it...). |
can you tell me how you would use the CPU time spent in GC if it was available? obviously, some of this CPU time was consumed while the user threads were running while the rest wasn't. if you can tell me the usage scenario it would be helpful to understand if the CPU time spent in GC is the right thing to provide. |
In our native ( I am concerned that permanently enabled My question is does enabling ETW session and listening events hurt CLR performance in any way, or it's overhead on the runtime is much lower than of |
If you just capture GC events, then I would expect that the overhead of the ETW events is going to be much lower than using the profiler API, which is much more invasive in how it impacts runtime execution. The cost of ETW events on the process is directly proportional to how many events are emitted and the cost of producing the payload to be emitted. These GC events are quite low in verbosity, and the arguments don't cost much to produce. There are others, such as the GC heap survival and movement events that are much more verbose, and also extend GC pauses. This would be an example of an event that has a much higher impact. But for what you're trying to do, expect the overhead to be minimal. |
Is it possible, using native profiler API or ETW, get the measurement, which corresponds to the # Total committed Bytes performance counter?
As I understand, this value represents heap memory allocated by the GC for all generations. |
there is something in the ETW event that tells you this info, it's called ExtraGen0Commit in the per heap history event. this is how much is committed on top of the heap size info you already get (ie, on the GCHeapStats). please note that in .NET 5 we may decommit memory while the user threads are running. so this was the data at the end of a GC, you can think of it as the peak committed usage before that GC happened. if you are using .NET 5 there's another way to get this data which is the TotalCommittedBytes property of the |
@discostu105 @kalikin does the new property on GCMemoryInfo help with your requirements? |
@mangod9 I haven't researched this feature yet, because it is a managed API and, aside from a few instrumentations, our profiler is unmanaged. And I'm not sure how and when to start a managed thread from a profiler - is there an official documentation on that topic? |
It looks like there is a now a way to get GC events in-process through the new runtime event source. However, how can I get the duration of a GC? I can't seem to find a way to correlate the GCEnd events and the GCStart events. I would like to get "percent time in GC" like the counter that is available in .NET Framework. |
Maybe the runtime GC events would help, see the docs for details https://docs.microsoft.com/en-us/dotnet/fundamentals/diagnostics/runtime-garbage-collection-events, and you can take this project as an example. |
@jonso4 there are various ways you can get it -
if you can't use either you'll need to calculate it yourself - the event sequence is described here and you can look at TraceEvent src to see how it calculate it. for blocking GCs this is trivial; for BGCs not so much. |
Let's say, you have a profiler which is loaded inside of the process via Profiling API. Let's say you want to capture the impact the GC has on your process (just the CLR in the process your profiler is loaded in). Stats that would be interesting are: Gen0,1,2,LOH collection count and duration. Which native API would I use to properly get such statistics periodically with low overhead?
Some thoughts:
ICorProfilerCallback::RuntimeSuspendStarted
) that indicate "suspensions". However, these callbacks don't give much details on the reason of suspension (e.g. no info about which generation is being collected). Also, I am unsure about background/concurrent GC. I guess those won't trigger runtime suspensions (in all cases).COR_GC_STATS
to be an interesting datastructure. It contains pretty much what I am looking. It's available viaCorGCHost::GetStats
. Is there any way for a profiler to get to a CorGCHost instance (if the CLR is not self-hosted)?[0] https://social.msdn.microsoft.com/Forums/en-US/2664a9cf-172a-45c3-81cc-5a4b93b3be7d/etw-private-session-without-logfile?forum=clr
The text was updated successfully, but these errors were encountered: