Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime metrics in System.Diagnostics.DiagnosticSource #104680

Merged
merged 20 commits into from
Jul 18, 2024

Conversation

stevejgordon
Copy link
Contributor

Fixes #85372

This adds new runtime metrics to align with the proposed semantic conventions. I am opening this as a draft until the naming is finalised in the OTel semantic conventions.

The metrics are essentially a port of the existing OTel contrib runtime metrics. Small changes have been made to optimise the performance and align the naming with those being proposed and discussed over in the semantic conventions repo.

cc @tarekgh and @noahfalk

@lmolkova
Copy link

@stevejgordon thanks a lot for doing this!

we synced offline and made some common suggestions on open-telemetry/semantic-conventions#1035 (comment). Hope it'd help you focus on this PR and get it merged 🤞

@stevejgordon
Copy link
Contributor Author

@lmolkova, happy to contribute. Thanks for clarifying the suggestions in the semantic conventions. I've made those changes and aligned this PR with the updated naming.

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

@carlossanlop @adamsitnik @ericstj I am seeing the cyclic errors /__w/1/s/.packages/microsoft.dotnet.sharedframework.sdk/9.0.0-beta.24360.1/targets/sharedfx.targets(484,5): error : System.IO.Pipes > System.Net.Sockets > System.Net.NameResolution > System.Diagnostics.DiagnosticSource > System.Diagnostics.Process > System.IO.Pipes [/__w/1/s/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj] while this PR didn't add or change any dependencies. Any idea ideas why? the change started to call things like Process process = Process.GetCurrentProcess();.

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

I talked offline with @ericstj and looks we introduced the dependency on System.Diagnostics.Process which caused the cyclic dependency.

@noahfalk @samsp-msft @lmolkova @reyang do you see any issue if we move this feature to Microsoft.Extensions.Diagnostics library instead?

System.Diagnostics.DiagnosticSource is really low in the stack and it is good to reduce any dependencies there especially if we extend the runtime metrics in the future.

@reyang
Copy link

reyang commented Jul 11, 2024

I talked offline with @ericstj and looks we introduced the dependency on System.Diagnostics.Process which caused the cyclic dependency.

@noahfalk @samsp-msft @lmolkova @reyang do you see any issue if we move this feature to Microsoft.Extensions.Diagnostics library instead?

No issue from what I can see, thanks!

@stevejgordon
Copy link
Contributor Author

@noahfalk @samsp-msft @lmolkova @reyang do you see any issue if we move this feature to Microsoft.Extensions.Diagnostics library instead?

Unfortunately, there might be an issue with this. The implementation relies on the initialisation of the static meter from the ctor of MeterListener. We'd need an alternative way to ensure the Meter is created.

@jkotas
Copy link
Member

jkotas commented Jul 11, 2024

What is the impact of the dependency closure of this new meter on the size of trimmed or native-aot compiled applications?

If this is going to be enabled by default, does it mean that this is introducing significant binary size regression for trimmer or native-aot compiled apps?

I think we either need to avoid taking large dependency closure in the implementation and then it is ok to enable it by default; or it needs to opt-in and then the large dependency closure is ok (not ideal though).

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

@jkotas we have the switch System.Diagnostics.Metrics.Meter.IsSupported to allow disabling the metrics. Is this can be acceptable mitigation for AOT if apps care about the size and not using metrics? I am just thinking loudly here.

Unfortunately, there might be an issue with this. The implementation relies on the initialisation of the static meter from the ctor of MeterListener. We'd need an alternative way to ensure the Meter is created.

Yes, I am trying to see if there are any other concerns that can prevent us moving it to the other library. For alternative way to create the Meter, I am thinking in the direction if we can keep creating the Meter in System.Diagnostics.DiagnosticSource but having instruments that are bringing the new dependencies be created from the extensions library. Of course we'll need to have a way to pick the meter instance from System.Diagnostics.DiagnosticSource. I am still thinking of other alternatives or viable solutions.

@noahfalk
Copy link
Member

I talked offline with @ericstj and looks we introduced the dependency on System.Diagnostics.Process which caused the cyclic dependency.

I'm hopping between meetings and haven't had a chance to look more closely, but could we break the dependency by p/invoking to the relevant OS APIs directly? I believe on Windows its a call to GetProcessTimes() and on Linux a call to getrusage(). It doesn't seem like it would take much code to do that.

@noahfalk
Copy link
Member

I think we either need to avoid taking large dependency closure in the implementation and then it is ok to enable it by default; or it needs to opt-in and then the large dependency closure is ok (not ideal though).

I would aim for the former - avoid taking large dependency closure. In theory runtime metrics should be nothing more than some calls into low-level runtime APIs (GC, JIT, assemblies, threadpool) + a tiny bit of data from the OS (some cpu and memory usage info)

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

@noahfalk there are more dependencies and not just Process one. Yes, process one is what is causing cycles, but I am not sure if we are ok for the other dependencies.

    <Reference Include="System.Diagnostics.Process" />
    <Reference Include="System.Threading.ThreadPool" />
    <Reference Include="System.ComponentModel.Primitives" />

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

I'm hopping between meetings and haven't had a chance to look more closely, but could we break the dependency by p/invoking to the relevant OS APIs directly? I believe on Windows its a call to GetProcessTimes() and on Linux a call to getrusage(). It doesn't seem like it would take much code to do that.

Not dismissing the idea but want to point out that we have different implementations on Linux, Windows, Bsd, Mac and maybe more. @ericstj had another idea if this functionality can be exposed from Environment class (of course this will need a new API).

@ericstj
Copy link
Member

ericstj commented Jul 11, 2024

Pinvoking could be done via some runtime checks - you wouldn't want to introduce runtime specific builds to this component. We try not to do that in our nuget packages. Especially those that are used in older frameworks.

New API is blocker since that wouldn't be available on NetFx and older .NET versions.

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

New API is blocker since that wouldn't be available on NetFx and older .NET versions

I forgot about that 😢

@samsp-msft
Copy link
Member

Is the meter something that can be registered with DI and then fetched later, or does that create a problem with then taking a DI dependency?

@tarekgh
Copy link
Member

tarekgh commented Jul 11, 2024

Is the meter something that can be registered with DI and then fetched later, or does that create a problem with then taking a DI dependency?

You cannot access the DI from DiagnosticSource library level. In addition to that we need the scenario work outside containers (like console apps for instance). But you are thinking in the same direction I was thinking #104680 (comment). The Meter and instruments not having new dependencies can be created in DiagnosticSource. And in Extensions library we can start a listener which will give us the created instruments and from the instrument we can get the Meter and create more instruments that need the dependency. I know this is complicated, but it might work. The only issue I am seeing with that is we need a startup code to execute when the extensions library gets loaded.

@jkotas
Copy link
Member

jkotas commented Jul 12, 2024

In theory runtime metrics should be nothing more than some calls into low-level runtime APIs (GC, JIT, assemblies, threadpool) + a tiny bit of data from the OS (some cpu and memory usage info)

Yes, I agree. The immediate problem is caused by System.Diagnostics.Process that is a heavy weight component (last time I checked, any use of it brought in several MBs of code into native aot apps).

New API is blocker since that wouldn't be available on NetFx and older .NET versions.

It is fine to have a solution with good performance characteristics that uses new APIs for current .NET, and a solution with worse performance characteristics that depends on pre-existing APIs for downlevel platforms and hack around the static dependency cycles using reflection.

@noahfalk
Copy link
Member

noahfalk commented Jul 12, 2024

It is fine to have a solution with good performance characteristics that uses new APIs for current .NET, and a solution with worse performance characteristics that depends on pre-existing APIs for downlevel platforms and hack around the static dependency cycles using reflection.

I also think it would be fine if this feature doesn't exist on downlevel platforms at all, or it doesn't show up on downlevel platforms in this release. If it weren't for the dependency on the Meter API I would have happily implemented this feature in System.Private.CoreLib where there would never be any downlevel support.

there are more dependencies and not just Process one. Yes, process one is what is causing cycles, but I am not sure if we are ok for the other dependencies.

If we get rid of the System.Diagnostics.Process dependency that should also eliminate the System.ComponentModel.Primitives dependency. For threadpool I don't expect that is problematic as it is just a facade on S.P.C APIs. Downlevel we don't have to support the functionality. So I think Process is the only dependency that is problematic.

Not dismissing the idea but want to point out that we have different implementations on Linux, Windows, Bsd, Mac and maybe more.

The implementation inside of System.Diagnostics.Process is more complex because it has to handle the case where the process being inspected is not the current one. When we limit to current process only the complexity is much lower. The existing cpu-usage counter calls to RuntimeEventSourceHelper.GetCpuUsage which only has two platform specific implementations:
Unix (handles Linux, BSD, Mac, and others - ultimately calls into the PAL and uses getrusage)
Windows (calls to GetProcessTimes)

As best I can tell GetProcessTimes() and getrusage() covers all the cases where we'd need to make the data available, and then we've got cases like browser or WASI where the metric isn't going to be available. I'm certainly not opposed to adding a new public API, but I am guessing the quickest and easiest implementation would be a conditional check at runtime to determine the platform and then p/invoke to one of those two OS APIs.

@stevejgordon - do you want to give the p/invoke approach a try for cpu.time? I believe you'd want something like:

if(OperatingSystem.IsWindows())
{
   // pinvoke to GetProcessTimes()
}
else if(!OperatingSystem.IsWasi() && !OperatingSystem.IsBrowser())
{
   // pinvoke to getrusage()
}
else
{
   // no support for cpu.time on this platform
}

@stevejgordon
Copy link
Contributor Author

@noahfalk, sure! I'll have a look at this today.

@stevejgordon
Copy link
Contributor Author

@noahfalk, I have tweaked the tests to accommodate some of the platforms not running a forced GC. I also tweaked the existing tests that were failing to ensure they only observe their expected Meters. I'm hoping this current run should go through on all the tests, but I will keep an eye on it.

@tarekgh tarekgh marked this pull request as ready for review July 17, 2024 16:13
@tarekgh
Copy link
Member

tarekgh commented Jul 17, 2024

@stevejgordon I have changed the PR to be not draft. I think we can merge once we get the green CI and no more comments need to be addressed. I can enable the cpu time metrics with my changes later as I need more time to finish exposing the new Environment API.

@tarekgh
Copy link
Member

tarekgh commented Jul 17, 2024

@stevejgordon there are some tests failing on WASM

Maybe we need to exclude running the metric tests on WASM. something like [ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsNotBrowser))]. We already do that in other metric tests.

@stevejgordon
Copy link
Contributor Author

@tarekgh, sure. Will try that first thing tomorrow. I was expecting something like that might be needed somewhere.

@noahfalk
Copy link
Member

Just a heads up, I lost internet connectivity at my house. I'm doing what I can from my cell phone but it's a subpar UI experience to say the least :)

@stevejgordon
Copy link
Contributor Author

@noahfalk / @tarekgh It looks like the tests are happy now! The only failures seem to be known issues and unrelated.

@tarekgh tarekgh merged commit fd94c84 into dotnet:main Jul 18, 2024
80 of 84 checks passed
@tarekgh
Copy link
Member

tarekgh commented Jul 18, 2024

Thanks @stevejgordon for your contribution.

@stevejgordon
Copy link
Contributor Author

Happy to help! Thanks for the support as we got this over the line!

@noahfalk
Copy link
Member

Thanks Steve!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Metric community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics for System.Runtime
10 participants