Define and implement behavior of tracing functionality when EventSource is disabled via a feature switch #43657

vitek-karas · 2020-10-20T21:05:47Z

In .NET 5 we introduced a new feature switch EventSourceSupport - it effectively disables all event sources (typical observed behavior is that they can't be enabled and thus don't produce any events). For the most part this has been implemented for Mono/WASM scenarios which don't support event source anyway, so the code was effectively unused.

If used on CoreCLR, there's still quite a bit of code left in the app (which linker can't remove right now) related to EventPipe - currently all of this code is rooted. It's easy to remove it via the EventSourceSupport feature switch, but it's unclear what that will do to the EventPipe infrastructure.

Specifically:

We need to make sure this doesn't cause failures in the application itself
We need to determine the desired behavior of all in-proc tracing scenarios (this probably already works to a point - as in in-proc listeners will work, but won't receive any events).
We need to determine the desired behavior of all out-of-proc tracing scenarios - specifically dotnet trace collect and similar.

This issue is meant to track both the design an implementation of these changes.

The text was updated successfully, but these errors were encountered:

ghost · 2020-10-20T21:05:49Z

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

LakshanF · 2020-12-06T19:13:57Z

Validated the application and in-proc behavior with the latest Corelib changes that includes the fix when native code calls this feature.

@eerhardt added test related to this that should be sufficient.

Need to chat with the diagnostics folks to that I'm validating the tool behavior correctly.

vitek-karas · 2020-12-07T12:41:38Z

Related to #43635.

LakshanF · 2020-12-07T15:15:47Z

@noahfalk, Can you review additional testing that would be interesting for this please?

I removed the types EventPipe, NativeRuntimeEventSource from CoreLib's root descriptor to explore the behavior when the feature-switch, EventSourceSupport, is set to false. I tested this lightly as follows;

Smoke test that does not have any references to types in the System.Diagnostics.Tracing ns
Test that has EventSource and EventListener in it: EventSource.GetSources().Count() returns null and other light testing shows reasonable behavior

Interestingly, if I turn the feature switch on, the behavior seems to be as expected for these types (linker is doing its job) but I'm not very confident that I test all interesting scenarios here

dotnet-trace: the tool immediately exits after the initial header information which seems reasonable (no error message though). The resulting trace.nettrace (around 1K) cannot be opened by perfview

FastSerialization.IOStreamStreamReader.Fill(Int32 minimum) ("Error: Exception EventPipe conversion: System.Exception: Read past end of stream.")

eerhardt · 2020-12-07T16:15:34Z

@eerhardt added test related to this that should be sufficient.

That test only covers the scenario where the app is not trimmed, and the EventSource.IsSupported AppContext switch is set to false.

If we are going to unroot the EventPipe code, we should probably add a "TrimmingTest" that trims the app with the EventSource feature switch set to false, and then ensure the app still works correctly.

See

runtime/src/libraries/System.Runtime/tests/TrimmingTests/System.Runtime.TrimmingTests.proj

Lines 21 to 25 in 738dcf7

 <TestConsoleAppSourceFiles Include="VerifyResourcesGetTrimmedTest.cs"> 

 <!-- Setting the Trimming feature switch to make sure that the Resources get trimmed by the linker 

  as this test will ensure exceptions are using Resource keys --> 

 <ExtraTrimmerArgs>--feature System.Resources.UseSystemResourceKeys true</ExtraTrimmerArgs> 

 </TestConsoleAppSourceFiles>

for how to set feature switches in our current "TrimmingTests" support.

tommcdon · 2020-12-07T22:59:32Z

@sywhang @josalem

sywhang · 2020-12-07T23:25:58Z

Thanks for the heads up @tommcdon.

If used on CoreCLR, there's still quite a bit of code left in the app (which linker can't remove right now) related to EventPipe - currently all of this code is rooted. It's easy to remove it via the EventSourceSupport feature switch, but it's unclear what that will do to the EventPipe infrastructure.

Actually I've been meaning to remove a large chunk of this rooted EventPipe code. I've managed to get the internal partner team that uses reflection to access these private methods to stop doing that and move to officially supported API. I got the OK from them on removing these, so I'll follow up with a PR to remove the unused bits (there are some tests that use these, so I'll have to rewrite some of them) in a day or two.

josalem · 2020-12-08T00:27:49Z

@LakshanF, you may want to also check for behavior of other tools, specifically ETW tracing, e.g., PerfView collection.

Another potential pitfall is the infrastructure for forwarding native events to managed EventSources. You need to validate that turning on the native providers via ETW/EventPipe while EventSource* is trimmed out doesn't cause the native diagnostics code to break.

dotnet-trace: the tool immediately exits after the initial header information which seems reasonable (no error message though). The resulting trace.nettrace (around 1K) cannot be opened by perfview

This sounds like something went wrong while tracing, causing the trace to be cut prematurely (In the face of most failures, the IPC connections are closed unceremoniously by the runtime). What providers did you turn on when you were collecting this trace? What was the state of the diagnostics server after this, could you start a second trace?

In my mind, trimming out EventSource shouldn't preclude a user from doing CPU profiling. That doesn't require any managed eventing code if I recall. It also shouldn't break the Diagnostic Port infrastructure (IPC for diagnostic tools) even if eventing sources aren't available.

marek-safar · 2020-12-08T09:32:31Z

/cc @lateralusX

LakshanF · 2020-12-14T19:14:37Z

The consensus is to honor the developer intent when they switch off EventSource feature and not enable any events from the application. The user experience when they use an external tool like dotnet-trace should be to fail gracefully. The current behavior with dotnet-trace is not optimal since it crashes the app and generates a trace file that is corrupted (issue #46047).
The reason for honoring the developer intention with turning the feature switch off is as follows;

We don’t expect for example server apps to turn off event source – it provides valuable functionality, so it should be kept
We expect for example UI apps, or simple command line tools to choose to disable event source – it doesn’t provide much value and the size requirements are stricter in these cases.
We expect to keep event source enabled by default for majority of app models (Issue System.Diagnostics.ProcessTests.ProcessTest.Process_MainModule test failure in CI #14475) it would be a developer explicit choice to disable it.

We also need to document the expected behavior very clearly when this feature is turned off. In terms of diagnostics impact, there may be additional things that disabling EventSource may introduce that are not very obvious to the end user:

No metrics can be emitted out of the app because EventCounters will not work.
Several VS profilers (i.e. Visual Studio Async Profiler, Database Profiler) will not work – These depend on TplEventSource and DiagnosticSourceEventSource working.
OpenTelemetry tracing – tracing with OT will be broken because it depends on DiagnosticSourceEventSource.
EventListeners will not work as they are tightly coupled with EventSource. Separating them out will be a significant chunk of work.

LakshanF · 2021-03-05T22:31:50Z

Still have the NativeRuntimeEventSource to unroot but updating with the latest thinking from the team

External Tooling Experience

Impact on EventPipe-dependent tools (dotnet-trace)
We can add logic in diagnostics server that will check with System.Private.CoreLib.dll to detect that the EventSource support is turned off. Since .NET Core diagnostic tools such as dotnet-trace communicate with target runtime through the diagnostics server, these tools will be able to recognize that the EventSource support is turned off in the impacted application. The tools can then provide a warning to the user that the application will not emit any events. Specifically, to enable dotnet-trace to report the issue as a warning immediately, and also log an event via the CLR public provider. This would make it really easy to (a) see the issue immediately for scenarios where we can control this, and (b) make it easy to tell from a trace why events are missing. Scenario (b) is super important, because we often get traces where something went wrong, and being able to diagnose this from the trace itself is very valuable. (c) - document the boundary of what works and doesn't so that users (and even ourselves) know what to expect. This would be both at the tooling level and also at the API / RPC interfaces where tool authors and runtime developers interact.

Impact on Visual Studio Profiler
For tools that rely on ETW such as VS Profiler, we can emit an event under the “Microsoft-Windows-DotNETRuntime” for indicating that EventSource is turned off that is always fired (from native side) when EventSource support is turned off. For example, VS profiler will not be able to get any events from Windows apps that turned off EventSource – but it will receive an event from “Microsoft-Windows-DotNETRuntime” that indicates the target app turned off EventSource via EventSourceSupport switch.

Impact on Distributed Tracing
DiagnosticSourceEventSource is an EventSource, so it will impact distributed tracing/OpenTelemetry scenarios as well. But DiagnosticSourceEventSource doesn't play a signficant role in distributed tracing scenarios today. Its reasonable for apps that turned off EventSource support to lose this option until later.

Experience with turning on EventSource Providers
External processes that turn on specific EventSource Providers will not get events from these providers. EventPipe, ETW, and LTTng will cache configuration of enabled providers until they come into existence in the application. Since the EventSource support is turned off, these EventSource provider will never come into existence in the impacted application

Ability for a well known provider to give information that EventSource is turned off
The decision to turn off native events like GC events in the case where EventSource support is turned off has not been decided yet. Many of these can work as they are emitted directly from the native side of the runtime – but some of the events being fired from managed components such as the managed threadpool will not work properly. Other managed events that depend on EventSource such as TplEventSource, DiagnosticSourceEventSource will be impacted.

vitek-karas · 2021-03-10T10:20:00Z

We should not make functional decisions based on the current location of the event firing code. Currently it's easy to keep all of the native code call sites even if EventSource is turned off, but it's something we should not do. It would prevent us from freely choosing if a given functionality can be in native or managed. We're already migrating large parts of the native runtime to managed and if we did the above this would become problematic.

Having one special event fired from native seems OK, but I would try to limit only to that and nothing else. Ideally we would not even have that - in the most size constrained environments we would like to remove all of event tracing code from the app, including the native parts (one day). So ideally the tooling should be able to handle those cases - at least somewhat gracefully.

It's similar to debugging - I could imagine one day we support some special way to produce an app which doesn't even start the debugger thread and doesn't open the debugger IPC. (Both as a security/reliability as well as size feature).

noahfalk · 2021-03-10T15:23:29Z

+1 on @vitek-karas answer. I think there is a few other reasons that lead to a similar conclusion:

If we disable functionality we have to describe the scope of what was disabled to users so they understand what to expect. Saying "if you do X, all events will be disabled" is fairly straightforward. "If you do X all of the events implemented in managed code will be disabled, the exact list of what this includes varies by runtime SKU and version" is harder to understand.
Disabling all events offers a simple implementation option for EventPipe - fail the initial request to create a session and fail the user's request to do tracing in the tool UI. Dealing with partial success/partial error is much messier.

sywhang · 2021-03-10T21:59:13Z

As long as we can describe the diagnostic impact to the users in a clear way when the feature switch is disabled, I think it is fine to just disable all tracing bottoms-up.

As previously discussed, for tools that rely on ETW (i.e. VS Profiler), it may be worth firing an event that shows that the feature switch has been turned off in the app, so that it is diagnosable when no events show up in the trace.

LakshanF · 2021-03-15T20:10:33Z

Closing this issue for this round of changes that have been done:

Worked on the 2 types, EventPipe, NativeRuntimeEventSource in https://github.com/dotnet/runtime/blob/main/src/coreclr/System.Private.CoreLib/src/ILLink/ILLinkTrim.xml
Opened a doc issue and the tooling issues (links above) after discussions with the Diagnostics team and other stakeholders
There is around 12% size reduction with the feature switch turned off
The remaining types in the namespace System.Diagnostics.Tracing other than enums are EventSource and a couple of FX event sources (FrameworkEventSource, NativeRuntimeEventSource), and ActivityTracker

vitek-karas added area-System.Diagnostics.Tracing linkable-framework Issues associated with delivering a linker friendly framework labels Oct 20, 2020

vitek-karas added this to the 6.0.0 milestone Oct 20, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 20, 2020

vitek-karas mentioned this issue Oct 20, 2020

Correctly apply EventSourceSupport feature switch to NativeRuntimeEventSource #43602

Merged

jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2020

noahfalk added feature-request enhancement Product code improvement that does NOT require public API changes/additions and removed feature-request labels Nov 10, 2020

LakshanF self-assigned this Dec 6, 2020

LakshanF mentioned this issue Feb 10, 2021

Removing 2 rooted types related to EventSource FeatureSwitch #48104

Merged

LakshanF mentioned this issue Mar 10, 2021

EventSource Feature switch type fix #49432

Merged

This was referenced Mar 15, 2021

Document EventSource trim feature switch behavior dotnet/docs#23325

Open

Add a runtime event for detecting EventSourceSwitch turned off #47062

Open

This was referenced Mar 15, 2021

Add diagnostic server command to detect EventSourceSupport feature switch #47061

Open

dotnet-trace needs to work better with apps that has the EventSourceSupport feature switched off #46047

Closed

LakshanF closed this as completed Mar 15, 2021

LakshanF mentioned this issue Mar 25, 2021

ActivityTracker is mostly untrimmed even when event source is disabled #43635

Open

ghost locked as resolved and limited conversation to collaborators Apr 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define and implement behavior of tracing functionality when EventSource is disabled via a feature switch #43657

Define and implement behavior of tracing functionality when EventSource is disabled via a feature switch #43657

vitek-karas commented Oct 20, 2020

ghost commented Oct 20, 2020

LakshanF commented Dec 6, 2020

vitek-karas commented Dec 7, 2020

LakshanF commented Dec 7, 2020

eerhardt commented Dec 7, 2020

tommcdon commented Dec 7, 2020

sywhang commented Dec 7, 2020

josalem commented Dec 8, 2020

marek-safar commented Dec 8, 2020

LakshanF commented Dec 14, 2020

LakshanF commented Mar 5, 2021

vitek-karas commented Mar 10, 2021

noahfalk commented Mar 10, 2021 •

edited

Loading

sywhang commented Mar 10, 2021

LakshanF commented Mar 15, 2021

Define and implement behavior of tracing functionality when EventSource is disabled via a feature switch #43657

Define and implement behavior of tracing functionality when EventSource is disabled via a feature switch #43657

Comments

vitek-karas commented Oct 20, 2020

ghost commented Oct 20, 2020

LakshanF commented Dec 6, 2020

vitek-karas commented Dec 7, 2020

LakshanF commented Dec 7, 2020

eerhardt commented Dec 7, 2020

tommcdon commented Dec 7, 2020

sywhang commented Dec 7, 2020

josalem commented Dec 8, 2020

marek-safar commented Dec 8, 2020

LakshanF commented Dec 14, 2020

LakshanF commented Mar 5, 2021

vitek-karas commented Mar 10, 2021

noahfalk commented Mar 10, 2021 • edited Loading

sywhang commented Mar 10, 2021

LakshanF commented Mar 15, 2021

noahfalk commented Mar 10, 2021 •

edited

Loading