@mdh1418 mdh1418 commented Sep 17, 2025

Implements dotnet/docs#47894

dotnet/runtime#115265 added support for emitting native runtime and custom EventSource events as user_events, and https://github.com/microsoft/one-collect, now publicly released, can collect both .NET user_events and Linux perf events into a single .nettrace file. Building on both, dotnet-trace will support a new verb, collect-linux, that wraps record-trace.

This PR does the following:

  • Adds the collect-linux verb, which serializes a subset of the dotnet-trace collect options, plus a collect-linux-specific --perf-events option, into record-trace args. (See dotnet/docs#47894, "[Diagnostics][dotnet-trace] Add collect-linux verb", for overarching details.)
  • Adds record-trace dynamic library to dotnet-trace
  • Updates existing profiles (cpu-sampling -> dotnet-common + dotnet-sampled-thread-time) and adds collect-linux specific profiles
  • Updates list-profiles verb with revamped profiles + multiline description formatting
  • Refactors EventPipeProvider composition logic (MergeProfileAndProviders + ToProviders -> ComputeProviderConfig) and renames Extensions.cs -> ProviderUtils.cs
  • Revamps EventPipeProvider composition tests (ProviderParsing.cs -> ProviderCompositionTests.cs)
  • Various cleanup: Update CLREventKeywords + Update logging + refactor collect logic + expand dotnet-trace common options

Testing

dotnet-trace collect-linux

On Linux

collect-linux
collect-linux --help
$ ./dotnet-trace collect-linux -h
Description:
  Collects diagnostic traces using perf_events, a Linux OS technology. collect-linux requires admin privileges to capture kernel- and user-mode events, and by default, captures events from all processes. This Linux-only command includes the same .NET
  events as dotnet-trace collect, and it uses the kernel’s user_events mechanism to emit .NET events as perf events, enabling unification of user-space .NET events with kernel-space system events.

Usage:
  dotnet-trace collect-linux [options]

Options:
  --providers       A comma delimited list of EventPipe providers to be enabled. This is in the form 'Provider[,Provider]',where Provider is in the form: 'KnownProviderName[:[Flags][:[Level][:[KeyValueArgs]]]]', and KeyValueArgs is in the form:
                    '[key1=value1][;key2=value2]'.  Values in KeyValueArgs that contain ';' or '=' characters need to be surrounded by '"', e.g., FilterAndPayloadSpecs="MyProvider/MyEvent:-Prop1=Prop1;Prop2=Prop2.A.B;".  Depending on your shell, you may
                    need to escape the '"' characters and/or surround the entire provider specification in quotes, e.g., --providers 'KnownProviderName:0x1:1:FilterSpec=\"KnownProviderName/EventName:-Prop1=Prop1;Prop2=Prop2.A.B;\"'. These providers are in
                    addition to any providers implied by the --profile argument. If there is any discrepancy for a particular provider, the configuration here takes precedence over the implicit configuration from the profile.  See documentation for
                    examples.
  --clreventlevel   Verbosity of CLR events to be emitted.
  --clrevents       List of CLR runtime events to emit.
  --perf-events     Comma-separated list of perf events (e.g. syscalls:sys_enter_execve,sched:sched_switch).
  --profile         A named, pre-defined set of provider configurations for common tracing scenarios. You can specify multiple profiles as a comma-separated list. When multiple profiles are specified, the providers and settings are combined (union), and
                    duplicates are ignored.
  -o, --output      The output path for the collected trace data. If not specified it defaults to '<appname>_<yyyyMMdd>_<HHmmss>.nettrace', e.g., 'myapp_20210315_111514.nettrace'. [default: default]
  --duration        When specified, will trace for the given timespan and then automatically stop the trace. Provided in the form of dd:hh:mm:ss.
  -n, --name        The name of the process to collect the trace.
  -p, --process-id  The process id to collect the trace.
  -?, -h, --help    Show help and usage information
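
The `--providers` format described in the help text above can be illustrated with a small parser. This is a minimal sketch in Python, not the C# logic in this PR: `parse_provider` and `merge` are hypothetical names, only numeric levels are handled, KeyValueArgs is kept as an opaque string, and the defaults used for omitted fields are assumptions. `merge` mirrors the documented rule that an explicit `--providers` entry takes precedence over the profile's entry for the same provider.

```python
from typing import Dict, Iterable, NamedTuple


class Provider(NamedTuple):
    name: str
    keywords: int
    level: int
    args: str


def parse_provider(spec: str) -> Provider:
    """Parse one 'Name[:[Flags][:[Level][:[KeyValueArgs]]]]' entry."""
    # Split into at most four fields so ':' inside KeyValueArgs is preserved.
    parts = spec.split(":", 3)
    name = parts[0]
    flags = parts[1] if len(parts) > 1 else ""
    level = parts[2] if len(parts) > 2 else ""
    args = parts[3] if len(parts) > 3 else ""
    return Provider(
        name=name,
        keywords=int(flags, 16) if flags else 0,  # assumed default for this sketch
        level=int(level) if level else 4,         # assumed default for this sketch
        args=args,
    )


def merge(profile: Iterable[Provider], explicit: Iterable[Provider]) -> Dict[str, Provider]:
    """Combine providers by name; explicit entries override profile entries."""
    merged = {p.name: p for p in profile}
    merged.update({p.name: p for p in explicit})
    return merged


profile = [parse_provider("Microsoft-Windows-DotNETRuntime:0x100003801D:4")]
explicit = [parse_provider("Microsoft-Windows-DotNETRuntime:0x1:5")]
result = merge(profile, explicit)
```

Here `result` holds a single entry for `Microsoft-Windows-DotNETRuntime` with the explicitly supplied keywords and level, since the explicit configuration wins over the profile's.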
`collect-linux` without elevated privileges
$ ./dotnet-trace collect-linux
No profile or providers specified, defaulting to trace profiles 'dotnet-common' + 'cpu-sampling'.
Applying profile 'dotnet-common': Microsoft-Windows-DotNETRuntime:0x000000100003801D:4
Applying profile 'cpu-sampling': --on-cpu

Provider Name                           Keywords            Level               Enabled By
Microsoft-Windows-DotNETRuntime         0x000000100003801D  Informational(4)    --profile

Error: Tracefs is not accessible: Permission denied (os error 13)
`collect-linux` with elevated privileges
$ sudo ./dotnet-trace collect-linux
[sudo] password for mihw:
No profile or providers specified, defaulting to trace profiles 'dotnet-common' + 'cpu-sampling'.
Applying profile 'dotnet-common': Microsoft-Windows-DotNETRuntime:0x000000100003801D:4
Applying profile 'cpu-sampling': --on-cpu

Provider Name                           Keywords            Level               Enabled By
Microsoft-Windows-DotNETRuntime         0x000000100003801D  Informational(4)    --profile

Recording started.  Press CTRL+C to stop.
^C
Recording stopped.
Resolving symbols.
Finished recording trace.
Trace written to trace_20250919_205934.nettrace

On Windows (and, I presume, other non-Linux OSes):

`collect-linux`
.\artifacts\bin\dotnet-trace\Debug\net8.0\dotnet-trace.exe collect-linux
The collect-linux command is only supported on Linux.

dotnet-trace list-profiles

`list-profiles`
dotnet-trace profiles:
        dotnet-common                        - Lightweight .NET runtime diagnostics designed to stay low overhead.
                                               Includes:
                                                   GC
                                                   AssemblyLoader
                                                   Loader
                                                   JIT
                                                   Exceptions
                                                   Threading
                                                   JittedMethodILToNativeMap
                                                   Compilation
                                               Equivalent to --providers "Microsoft-Windows-DotNETRuntime:0x100003801D:4".
        dotnet-sampled-thread-time (collect) - Samples .NET thread stacks (~100 Hz) to estimate how much wall clock time code is using.
        gc-verbose                           - Tracks GC collections and samples object allocations.
        gc-collect                           - Tracks GC collections only at very low overhead.
        database                             - Captures ADO.NET and Entity Framework database commands
        cpu-sampling (collect-linux)         - Kernel CPU sampling events for measuring CPU usage.
        thread-time (collect-linux)          - Kernel thread context switch events for measuring CPU usage and wall clock time
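
The `dotnet-common` keyword mask shown above can be checked against the keyword names listed in its description. The following is a minimal Python sketch; the individual bit values are an assumption taken from the published CLR runtime keyword table, not from this PR, and should be verified against the updated CLREventKeywords.

```python
# Assumed bit values for the keywords listed under 'dotnet-common'.
CLR_KEYWORDS = {
    "GC": 0x1,
    "AssemblyLoader": 0x4,
    "Loader": 0x8,
    "JIT": 0x10,
    "Exceptions": 0x8000,
    "Threading": 0x10000,
    "JittedMethodILToNativeMap": 0x20000,
    "Compilation": 0x1000000000,
}


def decode(mask: int) -> list:
    """Return the names of all known keywords set in the mask."""
    return [name for name, bit in CLR_KEYWORDS.items() if mask & bit]


mask = 0x100003801D
names = decode(mask)

# Any bits not covered by the table above would show up here.
known = 0
for bit in CLR_KEYWORDS.values():
    known |= bit
remainder = mask & ~known

print(names)
print(hex(remainder))
```

Under these assumed values, all eight listed keywords are set and no bits are left over, which matches the "Equivalent to" line in the profile description.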

@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch from c09b095 to 0989d21 Compare September 19, 2025 05:45
@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch 2 times, most recently from 76e65e1 to 61b2569 Compare September 19, 2025 18:32
@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch from 61b2569 to c8e53ea Compare September 19, 2025 18:56
@mdh1418 mdh1418 marked this pull request as ready for review September 19, 2025 19:43
@mdh1418 mdh1418 requested a review from a team as a code owner September 19, 2025 19:43
@mdh1418 mdh1418 added the DO NOT MERGE do not merge this PR label Sep 19, 2025
@mdh1418 mdh1418 changed the title [dotnet-trace] Add collect-linux verb [NO-MERGE][dotnet-trace] Add collect-linux verb Sep 19, 2025

@noahfalk noahfalk left a comment

This is looking pretty good. Aside from some small comments inline a few broader things:

  1. Historically we relied on manual testing to keep these tools operating well but now that we have less time available from the testers I think we need to improve our automated testing. Partly this is to ensure we aren't inadvertently changing the original 'collect' verb and partly to ensure going forward the 'collect-linux' behavior doesn't regress either. I think the best way to do this would be:
  • Open a new PR that we'll check in first containing some basic tests of the existing collect verb.
  • Commit this PR 2nd and all the tests in the 1st PR should continue to pass. This ensures we didn't change anything unintended.

To do the testing we probably need to create some small interface shims. We already have an IConsole interface defined that could be moved to the shared Common folder. We could also create a small interface around the DiagnosticsClient.EventPipeCollect() API so that a test can return some dummy data in a stream instead. dotnet-counters has some example tests that show how we can run some code and then confirm the console output is what we expect. In this case I imagine we'd be running the Collect() function and giving some chosen input arguments.
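
The shim idea above can be sketched roughly as follows. This is a language-neutral illustration in Python rather than the C# the tool is written in: `Console`, `TraceSession`, `FakeConsole`, `FakeTraceSession`, and `collect` are all hypothetical names, and the console messages are placeholders, not dotnet-trace's actual output. The point is only that the command logic depends on two small interfaces, so a test can feed it dummy stream data and then assert on what was written to the console.

```python
import io
from typing import List, Protocol


class Console(Protocol):
    def write_line(self, text: str) -> None: ...


class TraceSession(Protocol):
    # Stands in for a thin wrapper around the real event stream API.
    def open_stream(self) -> io.BufferedIOBase: ...


class FakeConsole:
    """Test double that records every line instead of printing it."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write_line(self, text: str) -> None:
        self.lines.append(text)


class FakeTraceSession:
    """Test double that returns canned bytes instead of a live trace."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def open_stream(self) -> io.BufferedIOBase:
        return io.BytesIO(self._payload)


def collect(console: Console, session: TraceSession, output: io.BufferedIOBase) -> int:
    """Hypothetical command body: copy the trace stream and report progress."""
    console.write_line("Recording started.")
    data = session.open_stream().read()
    output.write(data)
    console.write_line(f"Recording stopped. {len(data)} bytes written.")
    return 0


def run_example():
    console = FakeConsole()
    sink = io.BytesIO()
    exit_code = collect(console, FakeTraceSession(b"dummy-trace"), sink)
    return exit_code, console.lines, sink.getvalue()
```

A test would call `run_example()` (or the equivalent setup) with chosen input arguments and compare the recorded console lines against the expected text.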

  2. I think there is a bit more adjustment to be done on some of the output text, but it should be fine to get this one in first, then tweak afterwards in some 3rd PR.

@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch from 789fb2b to 1f75125 Compare October 7, 2025 13:39
Record-Trace has a bug for multithreaded .NET apps when specifying a process to trace.
microsoft/one-collect#192
Until that is resolved, the scenario is broken.
@mdh1418 mdh1418 changed the title [NO-MERGE][dotnet-trace] Add collect-linux verb [dotnet-trace] Add collect-linux verb Oct 7, 2025
@mdh1418 mdh1418 removed the DO NOT MERGE do not merge this PR label Oct 7, 2025

mdh1418 commented Oct 7, 2025

The failure is unrelated; I think it's dotnet/runtime#120317, even after the refactor in #5585.
