Detect DotNet processes with IPC mmap operation #229
One-collect currently depends upon the existence of an mmap call in the .NET runtime for a mapping called doublemapper in order to detect that the process is a .NET process. This is fragile as doublemapper is an implementation detail used by the W^X implementation. Rather than depending on this mapping, depend upon the existence of a mapping called dotnet_ipc_created, which will be created by the .NET runtime once the IPC channel has been created. For the purposes of backwards compatibility, we'll keep checking for doublemapper, though ideally the new mapping is backported.
Could record-trace also look for non-executable mappings to find dotnet_ipc_created? https://github.com/dotnet/runtime/pull/123779/changes#r2743888505
It's technically possible, but right now we filter out non-executable mappings. Are you thinking that you want to keep permissions to be a minimum (just read)?
Sorry, realized I linked the wrong thing, this is the conversation dotnet/runtime#123779 (comment).
I think the motivation is from a security standpoint, where it would be safer to have fewer executable pages. I don't currently know the vulnerabilities, especially if it's a 0-filled page, but there's currently no reason the runtime needs to make it executable besides record-trace discovery.
Is the main driver for filtering out non-executable mappings that it's more performant?
> security standpoint
Creating executable mappings that are not backed by a (trusted) binary on disk is a suspect operation. I would not be surprised if it is blanket-disabled in locked-down environments, irrespective of whether the mapping is writeable.
> It's technically possible, but right now we filter out non-executable mappings
If it makes the filtering cheaper on average, we can give the mapping some unique size like 42 that gets checked before the name.
@beaubelgrave, do you have any concerns with just removing the protection check all-up for these call sites? If needed, we can benchmark this, but I'm not too concerned. The filename is already preloaded (not lazily) and many mappings will be file-backed and will start with a '/'. I suspect that the filename checks are likely to bail pretty quickly if they aren't going to match.
> Is the main driver for filtering out non-executable mappings that it's more performant?
There are other mmap closures where we filter out non-executable mappings because we only need executable mappings and processing all of them becomes much more costly. I suspect that the protection check here is just because we were looking for doublemapper which needs to be executable (at least in its current implementation).
> @beaubelgrave, do you have any concerns with just removing the protection check all-up for these call sites? If needed, we can benchmark this, but I'm not too concerned. The filename is already preloaded (not lazily) and many mappings will be file-backed and will start with a '/'. I suspect that the filename checks are likely to bail pretty quickly if they aren't going to match.
mmaps are not high-volume events (except at startup). The CLR hook gets mmap events even if they are not executable, so we can remove that check just for the DotNet part. I think the impact should be manageable, especially if the mmap name is uncommon, i.e., strcmp will bail very quickly within the string.
I'm not concerned with this until we have data indicating we need to do something different. I agree with @jkotas that if we keep the size the same in each release, we could use that if needed to get more perf.
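To illustrate the size-before-name idea floated in this thread, here is a minimal Rust sketch. Everything in it is an assumption for illustration: the `MmapEvent` struct, the `is_ipc_marker` helper, and the `MARKER_LEN` value are hypothetical and are not one-collect's actual types or the runtime's actual mapping size (real mmap lengths are page-granular, so a real value would differ).

```rust
/// Hypothetical, simplified view of an mmap event; not one-collect's real type.
struct MmapEvent<'a> {
    len: u64,
    filename: &'a str,
}

/// Placeholder size echoing the "like 42" suggestion. The runtime would have
/// to keep the real value stable across releases for this check to be valid.
const MARKER_LEN: u64 = 42;

/// Hypothetical helper: cheap integer compare first, name compare second.
fn is_ipc_marker(ev: &MmapEvent<'_>) -> bool {
    // Short-circuit: most mappings fail the length check, so the
    // string comparison is skipped for them entirely.
    ev.len == MARKER_LEN && ev.filename.starts_with("/memfd:dotnet_ipc_created")
}

fn main() {
    let events = [
        MmapEvent { len: MARKER_LEN, filename: "/memfd:dotnet_ipc_created (deleted)" },
        MmapEvent { len: 4096, filename: "/usr/lib/libc.so.6" },
        MmapEvent { len: MARKER_LEN, filename: "/tmp/other" },
    ];
    for ev in events.iter() {
        println!("{} -> {}", ev.filename, is_ipc_marker(ev));
    }
}
```

As noted in the thread, this extra check would only be worth adding if data showed the name comparison alone was too costly.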
Sounds good. I've just pushed a fix for this.
    /* Check if dotnet process */
    -if filename.starts_with("/memfd:doublemapper") {
    +if filename.starts_with("/memfd:doublemapper") || filename.starts_with("/memfd:dotnet_ipc_created") {
How much backward compatibility do we care about here? What are the .NET runtime versions that we expect this to work on?
If we really need to keep this around, it should have a comment that the doublemapper check is just a best effort check for backward compatibility.
.NET events will only work on .NET 10, but perfmap support exists much further back and ideally we'd cover older supported versions (e.g. .NET 8+). I think it's reasonable to say that doublemapper is really a best-effort approach in all cases. I'll add a comment to this effect.
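The best-effort comment discussed in this thread could look roughly like the following sketch. The helper name `is_dotnet_mapping` and the constants are hypothetical and not taken from one-collect; only the two mapping names come from the diff above. The ` (deleted)` suffix in the sample names is an assumption about how memfd-backed mappings are reported, which is one reason a prefix match is used.

```rust
/// Created by the .NET runtime once the diagnostics IPC channel exists.
const IPC_CREATED_PREFIX: &str = "/memfd:dotnet_ipc_created";

/// Implementation detail of the runtime's W^X support. Checked only as a
/// best-effort fallback for runtimes that predate `dotnet_ipc_created`.
const DOUBLEMAPPER_PREFIX: &str = "/memfd:doublemapper";

/// Hypothetical helper: true if an mmap filename marks a .NET process.
fn is_dotnet_mapping(filename: &str) -> bool {
    filename.starts_with(IPC_CREATED_PREFIX) || filename.starts_with(DOUBLEMAPPER_PREFIX)
}

fn main() {
    let names = [
        "/memfd:dotnet_ipc_created (deleted)",
        "/memfd:doublemapper (deleted)",
        "/usr/lib/libc.so.6",
    ];
    for name in names.iter() {
        println!("{}: {}", name, is_dotnet_mapping(name));
    }
}
```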
Force-pushed from 4e1d859 to 60125a2.
Dotnet 10+ processes will create a memfd mapping called dotnet_ipc_created to signal that the diagnostics IPC channel has been created. The mapping uses protection PROT_NONE and so one_collect needs to be able to listen for non-executable mappings when dotnet processes are involved. Rather than always listening to all mappings, use the presence of the DotnetHelper as the signal to listen to all mmap events instead of just executable mmap events.
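A minimal sketch of the gating described in this commit message, under stated assumptions: the `Session` struct, its `dotnet_helper_enabled` field, and the `wants_mmap` method are hypothetical names standing in for "a DotnetHelper is present", not one_collect's real API. `PROT_EXEC` uses the Linux value `0x4`, and `PROT_NONE` is `0`.

```rust
/// Linux protection bits relevant here.
const PROT_NONE: u32 = 0x0;
const PROT_EXEC: u32 = 0x4;

/// Hypothetical session settings; the field stands in for
/// "a DotnetHelper has been attached to this session".
struct Session {
    dotnet_helper_enabled: bool,
}

impl Session {
    /// Decide whether an mmap event with protection bits `prot` is processed.
    fn wants_mmap(&self, prot: u32) -> bool {
        // With the helper present, every mapping is observed so that the
        // PROT_NONE `dotnet_ipc_created` mapping is not filtered out.
        // Without it, only executable mappings are of interest.
        self.dotnet_helper_enabled || (prot & PROT_EXEC) != 0
    }
}

fn main() {
    let with_helper = Session { dotnet_helper_enabled: true };
    let without_helper = Session { dotnet_helper_enabled: false };
    println!("helper, PROT_NONE: {}", with_helper.wants_mmap(PROT_NONE));
    println!("no helper, PROT_NONE: {}", without_helper.wants_mmap(PROT_NONE));
    println!("no helper, PROT_EXEC: {}", without_helper.wants_mmap(PROT_EXEC));
}
```

Gating on the helper keeps the cheaper executable-only filter for sessions that never need to detect .NET processes.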
External tools interested in connecting to the runtime's diagnostic ports benefit from a low-overhead IO signal that a .NET process is ready to receive IPC commands, instead of trying to IO over all known temp file directories looking for diagnostic ports for each process. Following the discussion in microsoft/one-collect#226, this PR adds a new mapping, `dotnet_ipc_created`, that is created once a .NET process' singular listen port is successfully created.

## Testing

userevents runtime tests now work on NativeAOT with the record-trace change microsoft/one-collect#229

```
mihw@CPC-mihw-6KMZDM:~/repo/runtime$ ./artifacts/tests/coreclr/linux.x64.Release/tracing/userevents/basic/basic/basic.sh
BEGIN EXECUTION
/home/mihw/repo/runtime/src/tests/Common/scripts/nativeaottest.sh /home/mihw/repo/runtime/artifacts/tests/coreclr/linux.x64.Release/tracing/userevents/basic/basic/ basic.dll ''
traceeAssemblyPath:
Starting record-trace: sudo -n /home/mihw/repo/runtime/artifacts/tests/coreclr/linux.x64.Release/tracing/userevents/common/userevents_common/record-trace --script-file /home/mihw/repo/runtime/artifacts/tests/coreclr/linux.x64.Release/tracing/userevents/basic/basic/native/../basic.script --out /tmp/tmpBoli5K.nettrace --log-mode console --log-filter error,one_collect::helpers::dotnet::os::linux=debug
record-trace started with PID: 1543079
[record-trace] 2026-01-29T22:55:19.967428Z DEBUG one_collect::helpers::dotnet::os::linux: Registered .NET tracepoint: name=OC_DotNet_Microsoft_Windows_DotNETRuntime_1543081_All, callstacks=false, use_names=true
Starting tracee process: /home/mihw/repo/runtime/artifacts/tests/coreclr/linux.x64.Release/tracing/userevents/basic/basic/native/basic tracee
Tracee process started with PID: 1543083
Waiting for tracee process to exit...
[record-trace] 2026-01-29T22:55:20.093460Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543063, nspid=1543063
[record-trace] 2026-01-29T22:55:20.093476Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543063, nspid=1543063
[record-trace] 2026-01-29T22:55:20.094446Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543063, nspid=1543063
[record-trace] Recording started. Press CTRL+C to stop.
[record-trace] 2026-01-29T22:55:20.097795Z INFO one_collect::helpers::dotnet::os::linux: Enabled .NET events for process: pid=1543063
[record-trace] 2026-01-29T22:55:20.098955Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543083, nspid=1543083
[record-trace] 2026-01-29T22:55:20.099085Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543083, nspid=1543083
[record-trace] 2026-01-29T22:55:20.100017Z DEBUG one_collect::helpers::dotnet::os::linux: Opened diagnostic socket: pid=1543083, nspid=1543083
[record-trace] 2026-01-29T22:55:20.104842Z INFO one_collect::helpers::dotnet::os::linux: Enabled .NET events for process: pid=1543083
Stopping record-trace with SIGINT.
Waiting for record-trace to exit...
[record-trace] Recording stopped.
[record-trace] Resolving symbols.
[record-trace] Finished recording trace.
[record-trace] Trace written to /tmp/tmpBoli5K.nettrace
Expected: 100
Actual: 100
END EXECUTION - PASSED
```
Backport of #123779 to release/10.0

/cc @mdh1418

## Customer Impact

- [ ] Customer reported
- [x] Found internally

User_events support was added in .NET 10. The officially supported way to enable user_events is through One-Collect's [record-trace](https://github.com/microsoft/one-collect) (which dotnet-trace wraps around). Record-trace relied on an implementation detail of W^X that NativeAOT doesn't support since it doesn't support PerfMaps (see microsoft/one-collect#226), so it was belatedly discovered that user_events doesn't work for NativeAOT. Through discussions in microsoft/one-collect#226, this `dotnet_ipc_created` mapping with no permissions (PROT_NONE) was found as an acceptable minimal OS interaction to signal when the .NET process' diagnostic ports are available.

## Regression

- [ ] Yes
- [x] No

## Testing

I tested the [User_events runtime tests](https://github.com/dotnet/runtime/tree/main/src/tests/tracing/userevents) against a locally built record-trace based on microsoft/one-collect#229 in both CoreCLR and NativeAOT on my WSL2 instance.

## Risk

Low. The mapping being introduced has minimal permissions as it is a private mapping created with `PROT_NONE`.

---------

Co-authored-by: mdh1418 <mitchhwang1418@gmail.com>
…ativeAOT tests (#124616)

## Description

`Microsoft.OneCollect.RecordTrace` `0.1.33304` (from the `dotnet-diagnostics-tests` feed) contains the fix to detect .NET processes without perfmaps ([one-collect#229](microsoft/one-collect#229)), unblocking UserEvents tracing for NativeAOT.

Changes:

- **`eng/Versions.props`**: Bump `MicrosoftOneCollectRecordTraceVersion` from `0.1.32221` → `0.1.33304`
- **`src/tests/tracing/userevents/Directory.Build.props`**: Remove `NativeAotIncompatible` (completing the #123697 checklist). The general `CLRTestTargetUnsupported` disable (#123442) is kept since tests are still flaky. It was validated via `/azp run runtime-nativeaot-outerloop` that NativeAOT UserEvents tests pass with the updated package.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdh1418 <16830051+mdh1418@users.noreply.github.com>
Fixes #226