Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Feb 5, 2026

Description

Fixes #123438

PerfMap::Enable(sendExisting=true) crashes when a PerfMapEnable IPC command is received very early in startup (e.g., during PauseForDiagnosticsMonitor). At this point:

  • GetAppDomain() returns NULL (created in SystemDomain::Attach)
  • ExecutionManager::GetEEJitManager() returns NULL (created in ExecutionManager::Init)

Both are initialized after DiagnosticServerAdapter::Initialize, so IPC commands can arrive before these are ready.

This PR adds a volatile bool to signal when sendExisting logic dependencies are initialized, otherwise sendExisting logic is skipped. sendExisting logic is safe to skip when called early - no assemblies are loaded and no code is JIT'd at that point. Future JIT'd methods will still be logged.

Startup Timeline

PerfMap::Initialize()
DiagnosticServerAdapter::Initialize()    ← IPC listener starts
PauseForDiagnosticsMonitor()             ← Can receive PerfMapEnable here
...
SystemDomain::Attach()                   ← AppDomain created
ExecutionManager::Init()                 ← EEJitManager created
PerfMap::SignalSendExistingReady()       ← Flag set, sendExisting now safe

Testing

mdh1418 validated on a wsl2 instance, pausing the startup logic with DOTNET_DefaultDiagnosticPortSuspend=1 and kicking off a separate app invoking DiagnosticClient.EnablePerfMap. Before the change, it would sigsegv and crash.
After the changes, its resilient and doesn't crash.

Original prompt

Can you investigate ways to fix #123438


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Investigate ways to fix runtime issue 123438 Fix PerfMap crash when Enable IPC command is sent early in startup Feb 5, 2026
Copilot AI requested a review from mdh1418 February 5, 2026 19:06
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 6, 2026
AppDomain is created in SystemDomain::Attach, which runs after
DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received
during PauseForDiagnosticsMonitor would crash calling IterateAssembliesEx
on null.

Safe to skip: no assemblies are loaded before SystemDomain::Attach.
EEJitManager is created in ExecutionManager::Init, which runs after
DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received
during PauseForDiagnosticsMonitor would crash on null dereference.

Safe to skip: no JIT'd code exists before ExecutionManager::Init.

CodeVersionManager::LockHolder remains inside the null check intentionally -
the lock is only needed while iterating code versions, not for the null check.
The lock is already initialized in CodeVersionManager::StaticInitialize before
DiagnosticServerAdapter::Initialize.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a CoreCLR startup crash when a PerfMapEnable IPC command arrives very early (before AppDomain and/or EEJitManager are initialized), by making PerfMap::Enable(sendExisting: true) resilient to those components being unavailable.

Changes:

  • Guard assembly enumeration behind a null-check for GetAppDomain() to avoid calling IterateAssembliesEx before SystemDomain::Attach().
  • Guard JIT code heap enumeration behind a null-check for ExecutionManager::GetEEJitManager() to avoid iterating heaps before ExecutionManager::Init().
  • Reuse the EEJitManager* local when building the CodeHeapIterator.

@mdh1418 mdh1418 added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Feb 6, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

The previous null checks on GetAppDomain() and GetEEJitManager() have race
conditions because statics like m_pEEJitManager are not Volatile.

Use a Volatile<bool> s_sendExistingReady flag instead. The flag is set by
PerfMap::SignalSendExistingReady, called from EEStartup after
ExecutionManager::Init. When Enable is called before the flag is set,
sendExisting iteration is skipped since no assemblies are loaded and no
code is JIT'd anyway.
…ndenciesReady

- Rename method to better capture intent
- Update comment to clarify it must be called before any code is JITed or restored from R2R
- Remove stale comment about call site location
@mdh1418
Copy link
Member

mdh1418 commented Feb 10, 2026

/backport to release/10.0

@github-actions
Copy link
Contributor

Started backporting to release/10.0 (link to workflow run)

@github-actions
Copy link
Contributor

@mdh1418 backporting to release/10.0 failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Add null check for AppDomain in early-startup PerfMap::Enable
Applying: Add null check for EEJitManager in early-startup PerfMap::Enable
Using index info to reconstruct a base tree...
M	src/coreclr/vm/perfmap.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/perfmap.cpp
CONFLICT (content): Merge conflict in src/coreclr/vm/perfmap.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 Add null check for EEJitManager in early-startup PerfMap::Enable
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

@hoyosjs hoyosjs deleted the copilot/fix-runtime-issue-123438 branch February 10, 2026 22:22
steveisok pushed a commit that referenced this pull request Feb 11, 2026
…y in startup (#124208)

Manual backport of #123226 and #124055 to release/10.0

## Customer Impact

- [ ] Customer reported
- [x] Found internally

The DiagnosticServer allows profilers to enable PerfMap early in
start-up, even before PerfMap is initialized and dependencies are ready,
inducing a crash. Enabling PerfMap this early in start-up was not tested
until user_events support was added in .NET 10, where One-Collect's
[record-trace](https://github.com/microsoft/one-collect) aggressively
enables PerfMaps as soon as it connects to a .NET Process so it can
resolve JIT'd code. As .NET user_events + record-trace gains more users,
this crash will likely be observed more frequently.

With the changes in the two PRs, the runtime is resilient to early
PerfMap::Enable commands received by the DiagnosticServer.
DiagnosticServer initialization will still be early in start-up, PerfMap
initialization is just bumped right before it.

## Regression

- [ ] Yes
- [x] No

## Testing

I validated on a wsl2 instance, pausing the startup logic with
DOTNET_DefaultDiagnosticPortSuspend=1 and kicking off a separate app
invoking DiagnosticClient.EnablePerfMap. Before the change, it would
sigsegv and crash.
After the changes, its resilient and doesn't crash.

## Risk

Low. PerfMap initialization doesn't depend on the logic between its
former and proposed location. The logic to handle early IPC PerfMap
Enable commands makes the problematic logic a no-op until the
dependencies are ready.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PerfMap Reenable Crashing from AssemblyIterator IterateAssembliesEx

3 participants