Fix PerfMap crash when Enable IPC command is sent early in startup #124055
AppDomain is created in SystemDomain::Attach, which runs after DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received during PauseForDiagnosticsMonitor would therefore crash when calling IterateAssembliesEx on a null AppDomain. This is safe to skip: no assemblies are loaded before SystemDomain::Attach.
EEJitManager is created in ExecutionManager::Init, which runs after DiagnosticServerAdapter::Initialize. A PerfMapEnable IPC command received during PauseForDiagnosticsMonitor would therefore crash with a null dereference. This is safe to skip: no JIT'd code exists before ExecutionManager::Init. CodeVersionManager::LockHolder intentionally remains inside the null check, because the lock is only needed while iterating code versions, not for the null check itself. The lock is already initialized in CodeVersionManager::StaticInitialize, before DiagnosticServerAdapter::Initialize.
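The two null-check guards described in these commits can be sketched as follows. This is a minimal, self-contained C++ model, not the actual `perfmap.cpp` code: `MockAppDomain`, `MockJitManager`, `g_appDomain`, `g_jitManager`, `g_codeVersionLock` and `SendExisting` are hypothetical stand-ins for the AppDomain, the EEJitManager, the CodeVersionManager lock and the `sendExisting` path.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical mocks standing in for AppDomain and EEJitManager.
struct MockAppDomain {
    std::vector<std::string> assemblies;
};

struct MockJitManager {
    std::vector<std::string> jittedMethods;
};

// Both pointers stay null until their startup phase runs, mirroring
// SystemDomain::Attach and ExecutionManager::Init running after
// DiagnosticServerAdapter::Initialize.
MockAppDomain* g_appDomain = nullptr;
MockJitManager* g_jitManager = nullptr;

// Stand-in for the CodeVersionManager lock, which already exists before
// the diagnostic server starts.
std::mutex g_codeVersionLock;

// Sketch of the sendExisting path with the null-check guards.
// Returns the entries that would be written to the perf map.
std::vector<std::string> SendExisting() {
    std::vector<std::string> entries;

    // Guard 1: no AppDomain yet means no assemblies are loaded; skip.
    if (MockAppDomain* domain = g_appDomain) {
        for (const std::string& name : domain->assemblies)
            entries.push_back("asm:" + name);
    }

    // Guard 2: no JIT manager yet means no JIT'd code exists; skip.
    // The lock is taken inside the check because it is only needed
    // while iterating, not for the null check itself.
    if (MockJitManager* jit = g_jitManager) {
        std::lock_guard<std::mutex> hold(g_codeVersionLock);
        for (const std::string& name : jit->jittedMethods)
            entries.push_back("jit:" + name);
    }
    return entries;
}
```

Reading plain pointer statics like these from the diagnostic server thread is not synchronized with the startup thread that writes them, which is the race the later readiness-flag commit addresses.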
Pull request overview
Fixes a CoreCLR startup crash when a PerfMapEnable IPC command arrives very early (before AppDomain and/or EEJitManager are initialized), by making PerfMap::Enable(sendExisting: true) resilient to those components being unavailable.
Changes:

- Guard assembly enumeration behind a null check for `GetAppDomain()` to avoid calling `IterateAssembliesEx` before `SystemDomain::Attach()`.
- Guard JIT code heap enumeration behind a null check for `ExecutionManager::GetEEJitManager()` to avoid iterating heaps before `ExecutionManager::Init()`.
- Reuse the `EEJitManager*` local when building the `CodeHeapIterator`.
Tagging subscribers to this area: @agocke
The previous null checks on GetAppDomain() and GetEEJitManager() have race conditions because statics like m_pEEJitManager are not Volatile. Use a Volatile<bool> s_sendExistingReady flag instead. The flag is set by PerfMap::SignalSendExistingReady, called from EEStartup after ExecutionManager::Init. When Enable is called before the flag is set, sendExisting iteration is skipped since no assemblies are loaded and no code is JIT'd anyway.
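The readiness-flag approach amounts to a publish/observe handshake between the startup thread and the diagnostic server thread. The sketch below assumes that CoreCLR's `Volatile<bool>` gives loads acquire semantics and stores release semantics, and uses `std::atomic<bool>` with explicit orderings as a portable equivalent. `JitManagerStub`, `s_jitManager` and the `Enable` signature are hypothetical; only the names `s_sendExistingReady` and `SignalSendExistingReady` come from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the JIT manager and its code heaps.
struct JitManagerStub {
    std::vector<std::string> methods;
};

// Plain (non-atomic) static, like m_pEEJitManager in the commit message.
static JitManagerStub* s_jitManager = nullptr;

// Readiness flag. The PR uses a CoreCLR Volatile<bool>; std::atomic<bool>
// with release/acquire ordering is used here as a portable equivalent.
static std::atomic<bool> s_sendExistingReady{false};

// Startup thread: initialize the dependencies first (modeled here as
// setting the pointer), then publish readiness.
void SignalSendExistingReady(JitManagerStub* jit) {
    s_jitManager = jit;
    s_sendExistingReady.store(true, std::memory_order_release);
}

// Diagnostic server thread: handle an Enable IPC command.
// Returns the number of already-JIT'd methods that were enumerated.
// (Enabling logging for future methods is not modeled here.)
std::size_t Enable(bool sendExisting) {
    if (!sendExisting || !s_sendExistingReady.load(std::memory_order_acquire)) {
        // Not requested, or too early: nothing to enumerate yet.
        return 0;
    }
    // The acquire load pairs with the release store, so the write to
    // s_jitManager made before the flag was set is visible here.
    return s_jitManager->methods.size();
}
```

Because the dependencies are written before the release store, a reader that sees the flag as `true` also sees those writes; a reader that sees `false` simply skips enumeration, which is safe because nothing has been loaded or JIT'd yet.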
…ndenciesReady

- Rename method to better capture intent
- Update comment to clarify it must be called before any code is JITed or restored from R2R
- Remove stale comment about call site location
/backport to release/10.0
Started backporting to
|
@mdh1418 backporting to
git am output:
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Applying: Add null check for AppDomain in early-startup PerfMap::Enable
Applying: Add null check for EEJitManager in early-startup PerfMap::Enable
Using index info to reconstruct a base tree...
M src/coreclr/vm/perfmap.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/perfmap.cpp
CONFLICT (content): Merge conflict in src/coreclr/vm/perfmap.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 Add null check for EEJitManager in early-startup PerfMap::Enable
Error: The process '/usr/bin/git' failed with exit code 128
…y in startup (#124208)

Manual backport of #123226 and #124055 to release/10.0

## Customer Impact

- [ ] Customer reported
- [x] Found internally

The DiagnosticServer allows profilers to enable PerfMap early in startup, even before PerfMap is initialized and its dependencies are ready, which causes a crash. Enabling PerfMap this early in startup was not tested until user_events support was added in .NET 10, where One-Collect's [record-trace](https://github.com/microsoft/one-collect) aggressively enables PerfMaps as soon as it connects to a .NET process so that it can resolve JIT'd code. As .NET user_events + record-trace gains more users, this crash will likely be observed more frequently.

With the changes in the two PRs, the runtime is resilient to early PerfMap::Enable commands received by the DiagnosticServer. DiagnosticServer initialization still happens early in startup; PerfMap initialization is simply moved to just before it.

## Regression

- [ ] Yes
- [x] No

## Testing

I validated on a WSL2 instance by pausing startup with DOTNET_DefaultDiagnosticPortSuspend=1 and launching a separate app that invokes DiagnosticClient.EnablePerfMap. Before the change, the process crashed with SIGSEGV. After the changes, it is resilient and does not crash.

## Risk

Low. PerfMap initialization does not depend on the logic between its former and new location. The handling of early IPC PerfMap Enable commands makes the problematic logic a no-op until the dependencies are ready.
Description
Fixes #123438
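PerfMap::Enable(sendExisting=true) crashes when a PerfMapEnable IPC command is received very early in startup (e.g., during PauseForDiagnosticsMonitor). At this point:

- AppDomain has not been created yet (it is created in SystemDomain::Attach).
- EEJitManager has not been created yet (it is created in ExecutionManager::Init).

Both are initialized after DiagnosticServerAdapter::Initialize, so IPC commands can arrive before they are ready.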
This PR adds a volatile bool that signals when the dependencies of the `sendExisting` logic are initialized; until then, the `sendExisting` logic is skipped. Skipping it this early is safe because no assemblies are loaded and no code has been JIT'd at that point. Methods JIT'd later are still logged.

Startup Timeline
Testing
mdh1418 validated on a WSL2 instance by pausing startup with `DOTNET_DefaultDiagnosticPortSuspend=1` and launching a separate app that invokes DiagnosticClient.EnablePerfMap. Before the change, the process crashed with SIGSEGV. After the changes, it is resilient and does not crash.
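The scenario validated above (a PerfMapEnable command arriving while startup is paused, followed by normal JIT activity) can be modeled in a few lines to show why skipping `sendExisting` loses nothing. This is a hypothetical single-threaded model, not CoreCLR code: `PerfMapModel`, `MarkReady` and `OnMethodJitted` are illustrative names, and threading is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-threaded model of PerfMap's observable behaviour;
// not CoreCLR code. Threading and memory ordering are not modeled.
class PerfMapModel {
public:
    // IPC PerfMapEnable command.
    void Enable(bool sendExisting) {
        enabled_ = true;
        if (sendExisting && ready_) {
            // Catch up on code that was JIT'd before logging was enabled.
            for (const std::string& m : jitted_) log_.push_back(m);
        }
        // Otherwise skip: before the dependencies are ready nothing has
        // been JIT'd, so there is nothing to catch up on.
    }

    // Startup reaches the point where the sendExisting dependencies exist.
    void MarkReady() { ready_ = true; }

    // A method is JIT'd; it is logged only if PerfMap is enabled.
    void OnMethodJitted(const std::string& name) {
        jitted_.push_back(name);
        if (enabled_) log_.push_back(name);
    }

    const std::vector<std::string>& Log() const { return log_; }

private:
    bool enabled_ = false;
    bool ready_ = false;
    std::vector<std::string> jitted_;
    std::vector<std::string> log_;
};
```

With an early command, Enable only turns logging on, and a method JIT'd after startup resumes is still recorded. With a late command, Enable first catches up on already-JIT'd methods and then keeps logging new ones.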