Description
Running BenchmarkDotNet MicroBenchmarks on AMD Zen 4 Windows x64 machines causes an immediate crash with exit code 0x80000004 (STATUS_SINGLE_STEP). The crash occurs in ClrRestoreNonvolatileContextWorker during exception handling/context restoration. The issue reproduces on AMD EPYC but not Intel Xeon, suggesting it may be AMD-specific.
Reproduction Steps
git clone https://github.com/dotnet/performance
cd performance
python .\scripts\benchmarks_ci.py --csproj .\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture x64 -f net11.0 --dotnet-versions 11.0.100-preview.1.26076.102 --bdn-artifacts .\BenchmarkDotNet.Artifacts --partition=0 --bdn-arguments="--anyCategories Libraries Runtime --logBuildOutput --generateBinLog --partition-count 15 --partition-index 0"
This installs dotnet and runs the performance tests, eventually failing when it starts running the microbenchmarks. For quicker replication, run the above once until it fails, then run only the final command below (all of the generated setup is reusable). Replace dotnet with the path to the installed dotnet:
.\tools\dotnet\x64\dotnet.exe run --project .\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net11.0 --no-restore --no-build -- --anyCategories Libraries Runtime --logBuildOutput
Expected behavior
Benchmarks run successfully
Actual behavior
Process exits immediately with code 2147483652 (0x80000004 - STATUS_SINGLE_STEP).
Debugger shows crash in: coreclr!ClrRestoreNonvolatileContextWorker+0xb1
A CLR exception (e0434352) occurs just before the crash, suggesting the issue is in exception handling when OSR-compiled code is on the stack.
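The decimal exit code and the NTSTATUS value quoted above are the same 32-bit number. The following is a minimal sketch for checking that when reading CI logs; the helper names as_unsigned32 and as_signed32 are hypothetical and not part of any tool mentioned here. Some tools print the exit code as a signed 32-bit integer, so the sketch shows that form too.

```python
# Exit code reported by the process, in decimal (taken from the report above).
exit_code = 2147483652

# NTSTATUS value for STATUS_SINGLE_STEP.
STATUS_SINGLE_STEP = 0x80000004


def as_unsigned32(code: int) -> int:
    """Normalize an exit code to its unsigned 32-bit form (hypothetical helper)."""
    return code & 0xFFFFFFFF


def as_signed32(code: int) -> int:
    """Reinterpret an exit code as a signed 32-bit integer (hypothetical helper)."""
    unsigned = code & 0xFFFFFFFF
    return unsigned - 0x1_0000_0000 if unsigned >= 0x8000_0000 else unsigned


print(hex(as_unsigned32(exit_code)))                   # 0x80000004
print(as_unsigned32(exit_code) == STATUS_SINGLE_STEP)  # True
print(as_signed32(exit_code))                          # -2147483644
```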
Regression?
Last successful run (https://dev.azure.com/dnceng/internal/_build/results?buildId=2880806&view=results) used dotnet version 11.0.100-alpha.1.26065.110 and commit 93c450d (although this is reproducible without a corerun/runtime build dependency). The failure that started immediately after that run was dotnet/sdk#52542, which is unrelated to the error above, so it is unclear exactly when this crash was introduced.
Interestingly, our Windows x86 runs do not hit this issue.
Known Workarounds
Any of these environment variables prevent the crash:
- DOTNET_TC_OnStackReplacement=0 ✓
- DOTNET_TC_QuickJit=0 ✓
- DOTNET_TieredCompilation=0 ✓
- DOTNET_JitMinOpts=1 ✓
DOTNET_JitEnableGuardedDevirtualization=0 does NOT fix the issue.
This suggests the bug is in On-Stack Replacement (OSR) code generation or exception handling when OSR is active.
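To re-check the list above on another machine, the quick repro command can be run once per setting and the exit codes compared. This is a minimal sketch, assuming it is run from the root of the cloned performance repository after the initial setup run has completed. The command line and variable names are copied from this report. The helpers env_with, format_result, run_once, and main are hypothetical names introduced for illustration.

```python
import os
import subprocess

# Settings from the report: the first four prevent the crash, the last does not.
SETTINGS = [
    ("DOTNET_TC_OnStackReplacement", "0"),
    ("DOTNET_TC_QuickJit", "0"),
    ("DOTNET_TieredCompilation", "0"),
    ("DOTNET_JitMinOpts", "1"),
    ("DOTNET_JitEnableGuardedDevirtualization", "0"),
]

# Quick repro command from the reproduction steps (paths relative to the repo root).
REPRO_COMMAND = [
    r".\tools\dotnet\x64\dotnet.exe", "run",
    "--project", r".\src\benchmarks\micro\MicroBenchmarks.csproj",
    "--configuration", "Release",
    "--framework", "net11.0",
    "--no-restore", "--no-build",
    "--", "--anyCategories", "Libraries", "Runtime", "--logBuildOutput",
]


def env_with(name, value, base=None):
    """Return a copy of the environment with one variable set (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env[name] = value
    return env


def format_result(name, value, code):
    """Format one result line, showing the exit code as 32-bit hex."""
    return f"{name}={value}: exit code {code & 0xFFFFFFFF:#010x}"


def run_once(name, value):
    """Run the repro command with a single setting applied and return its exit code."""
    return subprocess.run(REPRO_COMMAND, env=env_with(name, value)).returncode


def main():
    for name, value in SETTINGS:
        print(format_result(name, value, run_once(name, value)))
```

Calling main() prints one line per setting. A line ending in 0x80000004 would indicate that the crash still occurs with that setting applied.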
Configuration
- .NET Version: 11.0.0-preview.1.26076.102
- OS: Windows 11 (Build 22621)
- Architecture: x64
- CPU (failing): AMD EPYC 9124 16-Core Processor (Zen 4)
- CPU (working): Intel Xeon Platinum
Other information
Debugger analysis (Summarized with Copilot):
EXCEPTION_RECORD:
ExceptionAddress: coreclr!ClrRestoreNonvolatileContextWorker+0xb1
ExceptionCode: 80000004 (Single step exception)
FAULTING_SOURCE_FILE: D:\a\_work\1\s\src\runtime\src\coreclr\vm\amd64\Context.asm
FAULTING_SOURCE_LINE_NUMBER: 74
FAILURE_BUCKET_ID: SINGLE_STEP_80000004_coreclr.dll!ClrRestoreNonvolatileContextWorker
Stack trace at crash:
coreclr!ClrRestoreNonvolatileContextWorker+0xb1
0x00007ffa0487e92f (JIT'd code)
0x00007ffa0487e7ca (JIT'd code)
System_Linq+0x4c0bf
coreclr!CallDescrWorkerInternal+0x83
coreclr!RunMainInternal+0x16c
coreclr!RunMain+0x111
coreclr!Assembly::ExecuteMainMethod+0x1c7
The issue was discovered in a CI/CD pipeline running performance benchmarks. It is 100% reproducible on AMD Zen 4 machines.