Description
A .NET 9 application running as the primary process (PID 1) in a Docker container fails to terminate after an unhandled exception. The runtime correctly generates a crash dump as configured by DOTNET_DbgEnableMiniDump=1, but immediately afterward the process continues running instead of exiting.
Reproduction Steps
Update: the reproduction has been moved to its own repo at TestUnhandledCrashConsoleApp.
The following minimal application and Docker configuration reproduces the issue.
Step 1: C# application code (Program.cs). The application throws an unhandled exception on a timer thread after 10 seconds.
AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    Console.WriteLine("Unhandled exception: " + e.ExceptionObject.ToString());
};

using var timer = new Timer(
    callback: _ => throw new InvalidOperationException("This is a test exception"),
    state: null,
    dueTime: TimeSpan.FromSeconds(10),
    period: Timeout.InfiniteTimeSpan
);

Console.WriteLine("Hello, World! Application will crash in 10 seconds...");

// Keep the application running
Thread.Sleep(Timeout.Infinite);
Step 2: Publish the application as a self-contained executable for the Linux x64 runtime.
dotnet publish -c Release -r linux-x64
Step 3: Dockerfile
Create a Dockerfile to package the application.
# Use the official .NET 9 runtime base image
FROM mcr.microsoft.com/dotnet/aspnet:9.0
# Set the working directory inside the container
WORKDIR /app
# Copy the published application files into the container
COPY ./bin/Release/net9.0/linux-x64/publish/ .
# Set the entrypoint to run the application.
# Using this "exec" form ensures the app runs as PID 1.
ENTRYPOINT ["./YourAppName"]
(Note: Replace YourAppName with the actual name of the project's executable.)
Step 4: Build the image.
docker build -t pid1-crash-app .
Step 5: Execute the following test cases:
A. FAILS: Application crashes, creates a dump, and then hangs.
docker run --rm -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=1" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app
B. PASSES: Application crashes and exits correctly (no dump created).
docker run --rm -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=0" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app
C. PASSES: Application is not PID 1, creates a dump, and exits correctly.
docker run --rm --init -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=1" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app
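To tell a hang apart from a normal crash exit without watching the terminal, each test case can be run under a time limit. The sketch below assumes GNU coreutils `timeout`, which reports exit status 124 when the limit is reached. The helper name `exits_within` is hypothetical.

```sh
# Hypothetical helper: succeeds if the command exits on its own within the
# time limit (with any exit code), fails if `timeout` had to stop it.
exits_within() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    # GNU timeout returns 124 when the time limit was hit.
    [ "$rc" -ne 124 ]
}
```

Prefixing any of the `docker run` commands above with, for example, `exits_within 60` would then succeed for a container that crashes and exits, and fail for one that hangs. If the limit is hit, the container itself may still be running and may need to be removed with `docker kill`.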
TestUnhandledCrashConsoleApp.zip
Expected behavior
Test Case A PASSES: Application is PID 1, creates a dump, and exits correctly.
Actual behavior
Test Case A FAILS: Application crashes, creates a dump, and then hangs.
Regression?
Yes.
A .NET 8-based application passes all three tests.
Known Workarounds
This behavior does not occur under the following conditions:
- When DOTNET_DbgEnableMiniDump is set to 0.
- When the container is run with an init system (e.g., using the docker run --init flag), which prevents the application from running as PID 1.
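The second workaround can probably also be approximated without `--init` by starting the application from a small entrypoint script, so that the shell rather than the .NET process becomes PID 1. The following is a minimal sketch under that assumption. The function name `run_as_child` and the script name `entrypoint.sh` are hypothetical, and this has not been verified against the reproduction above.

```sh
# Hypothetical helper: run the given command as a child process and return
# its exit status, so the command itself is never PID 1.
run_as_child() {
    "$@" &
    child_pid=$!
    # Forward termination requests (e.g. from `docker stop`) to the child.
    trap 'kill -TERM "$child_pid" 2>/dev/null' TERM INT
    rc=0
    wait "$child_pid" || rc=$?
    # A trapped signal interrupts `wait`; keep waiting while the child lives.
    while kill -0 "$child_pid" 2>/dev/null; do
        rc=0
        wait "$child_pid" || rc=$?
    done
    trap - TERM INT
    return "$rc"
}
```

In an entrypoint script, the function would be followed by `run_as_child "$@"; exit $?`, and the Dockerfile would use something like `ENTRYPOINT ["./entrypoint.sh", "./YourAppName"]`. A child terminated by SIGABRT is reported by the shell as status 134 (128 + 6). Unlike the init process that `docker run --init` injects, this sketch does not reap orphaned processes.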
Configuration
- .NET Version: .NET 9
- Operating System: Linux (Ubuntu 20.04, etc.)
- Deployment: An application running in a Docker container.
Other information
STRACE Analysis: Attached strace.txt from a sidecar container.
strace -f -e trace=signal -p 1
- Successful Dump Creation: strace logs confirm that the .NET runtime successfully launches the createdump utility and is notified of its completion via the SIGCHLD signal.
[pid 4633] +++ exited with 255 +++
[pid 1102] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4633, si_uid=5001, si_status=255, si_utime=0, si_stime=9 /* 0.09 s */} ---
- Failed Termination Sequence: After the dump is complete, the strace log shows a dedicated crash-handling thread pid 4094 attempting to terminate the application by sending a SIGABRT signal twice to itself. This signal is "swallowed" and the process does not terminate.
[pid 4094] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1, si_uid=5001} ---
[pid 4094] rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=~[], sa_flags=SA_RESTORER, sa_restorer=0x7e87a0a78090}, NULL, 8) = 0
[pid 4094] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid 4094] tgkill(1, 4094, SIGABRT) = 0
[pid 4094] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 4094] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1, si_uid=5001} ---
- State Corruption and SIGSEGV: Immediately after the failed SIGABRT attempts, the crash-handling thread receives a SIGSEGV (Segmentation Fault), indicating its memory state has become corrupted.
[pid 4094] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
gdb Analysis:
- A gdb backtrace of the hung process confirms the cause of the SIGSEGV is a call to __netlink_request with a null pointer (h=0x0).
Thread 238 (LWP 4094):
#0 0x00007e87a0b47bbf in __netlink_request (h=0x0, type=<optimized out>) at ../sysdeps/unix/sysv/linux/ifaddrs.c:119
- The gdb command info handle SIGABRT shows the default handler is registered (yet strace shows the signal is not handled correctly).
(gdb) info handle SIGABRT
Signal        Stop      Print   Pass to program Description
SIGABRT       Yes       Yes     Yes             Aborted
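As a cross-check on the SIGABRT disposition, the kernel's own view is available in /proc/<pid>/status: the SigCgt and SigIgn fields are hexadecimal bitmasks of caught and ignored signals, where bit N-1 corresponds to signal N. A signal that appears in neither mask has the default disposition. The helper below is a small sketch for decoding those masks. The name `sig_in_mask` is hypothetical, and it only handles signals 1 to 32.

```sh
# Hypothetical helper: succeeds if signal number $2 (1-32) is set in the
# hexadecimal mask $1, as printed in the SigCgt/SigIgn/SigBlk fields.
sig_in_mask() {
    mask=$1
    sig=$2
    # Keep only the low 32 bits so the value stays well inside the shell's
    # integer range; SIGABRT (6) lives in this part of the mask.
    low=$(printf '%s' "$mask" | tail -c 8)
    [ $(( (0x$low >> ($sig - 1)) & 1 )) -eq 1 ]
}

# Example usage inside the container (or a sidecar sharing its PID namespace):
#   mask=$(awk '/^SigCgt:/ {print $2}' /proc/1/status)
#   sig_in_mask "$mask" 6 && echo "SIGABRT has a handler" || echo "no handler"
```

Running the same check against SigIgn distinguishes an ignored signal from one left at its default disposition.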