.NET 9 Process Fails To Terminate after Unhandled Exception when running as PID 1 in Docker with Crash Dump enabled #118049

@baal2000

Description

A .NET 9 application running as the primary process (PID 1) in a Docker container fails to terminate after an unhandled exception. The runtime correctly generates a crash dump as configured by DOTNET_DbgEnableMiniDump=1, but immediately afterward the process continues running instead of exiting.

Reproduction Steps

Update: the reproduction has been moved to its own repo at TestUnhandledCrashConsoleApp.

The following minimal application and Docker configuration reproduces the issue.

Step 1: C# Application Code (Program.cs) triggers an unhandled exception on a timer thread after 10 seconds.

AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    Console.WriteLine("Unhandled exception: " + e.ExceptionObject.ToString());
};

using var timer = new Timer(
    callback: _ => throw new InvalidOperationException("This is a test exception"),
    state: null,
    dueTime: TimeSpan.FromSeconds(10),
    period: Timeout.InfiniteTimeSpan
);

Console.WriteLine("Hello, World! Application will crash in 10 seconds...");
// Keep the application running
Thread.Sleep(Timeout.Infinite);

Step 2: Publish the application as a self-contained executable for the Linux x64 runtime.

dotnet publish -c Release -r linux-x64

Step 3: Dockerfile
Create a Dockerfile to package the application.

# Use the official ASP.NET Core 9.0 runtime base image
FROM mcr.microsoft.com/dotnet/aspnet:9.0

# Set the working directory inside the container
WORKDIR /app

# Copy the published application files into the container
COPY ./bin/Release/net9.0/linux-x64/publish/ .

# Set the entrypoint to run the application.
# Using this "exec" form ensures the app runs as PID 1.
ENTRYPOINT ["./YourAppName"]

(Note: Replace YourAppName with the actual name of the project's executable.)

Step 4: Build the image and run containers demonstrating the different behaviors.

docker build -t pid1-crash-app .

Step 5: Execute the following test cases:

A. FAILS: Application crashes, creates a dump, and then hangs.

docker run --rm -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=1" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app

B. PASSES: Application crashes and exits correctly (no dump created).

docker run --rm -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=0" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app

C. PASSES: Application is not PID 1, creates a dump, and exits correctly.

docker run --rm --init -v ~/Downloads:/dumps -e "DOTNET_DbgEnableMiniDump=1" -e "DOTNET_DbgMiniDumpType=1" -e "DOTNET_DbgMiniDumpName=/dumps/coredump.%p" pid1-crash-app
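To tell a hang from a normal exit without watching the terminal, each test case can be run under a time limit. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the host; `classify_status` and `run_case` are hypothetical helper names, not part of the reproduction repo.

```shell
#!/bin/sh
# classify_status STATUS: describe an exit status using common shell conventions.
# Hypothetical helper, not part of the original report.
classify_status() {
    status=$1
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit expires.
        echo "hung: timeout expired"
    elif [ "$status" -gt 128 ]; then
        # By convention, 128 + N means the process was killed by signal N.
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}

# run_case LIMIT_SECONDS COMMAND...: run a command under a time limit
# and classify how it ended.
run_case() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    classify_status "$status"
}
```

For example, `run_case 60 docker run --rm -e "DOTNET_DbgEnableMiniDump=1" pid1-crash-app` would be expected to report a timeout for Test Case A if the hang reproduces. When the limit expires only the `docker run` client is signalled, so the container itself may still need to be stopped with `docker kill`.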

TestUnhandledCrashConsoleApp.zip

Expected behavior

Test Case A PASSES: Application is PID 1, creates a dump, and exits correctly.

Actual behavior

Test Case A FAILS: Application crashes, creates a dump, and then hangs.

Regression?

Yes.

A .NET 8-based application passes all three tests.

Known Workarounds

This behavior does not occur under the following conditions:

  • When DOTNET_DbgEnableMiniDump is set to 0.
  • When the container is run with an init system (e.g., using the docker run --init flag), which prevents the application from running as PID 1.
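Where `--init` cannot be passed (for example, when the run command is controlled by an orchestrator), a small entrypoint wrapper is another way to keep the application from being PID 1. This is a sketch only and has not been verified against this issue; `run_as_child` is a hypothetical name, and the assumption is that running as a child of a shell avoids the hang in the same way `--init` does.

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper (not from the original report): the shell
# takes PID 1 and the application runs as its child.
run_as_child() {
    # Note: a non-interactive shell starts background jobs with stdin
    # connected to /dev/null and with SIGINT ignored.
    "$@" &
    child=$!
    # Forward stop requests (e.g. from `docker stop`) to the child.
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    status=0
    wait "$child" || status=$?
    # A trapped signal interrupts wait with a status above 128; if the child
    # is still around, wait again to collect its real exit status.
    if [ "$status" -gt 128 ] && kill -0 "$child" 2>/dev/null; then
        status=0
        wait "$child" || status=$?
    fi
    return "$status"
}
```

A script containing this function and ending with `run_as_child "$@"; exit $?` could be used as `ENTRYPOINT ["/app/entrypoint.sh", "./YourAppName"]` (the script path is an assumption). Unlike a real init, this wrapper does not reap orphaned processes.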

Configuration

  • .NET Version: .NET 9
  • Operating System: Linux (Ubuntu 20.04, etc.)
  • Deployment: An application running in a Docker container.

Other information

STRACE Analysis: Attached strace.txt from a sidecar container.
strace -f -e trace=signal -p 1

  1. Successful Dump Creation: strace logs confirm that the .NET runtime successfully launches the createdump utility and is notified of its completion via the SIGCHLD signal.
[pid  4633] +++ exited with 255 +++
[pid  1102] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4633, si_uid=5001, si_status=255, si_utime=0, si_stime=9 /* 0.09 s */} ---
  2. Failed Termination Sequence: After the dump is complete, the strace log shows a dedicated crash-handling thread pid 4094 attempting to terminate the application by sending a SIGABRT signal twice to itself. This signal is "swallowed" and the process does not terminate.
[pid  4094] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1, si_uid=5001} ---
[pid  4094] rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=~[], sa_flags=SA_RESTORER, sa_restorer=0x7e87a0a78090}, NULL, 8) = 0
[pid  4094] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid  4094] tgkill(1, 4094, SIGABRT)    = 0
[pid  4094] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  4094] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1, si_uid=5001} ---
  3. State Corruption and SIGSEGV: Immediately after the failed SIGABRT attempts, the crash-handling thread receives a SIGSEGV (Segmentation Fault), indicating its memory state has become corrupted.
[pid  4094] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---

gdb Analysis:

  1. A gdb backtrace of the hung process confirms the cause of the SIGSEGV is a call to __netlink_request with a null pointer (h=0x0).
Thread 238 (LWP 4094):
#0  0x00007e87a0b47bbf in __netlink_request (h=0x0, type=<optimized out>) at ../sysdeps/unix/sysv/linux/ifaddrs.c:119
  2. The gdb command info handle SIGABRT shows the default handler is registered (yet strace shows the signal is not handled correctly).
 (gdb) info handle SIGABRT
Signal        Stop	Print	Pass to program	Description
SIGABRT       Yes	Yes	Yes		Aborted
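
Note that gdb's `info handle` output describes how gdb itself treats the signal (stop, print, pass), not the disposition installed in the process. The kernel's view is exposed as hexadecimal masks in `/proc/<pid>/status` (`SigBlk`, `SigIgn`, `SigCgt`). The sketch below decodes those masks for SIGABRT (signal 6 on Linux); `sig_in_mask` and `show_sigabrt` are hypothetical helper names, and it has to run where the target's `/proc` entry is visible, for example in a sidecar that shares the PID namespace. It may also be relevant that Linux does not apply a signal's default action to the init process of a PID namespace when the signal originates inside that namespace (see pid_namespaces(7)), which would be consistent with the SIGABRT being "swallowed" after the handler is reset to SIG_DFL.

```shell
#!/bin/sh
# sig_in_mask MASK SIGNUM: print 1 if signal SIGNUM (1-32) is set in a hex
# mask as printed in /proc/<pid>/status, otherwise 0. Hypothetical helper.
sig_in_mask() {
    # Keep only the low 8 hex digits (signals 1-32) so the value stays
    # within the shell's integer range.
    low=$(printf '%s' "$1" | tail -c 8)
    echo $(( (0x$low >> ($2 - 1)) & 1 ))
}

# show_sigabrt PID: report whether SIGABRT (6) is blocked, ignored, or caught
# by the given process. If none of the three is set, the disposition is default.
show_sigabrt() {
    for field in SigBlk SigIgn SigCgt; do
        mask=$(awk -v f="$field:" '$1 == f { print $2 }' "/proc/$1/status")
        echo "$field SIGABRT=$(sig_in_mask "$mask" 6)"
    done
}
```

Running `show_sigabrt 1` from the sidecar while the process is hung would show whether SIGABRT is still caught or has been left at its default disposition.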
