.NET 5 apps can no longer intercept SIGINT signals (receive CancelKeyPress events) when running under Docker #51221

Closed
wesnerm opened this issue Apr 14, 2021 · 11 comments · Fixed by #52891

Comments

@wesnerm

wesnerm commented Apr 14, 2021

Description

My company has hundreds of microservices processing petabytes of data per month. We have been using SIGINT as the stop signal in our Dockerfiles. A service will intercept the SIGINT signal in the Console.CancelKeyPress event handler, set the Cancel arg to false, and initiate a final shutdown sequence to perform any cleanup and avoid data loss. SIGTERM is actually the default signal that Docker uses, and that signal will call the AssemblyLoadContext.Default.Unloading and AppDomain.CurrentDomain.ProcessExit handlers before abruptly exiting. SIGINT has some advantages over SIGTERM in .NET Core, such as system-recognized keystrokes and the option to exit through Main.

In versions prior to .NET 5 (.NET Core 3.1 for instance), our services running inside a Docker container have been able to capture Docker stop signals of SIGINT. In .NET 5, the SIGINT signals are no longer being intercepted and Console.CancelKeyPress is never called. This issue resulted in some data loss in our services. SIGTERM signals will still invoke the Unloading and ProcessExit handlers. We are currently using SIGTERM as a temporary workaround, but we believe that the new SIGINT behavior is a regression and a breaking change. It may not have been reported earlier because SIGINT is not the default stop signal that Docker uses; it needs to be specified alongside a STOPSIGNAL keyword in the Dockerfile.

Note: The behavior does not depend on whether the signal was issued via the keyboard or a kill system call.
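
A minimal sketch of the SIGINT pattern described above, for readers who want something concrete. It is not the reporter's code: SigintShutdown and Register are hypothetical names, and it sets e.Cancel = true (as the repro later in this thread does) so that the runtime leaves the process alive and the app can exit through Main.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not taken from the issue.
public static class SigintShutdown
{
    // Hooks CancelKeyPress (raised for SIGINT / Ctrl+C) and returns a token
    // that is cancelled the first time the event fires.
    public static CancellationToken Register(CancellationTokenSource cts)
    {
        Console.CancelKeyPress += (sender, e) =>
        {
            // true = do not terminate the process; Main decides when to exit.
            e.Cancel = true;
            cts.Cancel();
        };
        return cts.Token;
    }

    public static void Main()
    {
        // Deliberately not disposed: a late second SIGINT may still call Cancel().
        var cts = new CancellationTokenSource();
        CancellationToken token = Register(cts);

        Console.WriteLine("Running; send SIGINT to stop.");
        token.WaitHandle.WaitOne();   // stand-in for the service's real work

        Console.WriteLine("SIGINT received; running cleanup...");
        // ... flush buffers, commit offsets, close connections ...
        Console.WriteLine("Clean exit through Main.");
    }
}
```

With STOPSIGNAL SIGINT in the Dockerfile, docker stop would be expected to reach this handler on .NET Core 3.1; the rest of this thread is about why it does not on .NET 5 when no terminal is attached.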

Our Dockerfiles contain the following code. Changing the stop signal from SIGINT to SIGTERM allows the signal to be intercepted and a clean shutdown to occur. Similarly, changing from dotnet5 to dotnet3 allowed the signal to be intercepted.

FROM xxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/base:dotnet5
STOPSIGNAL SIGINT
WORKDIR /app
COPY . .
ENTRYPOINT ["dotnet", "Sdk.Service.dll"]

Configuration

The problem only occurs in .NET 5.0.
The containers are run in Amazon Linux AMI 2018.03 (like rhel fedora)

Also, reproduced in WSL2 Linux subsystem under Windows.

Regression?

This worked in .NET Core 3.1. It fails in .NET 5.0.

Other information

There are three related issues that I uncovered:

"CancelKeyPress not firing on coreclr on ubuntu #16088"
#16088

  • In this case, it appears that there is a wrapper process that doesn't pass through the signal to the child.
  • That report used dotnet run, which introduces a wrapper process; plain dotnet does not use a wrapper.
  • Our services use dotnet in our Dockerfile. One speculation is that, in .NET 5, dotnet may forward to dotnet run.

"AppDomain.ProcessExit is not invoked on docker stop"
#36089

  • This particular bug may have been due to timeouts exceeding the Docker default 10s before the process is killed.
  • However, our services use a 90s timeout so it can't be the issue.

"dotnet run doesn't handle ctrl-c well #4779"
dotnet/sdk#4779 (comment)

@dotnet-issue-labeler dotnet-issue-labeler bot added area-PAL-coreclr untriaged New issue has not been triaged by the area owner labels Apr 14, 2021
@ghost

ghost commented Apr 14, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Configuration

The problem only occurs in .NET 5.0 or higher.
The containers are run in Ubuntu x64.
Also, reproduced in WSL2 Linux subsystem under Windows.

Author: wesnerm
Assignees: -
Labels:

area-PAL-coreclr, area-System.Console, untriaged

Milestone: -

@wesnerm
Author

wesnerm commented Apr 15, 2021

I recall SIGQUIT also had the same issues as SIGINT.

@tmds
Member

tmds commented Apr 15, 2021

This works fine when I try to reproduce the issue:

$ dotnet new console -o console
$ cd console

Edit Program.cs:

using System;
using System.Threading;

namespace console
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Press Ctrl+C to stop the app.");
            ManualResetEventSlim mre = new();
            Console.CancelKeyPress += (_, e) => { mre.Set(); e.Cancel = true; };
            mre.Wait();
            Console.WriteLine("CancelKeyPress received... stopping");
            Thread.Sleep(2000);
            Console.WriteLine("Bye!");
        }
    }
}

Publish the app:

$ dotnet publish -c Release

Write a Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:5.0
WORKDIR /root
ADD bin/Release/net5.0/publish .
ENTRYPOINT ["dotnet", "console.dll"]

Build an image:

$ podman build -t cancelapp .

Now run it and press Ctrl+C:

$ podman run -ti cancelapp
Press Ctrl+C to stop the app.
^CCancelKeyPress received... stopping
Bye!

@wesnerm
Author

wesnerm commented Apr 15, 2021

I will reproduce your steps and get back to you. I will also get additional environment information.

@josephblodgett

josephblodgett commented Apr 15, 2021

Thank you so much for looking at this. Here are my repro steps using your steps as closely as possible.

(same)

$ dotnet new console -o console
$ cd console

(additional step)
Ensure the target framework in the .csproj file is net5.0:
<TargetFramework>net5.0</TargetFramework>

(same)
Edit Program.cs:

using System;
using System.Threading;

namespace console
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Press Ctrl+C to stop the app.");
            ManualResetEventSlim mre = new();
            Console.CancelKeyPress += (_, e) => { mre.Set(); e.Cancel = true; };
            mre.Wait();
            Console.WriteLine("CancelKeyPress received... stopping");
            Thread.Sleep(2000);
            Console.WriteLine("Bye!");
        }
    }
}

(same)
Publish the app:

dotnet publish -c Release

(different)
I also wrote the Dockerfile in the publish dir.
Write a Dockerfile:

FROM mcr.microsoft.com/dotnet/aspnet:5.0
STOPSIGNAL SIGINT
WORKDIR /app
COPY . .
ENTRYPOINT ["dotnet", "console.dll"]

docker build: (different, but likely just because I'm not familiar with podman)
docker build .\publish\

Successfully built 36e###

(for me, dropping to wsl)

docker run 36e###

open another wsl

docker ps

grab the container id

docker stop 702##
702##

output from the app:

docker run 36efc1343170
Press Ctrl+C to stop the app.
/mnt/c/repro/console/bin/Release/net5.0$

no additional logging.

The exact same steps with .NET Core 3.1 work totally fine for me.

@tmds
Member

tmds commented Apr 16, 2021

How does it behave when you do $ docker run -ti 36e### and press Ctrl+C?

@josephblodgett

With -ti it runs exactly as expected and prints all the additional console output:

docker run -ti 36e###
Press Ctrl+C to stop the app.
^CCancelKeyPress received... stopping
Bye!

@josephblodgett

Now if you change TargetFramework in the .csproj to <TargetFramework>netcoreapp3.1</TargetFramework>
AND
change the Dockerfile (in the bin\Release\netcoreapp3.1\publish) to:

FROM mcr.microsoft.com/dotnet/aspnet:3.1
STOPSIGNAL SIGINT
WORKDIR /app
COPY . .
ENTRYPOINT ["dotnet", "console.dll"]

and one minor edit to your program.cs since target-typed object creation isn't available with that target:
from:

ManualResetEventSlim mre = new ();

to:

ManualResetEventSlim mre = new ManualResetEventSlim();

And run the exact same test as I outlined above via the docker stop command, it works flawlessly:

/mnt/c/repro/console/bin/Release/netcoreapp3.1$ docker run b8ba870bf4d3
Press Ctrl+C to stop the app.
CancelKeyPress received... stopping
Bye!

So some change between 3.1 and 5 appears to have altered the behavior.

@tmds
Member

tmds commented Apr 20, 2021

#34297 added a check in .NET 5:

// Initialization is only needed when input isn't redirected.
if (Console.IsInputRedirected)
{
    s_initialized = true;
    return;
}

This means: when there is no terminal, CancelKeyPress won't work.

The -t flag on docker run specifies whether there is a terminal.

This is indeed a breaking change.
Though it is not unreasonable that a CancelKeyPress can't occur when there is no terminal.

For terminating containers, the recommended signal is SIGTERM.
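
One way to surface this at startup is sketched below. CancelKeyPressCheck and MayMissSigint are hypothetical names, and the rule that only 5.x runtimes are affected is an assumption drawn from this thread (3.1 worked, the fix is milestoned for 6.0.0), not a documented guarantee.

```csharp
using System;

// Hypothetical startup check, not part of the runtime or of this issue.
public static class CancelKeyPressCheck
{
    // Assumption from this thread: on .NET 5, CancelKeyPress is not raised
    // when stdin is redirected (for example, docker run without -t).
    public static bool MayMissSigint(bool inputRedirected, int runtimeMajor) =>
        inputRedirected && runtimeMajor == 5;

    public static void Main()
    {
        if (MayMissSigint(Console.IsInputRedirected, Environment.Version.Major))
        {
            Console.Error.WriteLine(
                "Warning: stdin is redirected; CancelKeyPress may not fire for SIGINT. " +
                "Consider STOPSIGNAL SIGTERM or running with a terminal (-t).");
        }
    }
}
```

The decision is kept in a pure function so the assumed rule is visible in one place and easy to change if it turns out to be wrong for a given runtime patch level.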

@jeffhandley @adamsitnik @carlossanlop @jozkee do you prefer the .NET Core 3.1, or the .NET 5 behavior?

@davidfowl
Member

Feels like this should be reverted to the 3.1 behavior, with #50527 as the place to focus on signal handling improvements. Using SIGTERM to shut down currently in .NET Core requires more code, including blocking code in process exit so that other code can run (and it can result in deadlocks if done incorrectly).
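
For illustration, the blocking pattern being referred to might look roughly like the sketch below. SigtermGate and its members are hypothetical names, not runtime APIs; the timeout is an added assumption meant to bound the wait if Main never reports completion, which is one way the deadlock risk shows up.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not taken from the issue or the runtime.
public sealed class SigtermGate
{
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();
    private readonly ManualResetEventSlim _mainFinished = new ManualResetEventSlim(false);
    private readonly TimeSpan _timeout;

    public SigtermGate(TimeSpan timeout) => _timeout = timeout;

    public CancellationToken Stopping => _stop.Token;

    // Body of the ProcessExit handler, split out so it can be called directly.
    // Returns true if Main finished before the timeout elapsed.
    public bool OnProcessExit()
    {
        _stop.Cancel();                        // ask Main to stop
        return _mainFinished.Wait(_timeout);   // block until Main is done or time runs out
    }

    public void MarkMainFinished() => _mainFinished.Set();

    public void Register() =>
        AppDomain.CurrentDomain.ProcessExit += (sender, e) => OnProcessExit();
}

public static class SigtermDemo
{
    public static void Main()
    {
        var gate = new SigtermGate(TimeSpan.FromSeconds(30));
        gate.Register();

        Console.WriteLine("Running; send SIGTERM to stop.");
        gate.Stopping.WaitHandle.WaitOne();    // stand-in for real work

        Console.WriteLine("Stopping; running cleanup...");
        gate.MarkMainFinished();               // last step, or ProcessExit waits out the timeout
    }
}
```

Compared with handling SIGINT in CancelKeyPress, this needs an extra handshake between the exit handler and Main, which is the additional code and blocking mentioned above.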

@wesnerm
Author

wesnerm commented Apr 22, 2021

Another example of why performing shutdown in ProcessExit is undesirable is the non-deterministic ordering and processing of exit handlers:

NLog performs automatic shutdown of logging in a ProcessExit event handler. Unfortunately, that event handler is executed before our own cleanup handler. The shutdown of the logging library occurs before our cleanup routines, which also log important information.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 18, 2021
@ghost ghost closed this as completed in #52891 May 28, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 28, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 27, 2021
jbaehr referenced this issue Jul 29, 2021
…#52891)

* Console.Unix: fix, make SIGINT work when input is redirected.

* Fix misplaced s_initialized assignment
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Oct 27, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone Oct 27, 2021