Don't listen for the Console_CancelKeyPress #1323

andrewlock · 2021-03-29T16:30:04Z

Flushing and closing the tracer on cancel key press can lead to flushing the buffer before the app has finished writing spans. CurrentDomain_ProcessExit should always fire and flush, so flushing twice is unnecessary

As an example repro, create a new service worker using dotnet new worker, and update worker.cs to:

public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;
    public Worker(ILogger<Worker> logger, IHostApplicationLifetime lifetime)
    {
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var client = new HttpClient();
        using (var scope = Tracer.Instance.StartActive("background-service")); // this trace is closed AFTER the tracer is flushed
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var response = await client.GetStringAsync("https://jsonplaceholder.typicode.com/todos/1");
                _logger.LogInformation("Received Json");
                await Task.Delay(100, stoppingToken);
            }
        }
    }
}

Run the app, see the requests being made, and hit ctrl-c. The traces are not sent, as the background-service operation is closed after the Tracer is close and flushed

@DataDog/apm-dotnet

Flushing and closing the tracer on cancel key press can lead to flushing the buffer before the app has finished writing spans. CurrentDomain_ProcessExit should always fire, so flushing twice is unnecessary

lucaspimentel · 2021-03-29T20:15:22Z

If we do this, we should probably remove it from Datadog.Trace.Tools.Runner as well. WDYT, @tonyredondo?

tonyredondo · 2021-03-29T22:08:13Z

If we do this, we should probably remove it from Datadog.Trace.Tools.Runner as well. WDYT, @tonyredondo?

I did a simple test and the behavior changes:

The current code with some Console.WriteLine...

Behavior:

But if I stop listeing the Ctrl+C handler and comment the code:

Behavior:

Here `CurrentDomain_ProcessExit` doesn't get called when I press `CTRL+C`.

So I cannot remove it. I don't know why this doesn't happen in the tracer... WDYT, @andrewlock ?

tonyredondo · 2021-03-29T22:12:11Z

Also, If I don't cancel the tokenSource in the Console_CancelKeyPress event, then it waits until the child process ends. Don't cancel anything.

Code:

Behavior:

tonyredondo · 2021-03-29T22:14:00Z

I wonder then if this behavior can also happen in the tracer, because then we should handle both calls and prepare the tracer for both calls, 1 only flush, 1 flush + close.

andrewlock · 2021-03-30T11:34:03Z

Here CurrentDomain_ProcessExit doesn't get called when I press CTRL+C.

Unless I'm misunderstanding, this seems really weird @tonyredondo? Why wouldn't the CurrentDomain_ProcessExit event fire, just because you've removed the Console_CancelKeyPress handler?

we should handle both calls and prepare the tracer for both calls, 1 only flush, 1 flush + close.

This seems like a reasonable approach anyway tbh. i.e. keep the Console_CancelKeyPress handler, but change it to call FlushTracesAsync instead of FlushAndCloseAsync. I can't see a downside to that approach, assuming that we will never get a Console_CancelKeyPress without CurrentDomain_ProcessExit on process shutdown (I don't believe we will)

tonyredondo · 2021-03-30T13:05:34Z

Unless I'm misunderstanding, this seems really weird @tonyredondo? Why wouldn't the CurrentDomain_ProcessExit event fire, just because you've removed the Console_CancelKeyPress handler?

It doesn't fire If I hit CTRL+C. In a normal exit it will.
I actually have to look about this behavior, I suppose is something between one listening to SIGTERM and the other to SIGKILL. But I need to check the runtime code.

Although I saw this:

Gracefully handling Ctrl+C ignores Program Main exit code. dotnet/runtime#27595
ConsoleLifetime doesn't allow full graceful shutdown for ProcessExit dotnet/runtime#35990

For (1) I think is a problem with the dotnet CLI related to this issue (dotnet/sdk#4779)

I don't know if (2) affect us.

This seems like a reasonable approach anyway tbh. i.e. keep the Console_CancelKeyPress handler, but change it to call FlushTracesAsync instead of FlushAndCloseAsync. I can't see a downside to that approach, assuming that we will never get a Console_CancelKeyPress without CurrentDomain_ProcessExit on process shutdown (I don't believe we will)

👍🏻

tonyredondo · 2021-03-30T13:18:45Z

So:

SIGTERM is handled by AppDomain.CurrentDomain.ProcessExit
SIGINT and SIGQUIT is handled by Console.CancelKeyPress
SIGKILL can't be caught or ignored.

dotnet/runtime#4950 (comment)

Source code:

For the Console.CancelKeyPress: https://github.com/dotnet/runtime/blob/79ae74f5ca5c8a6fe3a48935e85bd7374959c570/src/libraries/Native/Unix/System.Native/pal_signal.c#L109-L122 : where g_ctrlCallback is registered here called from here and called from here and here
Follow AppDomain.ProcessExit is a bit more complicated, but it raises from the native side here and called from the finalizer thread here. That line is executed when a shutdown is requested (_ASSERTE), and the request is fulfilled on normal shutdowns or forced shutdowns controlled by the EEPolicy::HandleExitProcess (here) and that ForceEEShutdown is called from the ExceptionHandling (here). And finally we got the SIGTERM reference here

andrewlock · 2021-03-30T15:03:28Z

Ah, so in a .NET Core app, they handle the console key press and do a normal exit, so that might explain the difference with what we're seeing I guess?

Yea there was a time when SIGTERM was not handled. There was some other bug related to handling the CancelKeyPress event, I don't remember the details. You can ignore these.

Urgh, going to be so many edge-cases here...

So what's the danger if we only call FlushTracesAsync, and don't call FlushAndCloseAsync (i.e. if we get a SIGINT/SIGQUIT without the associated SIGTERM)?

I'm also wondering how this would work in (for example) Kubernetes. There the container gets SIGTERM-ed and then SIGKILL-ed after x seconds. It's possible (and in some cases, necessary) to extend that duration significantly.

As I understand it, in that case, we would get the AppDomain.CurrentDomain.ProcessExit with the SIGTERM, and then keep receiving messages for 30s, before the app finally exits. And we won't get any notification of that closing by default (ASP.NET Core has IHostApplicationLifetime that you can hook into, but that's not an easy solution for us)

tonyredondo · 2021-03-30T15:42:17Z

Ah, so in a .NET Core app, they handle the console key press and do a normal exit, so that might explain the difference with what we're seeing I guess?

Yeah I think this can be caused by a bug in the .NET CLI.

Yea there was a time when SIGTERM was not handled. There was some other bug related to handling the CancelKeyPress event, I don't remember the details. You can ignore these.

Urgh, going to be so many edge-cases here...

S C A R Y

So what's the danger if we only call FlushTracesAsync, and don't call FlushAndCloseAsync (i.e. if we get a SIGINT/SIGQUIT without the associated SIGTERM)?

I think we should do that, FlushTracesAsync for SIGINT/SIGQUIT. and FlushAndCloseAsync for SIGTERM.
The Console.CancelKeyPress eventargs has the following:

        //
        // Summary:
        //     Gets or sets a value that indicates whether simultaneously pressing the System.ConsoleModifiers.Control
        //     modifier key and the System.ConsoleKey.C console key (Ctrl+C) or the Ctrl+Break
        //     keys terminates the current process. The default is false, which terminates the
        //     current process.
        //
        // Returns:
        //     true if the current process should resume when the event handler concludes; false
        //     if the current process should terminate. The default value is false; the current
        //     process terminates when the event handler returns. If true, the current process
        //     continues.
        public bool Cancel { get; set; }

I think we can control the behavior by setting this to true and doing a grateful shutdown if we want.

As I understand it, in that case, we would get the AppDomain.CurrentDomain.ProcessExit with the SIGTERM, and then keep receiving messages for 30s, before the app finally exits.

Maybe I'm missing something but if we are already handling the SIGTERM we just can end the process before receiving the SIGKILL signal. Isn't the SIGKILL signal being sent only if the pid is still alive?

andrewlock · 2021-03-30T15:55:30Z

I think we can control the behavior by setting this to true and doing a grateful shutdown if we want.

That seems like a dangerous thing for us to be messing with in people's apps though 😟 I feel like we're just stuck with handling the various combinations that could happen.

Maybe I'm missing something but if we are already handling the SIGTERM we just can end the process before receiving the SIGKILL signal. Isn't the SIGKILL signal being sent only if the pid is still alive?

The point is that we will get the SIGTERM but the app won't stop handling requests for (in my example) 30s. That means we will lose those 30s of traces while the aspnetcore app is still running, before it exits gracefully. (SIGKILL doesn't matter here, we won't see it if config is correct and app closes gracefully before it's killed).

e.g. Timeline of events:

Time, s	Event
-1	Kubernetes starts a new deployment
0	Kubernetes sends a `SIGTERM` to the old pod
^	Tracer detects the `SIGTERM`, and performs flush and close
^	ASP.NET Core detects `SIGTERM` and starts shutting down, calling `IHostApplicationLifetime.Stopping`
^	A handler attached to the `Stopping` event blocks for 30s, to give time for the Ingress to drain requests from the pod. Traffic is still handled, but we won't be flushing any traces
30	The event handler exits, allowing aspnetcore to finish shutdown
31	The app exits normally
-	(`SIGTERM` is never called as the app exited within a 60s threshold set on the pods (for example)

tonyredondo · 2021-03-30T16:02:19Z

That seems like a dangerous thing for us to be messing with in people's apps though 😟 I feel like we're just stuck with handling the various combinations that could happen.

Yes, you're right, I agree we shouldn't be playing with the customer app behavior.

The point is that we will get the SIGTERM but the app won't stop handling requests for (in my example) 30s. That means we will lose those 30s of traces while the aspnetcore app is still running, before it exits gracefully. (SIGKILL doesn't matter here, we won't see it if config is correct and app closes gracefully before it's killed).

Oh now I get it!. Yes we have a problem here... hooking to IHostApplicationLifetime not gonna be easy from our side.

andrewlock · 2021-04-01T13:06:21Z

Closing this for now as this is overly simplistic and ignores other edge cases. @tonyredondo is looking at an alternative, more robust, approach playing with the ProcessExit delegate list... hopefully that pans out! 🤞

Don't listen for the Console_CancelKeyPress

04dcad0

Flushing and closing the tracer on cancel key press can lead to flushing the buffer before the app has finished writing spans. CurrentDomain_ProcessExit should always fire, so flushing twice is unnecessary

andrewlock requested a review from a team as a code owner March 29, 2021 16:30

andrewlock closed this Apr 1, 2021

andrewlock deleted the andrew/remove-console-cancel-event branch April 1, 2021 13:06

tonyredondo mentioned this pull request Apr 1, 2021

ProcessExit event handler improvements #1332

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't listen for the Console_CancelKeyPress #1323

Don't listen for the Console_CancelKeyPress #1323

andrewlock commented Mar 29, 2021

lucaspimentel commented Mar 29, 2021

tonyredondo commented Mar 29, 2021

tonyredondo commented Mar 29, 2021

tonyredondo commented Mar 29, 2021

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021

tonyredondo commented Mar 30, 2021 •

edited

Loading

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021 •

edited

Loading

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021

andrewlock commented Apr 1, 2021

Don't listen for the Console_CancelKeyPress #1323

Don't listen for the Console_CancelKeyPress #1323

Conversation

andrewlock commented Mar 29, 2021

lucaspimentel commented Mar 29, 2021

tonyredondo commented Mar 29, 2021

The current code with some Console.WriteLine...

Behavior:

But if I stop listeing the Ctrl+C handler and comment the code:

Behavior:

Here CurrentDomain_ProcessExit doesn't get called when I press CTRL+C.

tonyredondo commented Mar 29, 2021

tonyredondo commented Mar 29, 2021

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021

tonyredondo commented Mar 30, 2021 • edited Loading

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021 • edited Loading

andrewlock commented Mar 30, 2021

tonyredondo commented Mar 30, 2021

andrewlock commented Apr 1, 2021

Here `CurrentDomain_ProcessExit` doesn't get called when I press `CTRL+C`.

tonyredondo commented Mar 30, 2021 •

edited

Loading

tonyredondo commented Mar 30, 2021 •

edited

Loading