Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

On SIGTERM default to a non-zero exit code #21300

Merged
merged 13 commits into from
Jan 15, 2019
Merged

Conversation

tmds
Copy link
Member

@tmds tmds commented Nov 30, 2018

@tmds
Copy link
Member Author

tmds commented Dec 3, 2018

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please

src/pal/src/exception/signal.cpp Outdated Show resolved Hide resolved
src/pal/src/synchmgr/synchmanager.cpp Outdated Show resolved Hide resolved
@janvorli
Copy link
Member

janvorli commented Dec 4, 2018

Btw, the OSX build has failed with a real issue:

/Users/dotnet-bot/j/workspace/dotnet_coreclr/master/x64_checked_osx10.12_innerloop_prtest/src/pal/src/exception/signal.cpp:575:9: error: use of undeclared identifier 'restore_signal_and_resend'
04:36:10         restore_signal_and_resend(SIGTERM, &g_previous_sigterm);

SIGTERM is used on OSX too.

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

Important note:
This is a breaking change, especially for application that handle AppDomain.CurrentDomain.ProcessExit/DomainUnload to implement clean termination but don't set Environment.ExitCode to 0.

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

I'm thinking that instead of setting the exitcode, we should flag that this is a 'terminationExit'.
If the user sets Environment.ExitCode in ProcessExit we clear this flag.
Then when we exit the process, we look at the flag and call exit or do a termination by 'SIGTERM' signal (using the default handler).
@janvorli what do you think?

@janvorli
Copy link
Member

janvorli commented Dec 5, 2018

@tmds I am not sure I understand what would the change to use a terminationExit flag change w.r.t. the breaking change you've mentioned. Unless I am missing something, you'd still terminate the app with the same exit code as the current change if the user doesn't set the Environment.ExitCode in the AppDomain.CurrentDomain.ProcessExit/DomainUnload, right?

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

Yes, those comments were unrelated.

I was thinking a user may want to add something in ProcessExit like:

if (failed && Environment.ExitCode != 0) Environment.ExitCode = 15;

That would allow different ProcessExit event handlers to work together.
With the current implementation, the ExitCode is already set to non-zero.

@janvorli
Copy link
Member

janvorli commented Dec 5, 2018

@tmds ah, got it.

@janvorli
Copy link
Member

janvorli commented Dec 5, 2018

@jkotas, what do you think about the breaking aspect of this change that @tmds mentioned above?

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

Thinking about it more:

on SIGTERM: set 'terminateBySigterm'
if ProcessExit handlers registered (checked when invoking): clear 'terminateBySigterm'
terminate the process based on the terminateBySigterm flag.

That will make the exit non-zero when there are no ProcessExit handlers and keep the existing exit code when there are such handlers.

@janvorli
Copy link
Member

janvorli commented Dec 5, 2018

I am fine with the final state of the code. Let's see what @jkotas thinks about the breaking aspect of this change before merging it.

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

Thank you for reviewing @janvorli.
I think this change would be good to handle the breaking aspect: #21300 (comment).

@jkotas
Copy link
Member

jkotas commented Dec 5, 2018

give a non-zero exit code if it doesn't terminate cleanly
give a zero exit code if it shutdown cleanly

There is no way to distinguish between these two cases today, in backward compatible way. Basically, there is no API that the app can use to say "I want to shutdown cleanly even though the process got terminated by a signal". Your change is changing the Environment.ExitCode to be such API.

This includes Kestrel (and things built using the 'generic host').

This means that this change needs to be coordinated with Kestrel. BTW: What do you mean by generic host?

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

There is no way to distinguish between these two cases today, in backward compatible way.

To shutdown cleanly, some action needs to be performed on exit. That's why we could use the presence of ProcessExit handlers as an assumption for clean termination. I think that is as backwards compatible as possible.

What do you mean by generic host?

The underlying parts of Kestrel can be used for building hosting apps that process non-http request.

@jkotas
Copy link
Member

jkotas commented Dec 5, 2018

That's why we could use the presence of ProcessExit handlers as an assumption for clean termination.

That is wrong assumption to make. Presence of ProcessExit handlers does not imply clean termination. For example, logging libraries often subscribe to ProcessExit handlers to flush their buffers - but that does not make the process exit cleanly.

@tmds
Copy link
Member Author

tmds commented Dec 5, 2018

Presence of ProcessExit handlers does not imply clean termination.

That's why I call it an assumption :)

For example, logging libraries often subscribe to ProcessExit handlers to flush their buffers - but that does not make the process exit cleanly.

For those programs it keeps the behaviour as-is and requires someone to explicitly add a handler that sets the exit code to non-zero.

I'm suggesting it as an option to improve backwards compat.

@tmds
Copy link
Member Author

tmds commented Dec 6, 2018

@davidfowl ptal at this comment

@tmds
Copy link
Member Author

tmds commented Dec 7, 2018

I'm happy with the current state of the PR.
I think it is better to indicate success as failure than failure as success.

Another suggestion for change.

We can change the 128 + SIGTERM to a fixed value above 0xff. This makes it easier to detect termination by exit in ProcessExit. The value above 0xff is due to the higher bits getting filtered out by the exit call:

// TODO: Convert uExitCodes into sysexits(3)?
ERROR("exit() only supports the lower 8-bits of an exit code. "
"status will only see error 0x%x instead of 0x%x.\n", uExitCode & 0xff, uExitCode);

As a value, we could for example use: 128 + 256 + 15 (SIGTERM) = 393.

The implementation could be extended further (at a later time):

  • set a termination flag on SIGTERM
  • add an API Environment.IsTerminationExitCode that returns the flag
  • calls to Environment.ExitCode/Exit clear the flag
  • terminate via kill(SIGTERM) instead of exit based on the flag

@jkotas
Copy link
Member

jkotas commented Dec 7, 2018

This delta looks fine to me as well, however there are multiple follow ups necessary:

  • Confirm that this change will work for Kestrel and/or whether it is necessary to fix Kestrel for the new behavior
  • Tests in CoreFX
  • Make it work the same on Windows (the tests should tell us where we are - where the differences are)

@tmds
Copy link
Member Author

tmds commented Dec 11, 2018

Confirm that this change will work for Kestrel and/or whether it is necessary to fix Kestrel for the new behavior

@davidfowl @Tratcher can you please have a look?

Make it work the same on Windows (the tests should tell us where we are - where the differences are)

Looking a bit into Windows behavior:

When closing an app with the with the close button, the ProcessExit event fires. The parent gets an exit code of STATUS_CONTROL_C_EXIT. Changing the Environment.ExitCode in the event handler, or calling Environment.Exit doesn't affect the exit code. It's good that this is non-zero by default. Maybe we can make a change so the ExitCode becomes settable, or maybe this isn't controllable by user-space?

For Windows services, this can be controlled via ServiceBase.ExitCode.

I haven't found an API on Windows to request an app to terminate (cfr kill(pid, SIGTERM)). Is there such an API?

@jkotas
Copy link
Member

jkotas commented Dec 11, 2018

API on Windows

I think the closest one is https://docs.microsoft.com/en-us/windows/console/generateconsolectrlevent

@tmds
Copy link
Member Author

tmds commented Dec 18, 2018

For the open actions on this issue.

  • I'll add corefx Linux tests.

  • I won't look into changing the Windows behavior. Compared to Linux, it is good that by default the exit code is already non-zero. Similar to changes made here, it may be improved further by making visible the non-zero exit code in ProcessExit and allow it to become zero. For Windows console apps, I don't think it is as important to be able to come to the zero exit code because there is no termination API (like kill SIGTERM). For Windows services, it is important and already possible via the ServiceBase class.

  • @davidfowl @Tratcher to look into impact for Kestrel/Generic host.

@tmds
Copy link
Member Author

tmds commented Jan 7, 2019

I think no more review changes are needed for this PR. So we are waiting for an ACK from @davidfowl / @Tratcher on corresponding Kestrel/Host changes.

@Tratcher
Copy link
Member

Tratcher commented Jan 7, 2019

AspNetCore's hosts attempt to shut down gracefully when triggered by ProcessExit or CancelKeyPress:
https://github.com/aspnet/Extensions/blob/24c84a5b1db0fca3cfcc54eed3f4f94c6670a709/src/Hosting/Hosting/src/Internal/ConsoleLifetime.cs#L51-L52
https://github.com/aspnet/AspNetCore/blob/dc206b747e17373f8f0b75432f5cc4a0f5cc46e3/src/Hosting/Hosting/src/WebHostExtensions.cs#L155-L156

We don't do anything with the exit code. The event threads don't even know if the app ends up shutting down gracefully or non-gracefully. Setting any exit code would need to happen on the Main thread.

@tmds
Copy link
Member Author

tmds commented Jan 9, 2019

Setting any exit code would need to happen on the Main thread.

When the app is terminated by SIGTERM, the system exit function is called with the Environment.ExitCode value. The main thread may not complete.

Before this PR, the ExitCode was zero by default. This isn't ok for applications that don't terminate nicely. Now, the ExitCode will be 143 by default (on SIGTERM). So if Kestrel manages to shutdown cleanly, then ExitCode should be set to zero in the ProcessExit event handler (after the ManualResetEvent.Wait).

@Tratcher
Copy link
Member

Tratcher commented Jan 9, 2019

Go ahead and merge. We'll file an issue to follow up on our end.

@jkotas
Copy link
Member

jkotas commented Jan 9, 2019

@dotnet-bot test this please

@jkotas
Copy link
Member

jkotas commented Jan 10, 2019

@tmds Do you plan to add the tests?

@janvorli Does the delta look good to you otherwise?

@tmds
Copy link
Member Author

tmds commented Jan 10, 2019

Yes, I'll add tests for this in corefx.

@janvorli
Copy link
Member

Does the delta look good to you otherwise?

@jkotas Yes, it does

@jkotas
Copy link
Member

jkotas commented Jan 10, 2019

@tmds Could you please submit a PR to CoreFX with the (failing) tests? I would like to make it easy to see that the test are failing without this change, and they will start passing once this change reaches CoreFX.

// We set a non-zero exit code to indicate the process didn't terminate cleanly.
// This value can be changed by the user by setting Environment.ExitCode in the
// ProcessExit event. We only start termination on the first SIGTERM signal
// to ensure we don't overwrite an exit code already set in ProcessExit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the SIGTERM causes an int-returning Program.Main to exit gracefully and return a non-zero exit code?

I'm aware of dotnet/aspnetcore#6526 which tracks using Environment.ExitCode in the ProcessExit event to set the process exit code to zero when the ASP.NET host exits gracefully.

I'm more concerned about apps that return custom exit codes from Program.Main. Today this works as long as you avoid dotnet run as demonstrated in dotnet/corefx#32749. Now that dotnet/cli#10544 has been merged, dotnet run shouldn't even be an issue going forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the SIGTERM causes an int-returning Program.Main to exit gracefully and return a non-zero exit code?

The program observing the exit code will assume the application didn't exit gracefully.

I'm more concerned about apps that return custom exit codes from Program.Main.

The ProcessExit handler must be aware of those. It can behave different based on the Environment.ExitCode value.

tmds added a commit to tmds/corefx that referenced this pull request Jan 14, 2019
This test verifies the default ExitCode on SIGTERM is not zero.

Verifies dotnet/coreclr#21300.
@jkotas jkotas merged commit 3f09035 into dotnet:master Jan 15, 2019
@jkotas
Copy link
Member

jkotas commented Jan 15, 2019

@tmds Thank you!

stephentoub pushed a commit to dotnet/corefx that referenced this pull request Jan 22, 2019
* Unix: Add test for ExitCode on SIGTERM

This test verifies the default ExitCode on SIGTERM is not zero.

Verifies dotnet/coreclr#21300.

* PR feedback

* Fix BuildConfigurations
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* On SIGTERM default to a non-zero exit code

* Fix Windows builds

* Improve SIG_DFL/SIG_IGN handling

* Remove PAL_GetTerminationExitCode

* Use sa_handler/sa_sigaction based on SA_SIGINFO; remove HAVE_SIGINFO_T.

* configure.cmake: remove siginfo_t check

* Move restore_signal_and_resend so OSX can use it; add function documentation

* Fix OSX build: include pal/process.h for gPID

* Check SIG_IGN and SIG_DFL against sa_handler

* Don't use sa_handler when SA_SIGINFO is set

* Fix equality check

* Swap order of checking SA_SIGINFO and SIG_IGN/SIG_DFL


Commit migrated from dotnet/coreclr@3f09035
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Unix: Add test for ExitCode on SIGTERM

This test verifies the default ExitCode on SIGTERM is not zero.

Verifies dotnet/coreclr#21300.

* PR feedback

* Fix BuildConfigurations


Commit migrated from dotnet/corefx@e3ab2b0
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants