You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In #6574 we introduce a GracefulSignalHandler to decouple signal handling logic from the rest of pants execution to great effect. As noted in a comment in that PR, when we receive a ctrl-c from the terminal in remote_pants_runner.py, we fire off a SIGINT to the pantsd-runner process running in daemon_pants_runner.py and then wait for it to exit. This directly leads to interactivity issues, as it is not always clear to the user on the terminal why the pantsd-runner process is taking so long to close, and sometimes it doesn't close, requiring the user to manually send a SIGKILL to get their terminal session back. I think this can be effectively resolved by doing the following:
have some extremely simple retry/backoff logic for sending a SIGINT to the pantsd-runner process, escalating to a SIGTERM, then a SIGKILL, and waiting for a set timeout on the pid.
after sending a SIGKILL and waiting a bit, don't continue to wait for the pantsd-runner process to exit, because there's clearly some issue, so invoke the currently commented-out super(PailgunClientSignalHandler, self).handle_sigint(signum, frame) call from the client.
if sending SIGINT and SIGTERM failed to cause the pantsd-runner process to exit, print an error message to the terminal -- if SIGKILL fails (if waiting on the pantsd-runner pid to exit continues to time out after sending the signal), print a message with the pantsd-runner's pid/pgrp so the user can send signals or do any analysis themselves (this may be useful for logging in CI runs as well).
We would almost definitely need to address the TODOs regarding pantsd testing in #7008 before we can merge a PR fixing this.
The text was updated successfully, but these errors were encountered:
### Problem
#6552 has some changes around signal handling. This was an attempt to move those out, and to gain greater visibility into and control over when pants errors out due to a signal (helping to address difficulties in flaky tests around exiting, such as #6708).
Resolves#7014, resolves#6708, resolves#6847, resolves#7199.
### Solution
- Create a very simple `SignalHandler` class which can be subclassed to set pants's behavior upon receiving different signals in different execution contexts.
- Add a `_signal_handler` field to `ExceptionSink`.
- Apply `SignalHandler` everywhere that tries to modify pants's exception handling.
- Add the `--pantsd-pailgun-quit-timeout` bootstrap option and implement a timeout strategy to continue to forward output from the remote `pantsd-runner` process for a brief period of time upon receiving a signal, then killing the remote process if it's still alive at the end.
- Unskip some flaky tests that we hopefully won't have to reskip.
### Result
It's much more clear when and where signal handlers are set, and the flow of control is made more clear. It is easier to extend signal handlers in a hygienic way, allowing for the resolution of multiple flaky tests.
In #6574 we introduce a
GracefulSignalHandler
to decouple signal handling logic from the rest of pants execution to great effect. As noted in a comment in that PR, when we receive a ctrl-c from the terminal inremote_pants_runner.py
, we fire off a SIGINT to thepantsd-runner
process running indaemon_pants_runner.py
and then wait for it to exit. This directly leads to interactivity issues, as it is not always clear to the user on the terminal why thepantsd-runner
process is taking so long to close, and sometimes it doesn't close, requiring the user to manually send a SIGKILL to get their terminal session back. I think this can be effectively resolved by doing the following:pantsd-runner
process, escalating to a SIGTERM, then a SIGKILL, and waiting for a set timeout on the pid.pantsd-runner
process to exit, because there's clearly some issue, so invoke the currently commented-outsuper(PailgunClientSignalHandler, self).handle_sigint(signum, frame)
call from the client.pantsd-runner
process to exit, print an error message to the terminal -- if SIGKILL fails (if waiting on thepantsd-runner
pid to exit continues to time out after sending the signal), print a message with thepantsd-runner
's pid/pgrp so the user can send signals or do any analysis themselves (this may be useful for logging in CI runs as well).We would almost definitely need to address the TODOs regarding pantsd testing in #7008 before we can merge a PR fixing this.
The text was updated successfully, but these errors were encountered: