use a more intricate timeout strategy upon ctrl-c to address pantsd interactivity issues #7014

cosmicexplorer · 2019-01-01T18:52:14Z

In #6574 we introduce a GracefulSignalHandler to decouple signal handling logic from the rest of pants execution to great effect. As noted in a comment in that PR, when we receive a ctrl-c from the terminal in remote_pants_runner.py, we fire off a SIGINT to the pantsd-runner process running in daemon_pants_runner.py and then wait for it to exit. This directly leads to interactivity issues, as it is not always clear to the user on the terminal why the pantsd-runner process is taking so long to close, and sometimes it doesn't close, requiring the user to manually send a SIGKILL to get their terminal session back. I think this can be effectively resolved by doing the following:

have some extremely simple retry/backoff logic for sending a SIGINT to the pantsd-runner process, escalating to a SIGTERM, then a SIGKILL, and waiting for a set timeout on the pid.
after sending a SIGKILL and waiting a bit, don't continue to wait for the pantsd-runner process to exit, because there's clearly some issue, so invoke the currently commented-out super(PailgunClientSignalHandler, self).handle_sigint(signum, frame) call from the client.
if sending SIGINT and SIGTERM failed to cause the pantsd-runner process to exit, print an error message to the terminal -- if SIGKILL fails (if waiting on the pantsd-runner pid to exit continues to time out after sending the signal), print a message with the pantsd-runner's pid/pgrp so the user can send signals or do any analysis themselves (this may be useful for logging in CI runs as well).

We would almost definitely need to address the TODOs regarding pantsd testing in #7008 before we can merge a PR fixing this.

The text was updated successfully, but these errors were encountered:

### Problem #6552 has some changes around signal handling. This was an attempt to move those out, and to gain greater visibility into and control over when pants errors out due to a signal (helping to address difficulties in flaky tests around exiting, such as #6708). Resolves #7014, resolves #6708, resolves #6847, resolves #7199. ### Solution - Create a very simple `SignalHandler` class which can be subclassed to set pants's behavior upon receiving different signals in different execution contexts. - Add a `_signal_handler` field to `ExceptionSink`. - Apply `SignalHandler` everywhere that tries to modify pants's exception handling. - Add the `--pantsd-pailgun-quit-timeout` bootstrap option and implement a timeout strategy to continue to forward output from the remote `pantsd-runner` process for a brief period of time upon receiving a signal, then killing the remote process if it's still alive at the end. - Unskip some flaky tests that we hopefully won't have to reskip. ### Result It's much more clear when and where signal handlers are set, and the flow of control is made more clear. It is easier to extend signal handlers in a hygienic way, allowing for the resolution of multiple flaky tests.

cosmicexplorer added the pantsd label Jan 1, 2019

cosmicexplorer mentioned this issue Mar 5, 2019

Handle signals gracefully #6574

Merged

cosmicexplorer closed this as completed in #6574 Mar 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use a more intricate timeout strategy upon ctrl-c to address pantsd interactivity issues #7014

use a more intricate timeout strategy upon ctrl-c to address pantsd interactivity issues #7014

cosmicexplorer commented Jan 1, 2019 •

edited

Loading

use a more intricate timeout strategy upon ctrl-c to address pantsd interactivity issues #7014

use a more intricate timeout strategy upon ctrl-c to address pantsd interactivity issues #7014

Comments

cosmicexplorer commented Jan 1, 2019 • edited Loading

cosmicexplorer commented Jan 1, 2019 •

edited

Loading