In case of interrupt signal, Task does not give time to a subprocess to cleanup #458
Comments
FYI, I am working on a PR.
Helper (sleepit) and test code based on https://github.com/marco-m/timeit go-task/task/go-task#458
Hi @marco-m (and @marco-m-pix4d 😅), thanks for opening this issue! We currently use the default timeout logic provided by the mvdan.cc/sh library. The implementation is around here:
What it does is: when Task receives a SIGINT or SIGTERM signal, we forward that signal to the process immediately but force a SIGKILL after a given timeout. The library's default timeout is 2 seconds and we just use that. I think it would be possible to increase it if we think it's too low, and/or make it configurable per Taskfile/task/command as well.
Hello @andreynering :-)
Ah, now I understand why, in my repro test, I had to use a timeout longer than 2s to be able to replicate the problem reliably :-D
I saw the code path with a cancellable context.Context, yes. Currently I am experimenting without any forwarding at all (I pass an empty context to the mvdan.cc/sh Runner), because on Unix there is no need to forward anything: it is the TTY driver that, upon receiving CTRL-C, sends a SIGINT to the whole process group (https://en.wikipedia.org/wiki/Process_group). On the other hand, I am not sure what will happen on Windows with my current approach. I will then revisit the approach following your suggestion, thanks.
See go-task/task/go-task#458 Helper (sleepit) and test code based on https://github.com/marco-m/timeit
…c/sh We used to pass to mvdan.cc/sh/interp.Runner a context that was cancelled on reception of an OS signal. This caused the Runner to terminate the subprocess abruptly. The correct behavior instead is for us to completely ignore the signal and let the subprocess deal with it. If the subprocess doesn't handle the signal, it will be terminated. If the subprocess does handle the signal, it knows better than us whether it wants to clean up and terminate or do something different. So now we pass an empty context just to make the API of interp.Runner happy. Fixes go-task/task/go-task#458
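The approach in this commit message can be sketched with the standard library alone. This is an illustration, not the code of the PR: os/exec stands in for interp.Runner, and the function name runLeavingSignalsToChild is hypothetical. It relies on the point made earlier in the thread that, on Unix, CTRL-C is delivered by the TTY driver to the whole process group, so the child receives SIGINT without any forwarding.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
)

// runLeavingSignalsToChild (hypothetical helper) runs a command with a context
// that is never cancelled. The parent registers for os.Interrupt only so that
// it is not terminated itself; it neither forwards the signal nor kills the
// child, and simply waits for the child to exit on its own terms.
func runLeavingSignalsToChild(name string, args ...string) error {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt) // further signals are dropped once the buffer is full
	defer signal.Stop(sigs)

	// An empty (background) context: nothing will ever cancel the command.
	cmd := exec.CommandContext(context.Background(), name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: prog command [args...]")
		return
	}
	err := runLeavingSignalsToChild(os.Args[1], os.Args[2:]...)
	fmt.Println("child finished:", err)
}
```

Whether this behaves the same way on Windows is left open here, as it is in the thread.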
After further thinking and experimenting, I don't think we can forward anything at all. This is explained in detail in PR #479. Still, I don't have a solution for Windows.
So what about simply making the cleanup timeout configurable? Is that possible already?
@olegstepura as I wrote in #458 (comment), I don't think it is actually possible. The best way to understand is to check out the PR and experiment with it.
I'm switching from Makefile to Taskfile on macOS and noticed this issue. If a task starts a Docker container and I press CTRL-C, the container keeps running with Taskfile, while with Makefile it exited. Now I wonder how they solved it.
Hello! I'm having the same issue on Windows, where Task is not waiting for my app to shut down cleanly. From what I've seen, there's no option to disable the sending of a Kill signal, which would be ideal in my scenario since my app handles interrupts by itself. Or at least, change the current implementation so that it still waits for the timeout but sends a Kill signal instead of an Interrupt.
Can this be fixed please? It turned out to be a big deal for me. I keep encountering this issue again and again with any long-running app, which pretty much renders Task useless for such tasks :/ I would fix it myself but I have no idea how. Maybe this is causing the issue? Why would the process be instantly killed only on Windows instead of waiting for the timeout as well? I tried removing that and recompiling Task but it didn't work. Maybe I modified the wrong file.
Actually, it seems Task doesn't even use that code. That explains why my changes didn't have an effect. I think I found the actual code responsible for handling signals, and adding a delay didn't solve the issue, which tells me Task might be exiting early without waiting for goroutines to exit or something.
In case it is not clear: there is a PR to fix this, see #479.
@marco-m-pix4d what is the current status of that PR?
This should be fixed now that #479 is merged, though we still need to do a release for this feature. Closing this for now.
Any chance to get a new release for this? I just encountered this bug with a simple
I mentioned that there was no Ctrl-C made by the user at all.
Background
Although Task has logic to intercept a SIGINT in getSignalContext() (task/cmd/task/task.go, line 209 in d3cd9f1), the execution of a subprocess is interrupted immediately, without giving it time to clean up. For some subprocesses, such as Terraform, this can have severe consequences, as explained below.
The cancellable context obtained from getSignalContext() is passed everywhere:
Executor.Run()
Executor.RunTask()
Executor.runCommand()
internal/execext/exec.go:RunCommand()
and finally to
mvdan.cc/sh/v3/interp/interp.New()
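The effect of threading a signal-cancelled context down to the code that runs the subprocess can be illustrated with a small sketch. This is not Task's code: signalContext stands in for getSignalContext(), runCommand is a hypothetical last link of the chain, and os/exec stands in for the mvdan.cc/sh interpreter. With exec.CommandContext and its default settings, the child is killed as soon as the context is done.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// signalContext returns a context that is cancelled when SIGINT or SIGTERM
// arrives. It is an assumed stand-in for getSignalContext().
func signalContext() (context.Context, context.CancelFunc) {
	return signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
}

// runCommand (hypothetical) is the last link of the chain: it runs the
// subprocess under ctx. By default, exec.CommandContext kills the process as
// soon as ctx is done, leaving it no time to clean up.
func runCommand(ctx context.Context, name string, args ...string) error {
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	ctx, stop := signalContext()
	defer stop()
	// "sleep 30" is an assumed stand-in for a long-running Unix subprocess;
	// pressing CTRL-C while it runs cancels ctx and terminates it at once.
	fmt.Println("result:", runCommand(ctx, "sleep", "30"))
}
```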
The failing use case
I found this problem when invoking terraform apply from Task. Terraform is a tool to provision cloud infrastructure as code. When it runs, it can modify state in a remote backend, for example S3. If interrupted with CTRL-C (that is, SIGINT), it takes around 10 seconds to clean up in a way that leaves the remote state safe to use again.
When run under Task, this grace period is skipped and Terraform is terminated too quickly. This can leave the remote state in a corrupted configuration and require non-obvious manual intervention to recover.
One could argue that Terraform should have a more resilient approach, but I think this is a good example of the general problem.
Proposal 1
In my opinion Task should actually do as Terraform, which reacts differently depending on how many SIGINT signals it has received:
Proposal 2
If proposal 1 is too complicated to implement, a fallback would be to introduce a per-task timeout: when a SIGINT is received, start a timer, and terminate the subprocess abruptly only when the timer expires.
How to reproduce
Taskfile:
Go code to simulate a program that needs time to clean up on reception of SIGINT:
When running Task and typing CTRL-C we get:
the exit status of Task is 1 and the line "Cleanup done" is not printed, confirming that the subprocess has been terminated abruptly.
On the other hand, when running Task and sending an INT signal with kill (kill -INT ), we see:
the exit status of Task is 0 and the line "Cleanup done" is printed, confirming that the subprocess terminated by itself cleanly.