-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli-plugins: terminate plugin when CLI exits #4599
Conversation
Have to check some CI failures, a lot of tests calling Compose breaking (not sure if due to parallel tests running into the issue described above or due to other issues with listening on an abstract namespace sockets in CI). |
b0326c9
to
034036a
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #4599 +/- ##
==========================================
- Coverage 59.71% 59.27% -0.44%
==========================================
Files 288 285 -3
Lines 24858 24858
==========================================
- Hits 14843 14735 -108
- Misses 9128 9237 +109
+ Partials 887 886 -1 |
cmd/docker/docker.go
Outdated
pluginCloseSocketAddr := net.UnixAddr{Name: "@docker_cli_" + subcommand, Net: "unix"} | ||
listener, err := net.ListenUnix("unix", &pluginCloseSocketAddr) | ||
if err != nil { | ||
return errors.New("bork") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to order a bork® T-shirt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And a børk® 🇸🇪 for David)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐶
9a8e3f5
to
c9235b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really promising, I'm glad that my ramble on solving this using a socket was helpful.
For others, Win32 job control can technically be used to solve this on Windows, but macOS still exists.
We could poll getppid(2)
, but we'd need a busy loop, spam system calls (causing context switching), and it's full of edge-cases like PR_SET_CHILD_SUBREAPER
or the fact that Solaris and Android have different init PIDs in some cases.
Letting the kernel notify us when the parent has either died, or has asked us to exit (solving the non-interactive SIGINT/SIGTERM case) seems best to me.
I'm still not sure if the 'three signal then exit' logic should apply in the non-interactive case, but we do just fine with it in dockerd as @laurazard pointed to me. Similarly, I'm not sure if we should handle SIGINT ourselves when there is a TTY in the loop, but I think this should be good enough as-is, and we can figure out further edge-cases once we get this in and have others kick the tires.
c9235b4
to
fe94a09
Compare
ba252a3
to
3da28ad
Compare
Failures in e2e;
erm... @neersighted would this be because of ... |
No, this is I'm not sure why we're getting a |
I was going to say, it seems to be failing only when running inside of a container. I still want to change that code to fallback to the previous behaviour when we can't establish a connection, so that should be okay after those changes, but I'd like to figure out why it's not working. |
Yeah, had my wires crossed on the AF_VSOCK; ended up on that as I was wondering if there's something different when running in a container 🥹 |
Not sure what's going on there either, it will take some digging into. containerd's v1 shim API used abstract sockets until containerd/containerd@126b35c, so it should work in the general case DinD... |
5147aa9
to
eb6367a
Compare
Adding this here as it seems like the most relevant place: At the last maintainers call, it was suggested that a more elegant solution to this could be built, relying on another OS primitive – unnamed pipes – for communication between the CLI and the plugin process. In this implementation, instead of generating a random abstract socket name/passing it as an environment variable, we'd create an unnamed pipe on the side of the CLI process, and set the FD 3 (or another non-0/1/2 FD) of the plugin process as that pipe, which could then use it to be alerted to exit. Note This is my current understanding of the issue, which is definitely incomplete as my knowledge of Windows internals is extremely limited 😅 This would be a great solution, but in practice it falls short as Go does not have built-in support for inheriting file descriptors on Windows. In truth, "file descriptors" aren't a concept on Windows, and although this should be possible to do using file handles, there's a lot of complexity regarding how to pass the desired file handle to the child process. While, on Unixy platforms, we can do: reader, writer, err := os.Pipe()
if err == nil {
defer writer.Close()
plugincmd.ExtraFiles = append(plugincmd.ExtraFiles, reader)
} and then simply access FD 3 on the child process, a Windows implementation would have to look like: reader, writer, err := os.Pipe()
if err == nil {
defer writer.Close()
plugincmd.SysProcAttr = &syscall.SysProcAttr{AdditionalInheritedHandles: []syscall.Handle{syscall.Handle(reader.Fd())}}
} and we would still have to pass the file handle as an environment variable/some other way so that the child process knows where to access it, as we can't just "access FD 3". This is because (For the more curious, check out https://go-review.googlesource.com/c/go/+/288299, the Go CL where We could implement this feature that way (more likely, passing the handle as an environment variable), but then we end up on the same different implementation for each platform place that we're trying to avoid. I think, taking this into account, the abstract socket/ (cc @corhere @cpuguy83 @thaJeztah) Other alternatives, for future reference:
|
Yeah, it is annoying that we can't predict the Windows handle value at compile-time, unlike the tricks which can be played with POSIX file descriptors. We would have to have separate implementations of anonymous pipe IPC for Windows and Unix, but at least Linux and macOS could share one implementation. One distinct advantage of anonymous pipes over abstract UNIX sockets is their anonymity: there is no name by which unrelated processes could reference it, intentionally or accidentally. However this is not a dealbreaker for abstract sockets at this time as the IPC channel is only being used for lifecycle management. The worst a malicious process could reasonably do is DoS a plugin. I'd suggest keeping the anonymous-pipe design in your back pocket just in case abstract sockets don't work out for some reason. Until then, I completely agree with Keeping It Simple using AF_UNIX. |
While I think an identical implementation would be "great", I would not consider it to be a strict requirement. I know there's been many cases we (and the Go team itself for stdlib funcionality) have tried to provide a platform-agnostic approach, this also has resulted in many situations where trying to shoehorn Windows into Unix semantics resulted in poor choices. While I think it's still preferable if we implement library functions (it's great if we and can provide a platform-agnostic API / signature), we should not limit / force ourselves to do so if that means a more complicated and/or less idiomatic and/or less optimized/stable implementation. In some cases there are significant differences between Windows and Unix ("non-Windows"), and for some of those, it may be better to admit those differences, and choose the best / most stable solution for each. Given that this particular code is an implementation and not a "library package" (or at least: it's an internal package, and does not require a public Go API), I'm not against having platform-specific implementations if that means we can optimize each with platform-specific semantics (potentially making it more stable). I think the goal here is to provide the same / similar experience from a user's perspective; if that requires a different implementation under the hood for each, that's an implementation detail. (All my $0.02c 😅) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! ❤️
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Changes were made in docker#4599 to provide a mechanism for the CLI to notify running plugin processes that they should exit, in order to improve the general CLI/plugin UX. The current implementation boils down to: 1. The CLI creates a socket 2. The CLI executes the plugin 3. The plugin connects to the socket 4. (When) the CLI receives a termination signal, it uses the socket to notify the plugin that it should exit 5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`, which then gets handled appropriately This change works in most cases and fixes the issue it sets out to solve (see: docker/compose#11292) however, in the case where the user has a TTY attached and the plugin is not already handling received signals, steps 4+ changes: 4. (When) the CLI receives a termination signal, before it can use the socket to notify the plugin that it should exit, the plugin process also receives a signal due to sharing the pgid with the CLI Since we now have a proper "job control" mechanism, we can simplify the scenarios by executing the plugins with their own process group id, thereby removing the "double notification" issue and making it so that plugins can handle the same whether attached to a TTY or not. In order to make this change "plugin-binary" backwards-compatible, in the case that a plugin does not connect to the socket, the CLI passes the signal to the plugin process. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Changes were made in docker#4599 to provide a mechanism for the CLI to notify running plugin processes that they should exit, in order to improve the general CLI/plugin UX. The current implementation boils down to: 1. The CLI creates a socket 2. The CLI executes the plugin 3. The plugin connects to the socket 4. (When) the CLI receives a termination signal, it uses the socket to notify the plugin that it should exit 5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`, which then gets handled appropriately This change works in most cases and fixes the issue it sets out to solve (see: docker/compose#11292) however, in the case where the user has a TTY attached and the plugin is not already handling received signals, steps 4+ changes: 4. (When) the CLI receives a termination signal, before it can use the socket to notify the plugin that it should exit, the plugin process also receives a signal due to sharing the pgid with the CLI Since we now have a proper "job control" mechanism, we can simplify the scenarios by executing the plugins with their own process group id, thereby removing the "double notification" issue and making it so that plugins can handle the same whether attached to a TTY or not. In order to make this change "plugin-binary" backwards-compatible, in the case that a plugin does not connect to the socket, the CLI passes the signal to the plugin process. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Changes were made in docker#4599 to provide a mechanism for the CLI to notify running plugin processes that they should exit, in order to improve the general CLI/plugin UX. The current implementation boils down to: 1. The CLI creates a socket 2. The CLI executes the plugin 3. The plugin connects to the socket 4. (When) the CLI receives a termination signal, it uses the socket to notify the plugin that it should exit 5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`, which then gets handled appropriately This change works in most cases and fixes the issue it sets out to solve (see: docker/compose#11292) however, in the case where the user has a TTY attached and the plugin is not already handling received signals, steps 4+ changes: 4. (When) the CLI receives a termination signal, before it can use the socket to notify the plugin that it should exit, the plugin process also receives a signal due to sharing the pgid with the CLI Since we now have a proper "job control" mechanism, we can simplify the scenarios by executing the plugins with their own process group id, thereby removing the "double notification" issue and making it so that plugins can handle the same whether attached to a TTY or not. In order to make this change "plugin-binary" backwards-compatible, in the case that a plugin does not connect to the socket, the CLI passes the signal to the plugin process. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
See docker/cli#4599 and docker/cli#4769. When running through the CLI, signal handling + the "3 SIGINTS = exit" behavior is handled by the CLI, and the CLI will signal buildx through the plugin socket to cancel it's context. To deal with the standalone case, this commit introduces `cobrautil.HandleContextCancellation` which checks if the context being executed is already cancellable/plugin is running through the CLI and sets up signal handling if necessary. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
See docker/cli#4599 and docker/cli#4769. When running through the CLI, signal handling + the "3 SIGINTS = exit" behavior is handled by the CLI, and the CLI will signal buildx through the plugin socket to cancel it's context. To deal with the standalone case, this commit introduces `cobrautil.HandleContextCancellation` which checks if the context being executed is already cancellable/plugin is running through the CLI and sets up signal handling if necessary. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
See docker/cli#4599 and docker/cli#4769. Since we switched to using the cobra command context instead of `appcontext`, we need to set up the signal handling that was being provided by `appcontext`, as well as configuring the context with the OTEL tracing utilities also used by `appcontext`. This commit introduces `cobrautil.ConfigureContext` which implements the pre-existing signal handling logic from `appcontext` and cancels the command's context when a signal is received, as well as doing the relevant OTEL config. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
See docker/cli#4599 and docker/cli#4769. Since we switched to using the cobra command context instead of `appcontext`, we need to set up the signal handling that was being provided by `appcontext`, as well as configuring the context with the OTEL tracing utilities also used by `appcontext`. This commit introduces `cobrautil.ConfigureContext` which implements the pre-existing signal handling logic from `appcontext` and cancels the command's context when a signal is received, as well as doing the relevant OTEL config. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
See docker/cli#4599 and docker/cli#4769. Since we switched to using the cobra command context instead of `appcontext`, we need to set up the signal handling that was being provided by `appcontext`, as well as configuring the context with the OTEL tracing utilities also used by `appcontext`. This commit introduces `cobrautil.ConfigureContext` which implements the pre-existing signal handling logic from `appcontext` and cancels the command's context when a signal is received, as well as doing the relevant OTEL config. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
if errors.Is(err, io.EOF) { | ||
cancel() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we missed this during review
Previously, if a long-lived plugin process (such as an execution of `compose up`) was running and then detached from a terminal, signalling the parent CLI process to exit would leave the plugin process behind. To address this, changes were introduced on the CLI side (see: docker/cli#4599) to enable the CLI to notify a running plugin process that it should exit. This makes it so that, when the parent CLI process is going to exit, the command context of the plugin command being executed is cancelled. This commit takes advantage of these changes by tapping into the command context's done channel and using it to teardown on an up. Signed-off-by: Laura Brehm <laurabrehm@hey.com>
- What I did
Previously, long lived CLI plugin processes weren't properly handled (see: #4402) resulting in plugin processes being left behind running, after the CLI process exits.
- How I did it
Change the plugin handling code to open an abstract unix socket before running the plugin and passing it to the plugin process, and changes the signal handling on the CLI side to close this socket which tells the plugin that it should exit.
The 3 SIGINTs/SIGTERMs logic is copied over/kept from
cmd/docker/internal/appcontext
(previouslygithub.com/moby/buildkit/util/appcontext
) (see #4457).This implementation makes use of sockets instead of simply setting PDEATHSIG on the plugin process so that it will work on most platforms (including Windows :'))
- How to verify it
Before:
Screen.Recording.2023-10-12.at.21.49.02.mov
After:
Screen.Recording.2023-10-12.at.22.04.25.mov
Or, build your plugin with the CLI code from my branch :)
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)