Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli-plugins: terminate plugin when CLI exits #4599

Merged
merged 1 commit into from
Dec 12, 2023

Conversation

laurazard
Copy link
Member

@laurazard laurazard commented Oct 12, 2023

- What I did

Previously, long lived CLI plugin processes weren't properly handled (see: #4402) resulting in plugin processes being left behind running, after the CLI process exits.

- How I did it

Change the plugin handling code to open an abstract unix socket before running the plugin and passing it to the plugin process, and changes the signal handling on the CLI side to close this socket which tells the plugin that it should exit.

The 3 SIGINTs/SIGTERMs logic is copied over/kept from cmd/docker/internal/appcontext (previously github.com/moby/buildkit/util/appcontext) (see #4457).

This implementation makes use of sockets instead of simply setting PDEATHSIG on the plugin process so that it will work on most platforms (including Windows :'))

- How to verify it

Before:

Screen.Recording.2023-10-12.at.21.49.02.mov

After:

Screen.Recording.2023-10-12.at.22.04.25.mov

Or, build your plugin with the CLI code from my branch :)

- Description for the changelog

Fix issue with long lived CLI plugin processes not being terminated after the CLI exits in some cases.

- A picture of a cute animal (not mandatory but encouraged)

20231011_220257

cmd/docker/docker.go Outdated Show resolved Hide resolved
@laurazard
Copy link
Member Author

Have to check some CI failures, a lot of tests calling Compose breaking (not sure if due to parallel tests running into the issue described above or due to other issues with listening on an abstract namespace sockets in CI).

@laurazard laurazard force-pushed the plugin-signal-handling branch from b0326c9 to 034036a Compare October 12, 2023 21:25
@codecov-commenter
Copy link

codecov-commenter commented Oct 12, 2023

Codecov Report

Merging #4599 (5e394d0) into master (a46f850) will decrease coverage by 0.44%.
Report is 162 commits behind head on master.
The diff coverage is 0.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4599      +/-   ##
==========================================
- Coverage   59.71%   59.27%   -0.44%     
==========================================
  Files         288      285       -3     
  Lines       24858    24858              
==========================================
- Hits        14843    14735     -108     
- Misses       9128     9237     +109     
+ Partials      887      886       -1     

cmd/docker/docker.go Outdated Show resolved Hide resolved
pluginCloseSocketAddr := net.UnixAddr{Name: "@docker_cli_" + subcommand, Net: "unix"}
listener, err := net.ListenUnix("unix", &pluginCloseSocketAddr)
if err != nil {
return errors.New("bork")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐩

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to order a bork® T-shirt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And a børk® 🇸🇪 for David)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐶

@laurazard laurazard force-pushed the plugin-signal-handling branch 2 times, most recently from 9a8e3f5 to c9235b4 Compare October 12, 2023 21:52
Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks really promising, I'm glad that my ramble on solving this using a socket was helpful.

For others, Win32 job control can technically be used to solve this on Windows, but macOS still exists.

We could poll getppid(2), but we'd need a busy loop, spam system calls (causing context switching), and it's full of edge-cases like PR_SET_CHILD_SUBREAPER or the fact that Solaris and Android have different init PIDs in some cases.

Letting the kernel notify us when the parent has either died, or has asked us to exit (solving the non-interactive SIGINT/SIGTERM case) seems best to me.

I'm still not sure if the 'three signal then exit' logic should apply in the non-interactive case, but we do just fine with it in dockerd as @laurazard pointed to me. Similarly, I'm not sure if we should handle SIGINT ourselves when there is a TTY in the loop, but I think this should be good enough as-is, and we can figure out further edge-cases once we get this in and have others kick the tires.

cmd/docker/internal/signals/signals_unix.go Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
@laurazard laurazard force-pushed the plugin-signal-handling branch from c9235b4 to fe94a09 Compare October 12, 2023 23:03
cmd/docker/docker.go Outdated Show resolved Hide resolved
@laurazard laurazard force-pushed the plugin-signal-handling branch 2 times, most recently from ba252a3 to 3da28ad Compare October 12, 2023 23:15
@thaJeztah
Copy link
Member

Failures in e2e;

got a weird error! accept unix @docker_cli_9bd0ef8a-2f24-441f-85d9-fcab0b79d535: use of closed network connection

erm... @neersighted would this be because of ...

@neersighted
Copy link
Member

neersighted commented Oct 12, 2023

No, this is AF_UNIX in the abstract domain (read: not a path, NUL bytes don't delimit the socket name, instead the addrlen defines when the socket name ends), not AF_VSOCK.

I'm not sure why we're getting a net.ErrClosed here; we should get EINVAL if the socket isn't listening.

@laurazard
Copy link
Member Author

I was going to say, it seems to be failing only when running inside of a container. I still want to change that code to fallback to the previous behaviour when we can't establish a connection, so that should be okay after those changes, but I'd like to figure out why it's not working.

@thaJeztah
Copy link
Member

Yeah, had my wires crossed on the AF_VSOCK; ended up on that as I was wondering if there's something different when running in a container 🥹☺️

@neersighted
Copy link
Member

neersighted commented Oct 13, 2023

Not sure what's going on there either, it will take some digging into. containerd's v1 shim API used abstract sockets until containerd/containerd@126b35c, so it should work in the general case DinD...

@laurazard laurazard force-pushed the plugin-signal-handling branch 2 times, most recently from 5147aa9 to eb6367a Compare October 23, 2023 12:46
@thaJeztah thaJeztah added this to the 25.0.0 milestone Oct 23, 2023
@laurazard laurazard self-assigned this Oct 24, 2023
@laurazard
Copy link
Member Author

Adding this here as it seems like the most relevant place:

At the last maintainers call, it was suggested that a more elegant solution to this could be built, relying on another OS primitive – unnamed pipes – for communication between the CLI and the plugin process. In this implementation, instead of generating a random abstract socket name/passing it as an environment variable, we'd create an unnamed pipe on the side of the CLI process, and set the FD 3 (or another non-0/1/2 FD) of the plugin process as that pipe, which could then use it to be alerted to exit.


Note

This is my current understanding of the issue, which is definitely incomplete as my knowledge of Windows internals is extremely limited 😅

This would be a great solution, but in practice it falls short as Go does not have built-in support for inheriting file descriptors on Windows. In truth, "file descriptors" aren't a concept on Windows, and although this should be possible to do using file handles, there's a lot of complexity regarding how to pass the desired file handle to the child process.

While, on Unixy platforms, we can do:

reader, writer, err :=  os.Pipe()
if err == nil {
    defer writer.Close()
    plugincmd.ExtraFiles = append(plugincmd.ExtraFiles, reader)
}

and then simply access FD 3 on the child process, a Windows implementation would have to look like:

reader, writer, err :=  os.Pipe()
if err == nil {
    defer writer.Close()
    plugincmd.SysProcAttr = &syscall.SysProcAttr{AdditionalInheritedHandles: []syscall.Handle{syscall.Handle(reader.Fd())}}
}

and we would still have to pass the file handle as an environment variable/some other way so that the child process knows where to access it, as we can't just "access FD 3".

This is because cmd.ExtraFiles is not supported on Windows – more on this issue in golang/go#21085 (comment).

(For the more curious, check out https://go-review.googlesource.com/c/go/+/288299, the Go CL where os.Pipes were made inheritable on Windows, and how the feature is tested – it involves passing the pipe handle as an argument when running the child command)

We could implement this feature that way (more likely, passing the handle as an environment variable), but then we end up on the same different implementation for each platform place that we're trying to avoid.

I think, taking this into account, the abstract socket/AF_UNIX implementation is preferable.

(cc @corhere @cpuguy83 @thaJeztah)


Other alternatives, for future reference:

  • Named pipes suffer from much the same issue, as we'd have to write our own implementation/rely on go-winio for them on Windows
  • PDEATHSIG works on Unix, but not Windows, and the same/similar behaviour can be obtained on Windows with Job Objects, but then we still need a solution for macOS.

@corhere
Copy link
Contributor

corhere commented Oct 30, 2023

and we would still have to pass the file handle as an environment variable/some other way so that the child process knows where to access it, as we can't just "access FD 3".

Yeah, it is annoying that we can't predict the Windows handle value at compile-time, unlike the tricks which can be played with POSIX file descriptors. We would have to have separate implementations of anonymous pipe IPC for Windows and Unix, but at least Linux and macOS could share one implementation.

One distinct advantage of anonymous pipes over abstract UNIX sockets is their anonymity: there is no name by which unrelated processes could reference it, intentionally or accidentally. However this is not a dealbreaker for abstract sockets at this time as the IPC channel is only being used for lifecycle management. The worst a malicious process could reasonably do is DoS a plugin. I'd suggest keeping the anonymous-pipe design in your back pocket just in case abstract sockets don't work out for some reason. Until then, I completely agree with Keeping It Simple using AF_UNIX.

cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
@thaJeztah
Copy link
Member

We could implement this feature that way (more likely, passing the handle as an environment variable), but then we end up on the same different implementation for each platform place that we're trying to avoid.

Let me interject for a moment. What you guys are referring to as Linux .. (edit: sorry, damn you, autocomplete)

While I think an identical implementation would be "great", I would not consider it to be a strict requirement.

I know there's been many cases we (and the Go team itself for stdlib funcionality) have tried to provide a platform-agnostic approach, this also has resulted in many situations where trying to shoehorn Windows into Unix semantics resulted in poor choices.

While I think it's still preferable if we implement library functions (it's great if we and can provide a platform-agnostic API / signature), we should not limit / force ourselves to do so if that means a more complicated and/or less idiomatic and/or less optimized/stable implementation.

In some cases there are significant differences between Windows and Unix ("non-Windows"), and for some of those, it may be better to admit those differences, and choose the best / most stable solution for each.

Given that this particular code is an implementation and not a "library package" (or at least: it's an internal package, and does not require a public Go API), I'm not against having platform-specific implementations if that means we can optimize each with platform-specific semantics (potentially making it more stable).

I think the goal here is to provide the same / similar experience from a user's perspective; if that requires a different implementation under the hood for each, that's an implementation detail.

(All my $0.02c 😅)

cmd/docker/docker.go Outdated Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
cmd/docker/docker.go Outdated Show resolved Hide resolved
cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
cli-plugins/plugin/plugin.go Outdated Show resolved Hide resolved
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ❤️

@thaJeztah thaJeztah merged commit 7d92573 into docker:master Dec 12, 2023
76 checks passed
@neersighted neersighted deleted the plugin-signal-handling branch December 12, 2023 13:58
laurazard added a commit to laurazard/compose that referenced this pull request Dec 21, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/compose that referenced this pull request Dec 21, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/compose that referenced this pull request Dec 21, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/compose that referenced this pull request Dec 21, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/compose that referenced this pull request Dec 22, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/compose that referenced this pull request Dec 23, 2023
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
@thaJeztah thaJeztah mentioned this pull request Jan 4, 2024
1 task
laurazard added a commit to laurazard/cli that referenced this pull request Jan 9, 2024
Changes were made in docker#4599 to provide
a mechanism for the CLI to notify running plugin processes that they
should exit, in order to improve the general CLI/plugin UX. The current
implementation boils down to:
1. The CLI creates a socket
2. The CLI executes the plugin
3. The plugin connects to the socket
4. (When) the CLI receives a termination signal, it uses the socket to
   notify the plugin that it should exit
5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`,
   which then gets handled appropriately

This change works in most cases and fixes the issue it sets out to solve
(see: docker/compose#11292) however, in the case
where the user has a TTY attached and the plugin is not already handling
received signals, steps 4+ changes:
4. (When) the CLI receives a termination signal, before it can use the
   socket to notify the plugin that it should exit, the plugin process
   also receives a signal due to sharing the pgid with the CLI

Since we now have a proper "job control" mechanism, we can simplify the
scenarios by executing the plugins with their own process group id,
thereby removing the "double notification" issue and making it so that
plugins can handle the same whether attached to a TTY or not.

In order to make this change "plugin-binary" backwards-compatible, in
the case that a plugin does not connect to the socket, the CLI passes
the signal to the plugin process.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/cli that referenced this pull request Jan 10, 2024
Changes were made in docker#4599 to provide
a mechanism for the CLI to notify running plugin processes that they
should exit, in order to improve the general CLI/plugin UX. The current
implementation boils down to:
1. The CLI creates a socket
2. The CLI executes the plugin
3. The plugin connects to the socket
4. (When) the CLI receives a termination signal, it uses the socket to
   notify the plugin that it should exit
5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`,
   which then gets handled appropriately

This change works in most cases and fixes the issue it sets out to solve
(see: docker/compose#11292) however, in the case
where the user has a TTY attached and the plugin is not already handling
received signals, steps 4+ changes:
4. (When) the CLI receives a termination signal, before it can use the
   socket to notify the plugin that it should exit, the plugin process
   also receives a signal due to sharing the pgid with the CLI

Since we now have a proper "job control" mechanism, we can simplify the
scenarios by executing the plugins with their own process group id,
thereby removing the "double notification" issue and making it so that
plugins can handle the same whether attached to a TTY or not.

In order to make this change "plugin-binary" backwards-compatible, in
the case that a plugin does not connect to the socket, the CLI passes
the signal to the plugin process.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/cli that referenced this pull request Jan 10, 2024
Changes were made in docker#4599 to provide
a mechanism for the CLI to notify running plugin processes that they
should exit, in order to improve the general CLI/plugin UX. The current
implementation boils down to:
1. The CLI creates a socket
2. The CLI executes the plugin
3. The plugin connects to the socket
4. (When) the CLI receives a termination signal, it uses the socket to
   notify the plugin that it should exit
5. The plugin's gets notified via the socket, and cancels it's `cmd.Context`,
   which then gets handled appropriately

This change works in most cases and fixes the issue it sets out to solve
(see: docker/compose#11292) however, in the case
where the user has a TTY attached and the plugin is not already handling
received signals, steps 4+ changes:
4. (When) the CLI receives a termination signal, before it can use the
   socket to notify the plugin that it should exit, the plugin process
   also receives a signal due to sharing the pgid with the CLI

Since we now have a proper "job control" mechanism, we can simplify the
scenarios by executing the plugins with their own process group id,
thereby removing the "double notification" issue and making it so that
plugins can handle the same whether attached to a TTY or not.

In order to make this change "plugin-binary" backwards-compatible, in
the case that a plugin does not connect to the socket, the CLI passes
the signal to the plugin process.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/buildx that referenced this pull request Jan 10, 2024
See docker/cli#4599 and
docker/cli#4769.

When running through the CLI, signal handling + the "3
SIGINTS = exit" behavior is handled by the CLI, and the
CLI will signal buildx through the plugin socket to cancel
it's context.

To deal with the standalone case, this commit introduces
`cobrautil.HandleContextCancellation` which checks if the
context being executed is already cancellable/plugin is
running through the CLI and sets up signal handling if
necessary.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/buildx that referenced this pull request Jan 11, 2024
See docker/cli#4599 and
docker/cli#4769.

When running through the CLI, signal handling + the "3
SIGINTS = exit" behavior is handled by the CLI, and the
CLI will signal buildx through the plugin socket to cancel
it's context.

To deal with the standalone case, this commit introduces
`cobrautil.HandleContextCancellation` which checks if the
context being executed is already cancellable/plugin is
running through the CLI and sets up signal handling if
necessary.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/buildx that referenced this pull request Jan 17, 2024
See docker/cli#4599 and
docker/cli#4769.

Since we switched to using the cobra command context instead of
`appcontext`, we need to set up the signal handling that was being
provided by `appcontext`, as well as configuring the context with
the OTEL tracing utilities also used by `appcontext`.

This commit introduces `cobrautil.ConfigureContext` which implements
the pre-existing signal handling logic from `appcontext` and cancels
the command's context when a signal is received, as well as doing
the relevant OTEL config.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
laurazard added a commit to laurazard/buildx that referenced this pull request Jan 17, 2024
See docker/cli#4599 and
docker/cli#4769.

Since we switched to using the cobra command context instead of
`appcontext`, we need to set up the signal handling that was being
provided by `appcontext`, as well as configuring the context with
the OTEL tracing utilities also used by `appcontext`.

This commit introduces `cobrautil.ConfigureContext` which implements
the pre-existing signal handling logic from `appcontext` and cancels
the command's context when a signal is received, as well as doing
the relevant OTEL config.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
tonistiigi pushed a commit to laurazard/buildx that referenced this pull request Jan 20, 2024
See docker/cli#4599 and
docker/cli#4769.

Since we switched to using the cobra command context instead of
`appcontext`, we need to set up the signal handling that was being
provided by `appcontext`, as well as configuring the context with
the OTEL tracing utilities also used by `appcontext`.

This commit introduces `cobrautil.ConfigureContext` which implements
the pre-existing signal handling logic from `appcontext` and cancels
the command's context when a signal is received, as well as doing
the relevant OTEL config.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Comment on lines +61 to +63
if errors.Is(err, io.EOF) {
cancel()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we missed this during review ☺️ 🙈 ;

temenuzhka-thede pushed a commit to temenuzhka-thede/compose that referenced this pull request Sep 17, 2024
Previously, if a long-lived plugin process (such as
an execution of `compose up`) was running and then
detached from a terminal, signalling the parent CLI
process to exit would leave the plugin process behind.

To address this, changes were introduced on the CLI side
(see: docker/cli#4599) to enable
the CLI to notify a running plugin process that it should
exit. This makes it so that, when the parent CLI process
is going to exit, the command context of the plugin
command being executed is cancelled.

This commit takes advantage of these changes by tapping into
the command context's done channel and using it to teardown
on an up.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants