Try to use pidfd and epoll to wait init process exit #4517
We should switch to unix.PidFDSendSignal on new kernels, and the Go runtime already supports it, so we don't need to add code here that falls back to unix.Kill.
Signed-off-by: lfbzhm <lifubang@acmcoder.com>
bcddc62 to 8126a8d
@abel-von PTAL
It seems that the first commit can be merged now and is definitely an improvement. For the rest of it, give me a few days to review.
libcontainer/container_linux.go
	return nil
}

logrus.Debugf("pidfd & epoll failed with an error: %v, fall back to unix.Signal.\n", err)
nit: no need for \n here.
Reviewing this reminded me of the next step needed for pidfd support in Go, so I wrote this proposal: golang/go#70352
Wonderful proposal. I used to think that Go wouldn't support interfaces like this, but I think it's very useful, and I'm looking forward to it.
Signed-off-by: lfbzhm <lifubang@acmcoder.com>
When using unix.Kill to kill the container, we need a for loop to manually detect whether the init process has exited, and we currently sleep 100ms on each iteration. But for stopped containers, or containers running on a lightly loaded machine, we don't need to wait that long. This change reduces the delete delay in some situations, especially for pods with many containers.
Co-authored-by: Abel Feng <fshb1988@gmail.com>
Signed-off-by: lfbzhm <lifubang@acmcoder.com>
8126a8d to 7833912

// Kill kills the container and wait the init process exit.
func (c *Container) Kill() error {
	if c.config.Namespaces.IsPrivate(configs.NEWPID) {
Might make sense to explain the reason for this "if". Something like "when the container doesn't have a private pidns, we have to kill every process in the cgroup, which killViaPidfd can't do".
	return errors.New("container init still running")
}

// Kill kills the container and wait the init process exit.
... and waits for the init process to exit.

events := make([]unix.EpollEvent, 1)
for {
	// Set the timeout to 10s, the same as the traditional unix.Signal solution.
... the same as in kill below
}
// When --force is given, we kill all container processes and
// then destroy the container. This is done even for a stopped
// container, because (in case it does not have its own PID
// namespace) there may be some leftover processes in the
// container's cgroup.
if force {
	return killContainer(container)
}

s, err := container.Status()
if err != nil {
	return err
}
switch s {
case libcontainer.Stopped:
	// For a stopped container, because (in case it does not have
	// its own PID namespace) there may be some leftover processes
	// in the container's cgroup.
	if !container.Config().Namespaces.IsPrivate(configs.NEWPID) {
		return killContainer(container)
	}
	return container.Destroy()
case libcontainer.Created:
	return killContainer(container)
default:
	// When --force is given, we kill all container processes and
	// then destroy the container.
	if force {
		return killContainer(container)
	}
Unless I'm missing something, the commit description doesn't describe this change.
I also remember breaking something when I moved this code around. Ah! It was commit 29283bb. I hope this change won't result in some sort of regression :)
Ah, OK, you said in the commit message that you don't want to wait in case of a stopped container.
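To make the reordered logic easier to check against the concern above, here is a small standalone model of the new switch as it appears in the excerpt. The types and names (status, action, deleteAction) are hypothetical and exist only for illustration. It assumes killContainer kills and then destroys, and it does not guess what the default branch does when --force is absent, because the excerpt cuts off there.

```go
package main

import "fmt"

// status is a hypothetical stand-in for the libcontainer status values
// that the excerpt switches on.
type status int

const (
	stopped status = iota
	created
	running
)

// action names what the delete path would do in this model.
type action string

const (
	actKillAndDestroy action = "kill, then destroy"
	actDestroy        action = "destroy only"
	actNotShown       action = "not shown in the excerpt"
)

// deleteAction mirrors the new switch from the diff excerpt. Assumption:
// killContainer both kills and destroys the container.
func deleteAction(s status, force, privatePIDNS bool) action {
	switch s {
	case stopped:
		// Without a private PID namespace there may be leftover
		// processes in the cgroup, so they still have to be killed.
		if !privatePIDNS {
			return actKillAndDestroy
		}
		return actDestroy
	case created:
		return actKillAndDestroy
	default:
		if force {
			return actKillAndDestroy
		}
		return actNotShown
	}
}

func main() {
	for _, s := range []status{stopped, created, running} {
		for _, force := range []bool{false, true} {
			for _, private := range []bool{false, true} {
				fmt.Printf("status=%d force=%v privatePIDNS=%v: %s\n",
					s, force, private, deleteAction(s, force, private))
			}
		}
	}
}
```

In this model the only behavior change relative to the old "if force" shortcut is the stopped container with a private PID namespace and --force: it is destroyed directly, with no kill and no wait.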
	if !container.Config().Namespaces.IsPrivate(configs.NEWPID) {
		return killContainer(container)
	}
	return container.Destroy()
Maybe drop killContainer entirely here and just call container.Kill() and then container.Destroy()? The code is more readable this way.
Or, rename killContainer to killAndDestroy.
This PR does some optimizations for runc delete -f.
On new kernels, the Go runtime uses unix.PidFDSendSignal to send signals to a process, which helps reduce the risk of PID reuse attacks. So we should replace unix.Kill with os.Process.Signal in runc where possible.
os.Process.Wait can only wait for a child process. To wait for an unrelated process, we should introduce pidfd & epoll to reduce the sleep time when we want to detect whether the init process has exited.
We sleep 100ms between checks in the current unix.Kill solution, but for stopped containers or containers running on a lightly loaded machine, we don't need to wait 100ms before the next check.
Close: #4512