Add timeout while waiting for StartTransientUnit completion signal #1754

Merged 1 commit on Mar 8, 2018

7 changes: 6 additions & 1 deletion libcontainer/cgroups/systemd/apply_systemd.go
@@ -17,6 +17,7 @@ import (
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/cgroups/fs"
"github.com/opencontainers/runc/libcontainer/configs"
+"github.com/sirupsen/logrus"
)

type Manager struct {
@@ -300,7 +301,11 @@ func (m *Manager) Apply(pid int) error {
return err
}

-<-statusChan
+select {
+case <-statusChan:
+case <-time.After(time.Second):
Member

Should have some sort of warning at least. I would also recommend investigating why we don't get the status from systemd. In the issue you said:

> Not sure how dbus works from within the containerized env or how we might fix this.

I'm not sure what containerised environment you're using, but dbus should be running as a daemon inside whatever container you're spawning runc in, just as it normally would on your host. Can you reproduce the issue using runc on the host?

Contributor Author

@cyphar added warning.

> what containerised environment ...

It occurred in containerized OpenShift. I am trying to reproduce it on my local machine and will post an update once I have results.

Contributor

I believe the way this has worked in the past is that the origin node container runs as privileged, which implies a shared IPC and PID namespace with the host such that the containerized node can get dbus signals from the host dbus.

+logrus.Warnf("Timed out while waiting for StartTransientUnit completion signal from dbus. Continuing...")
+}

if err := joinCgroups(c, pid); err != nil {
return err
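
For readers unfamiliar with the pattern this change adopts, here is a minimal standalone sketch of waiting on a channel with a deadline. The names `waitForStatus` and `errTimeout` are hypothetical and introduced only for illustration; they are not part of runc. The sketch also differs from the merged change in two ways that are assumptions of this example: it uses `time.NewTimer` with `Stop` rather than `time.After`, so the timer is released promptly when the signal arrives first, and it returns an error on timeout, whereas the patch logs a warning and continues.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel error used only in this sketch.
var errTimeout = errors.New("timed out waiting for completion signal")

// waitForStatus blocks until a value arrives on statusChan or the timeout
// elapses, whichever happens first.
func waitForStatus(statusChan <-chan string, timeout time.Duration) (string, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop() // release the timer if the signal wins the race

	select {
	case s := <-statusChan:
		return s, nil
	case <-timer.C:
		return "", errTimeout
	}
}

func main() {
	// A channel that already holds a status: the first case is taken.
	ready := make(chan string, 1)
	ready <- "done"
	s, err := waitForStatus(ready, time.Second)
	fmt.Println(s, err)

	// A channel that never receives anything: the timer case is taken.
	silent := make(chan string)
	_, err = waitForStatus(silent, 50*time.Millisecond)
	fmt.Println(err)
}
```

A caller that prefers the behaviour of the patch could log the returned error as a warning and carry on instead of propagating it.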