Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deps: upgrade opencontainer/runc to 1.2 #25138

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Conversation

pkazmierczak
Copy link
Contributor

No description provided.

// note that os.Args[0] refers to the executor shim typically
// and first args arguments is ignored now due
// until https://github.com/opencontainers/runc/pull/1888 is merged
libcontainer.InitArgs(os.Args[0], "libcontainer-shim"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we dropping the shim argument in the process? I think we need that to trigger the init() function in drivers/shared/executor/libcontainer_nsenter_linux.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fixing this, sorry got distracted by other stuff today :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. apparently we can't InitArgs anymore :/
opencontainers/runc@630c0d7

I will admit that I'm not understanding the context here, the libcontainer-shim intended behavior is a mystery to me here. Is this perhaps something we can obsolete/remove? Otherwise the best course of action would be to stay at runc 1.1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nomad client is running with as many "M-threads" as there are cores on the host because that's the default behavior of the Go runtime. But we don't want that to be the case with the executor process we create, to reduce resource overhead. So we inject a shim between to change the gomaxprocs and pin to the main thread for correctness in signal handling.

Even if the "init args" is broken, there's probably still a place for us to hook in somewhere here, as runc is basically just a giant pile of these shims that get called in order (that's what we're attempting to hook in #20017 for example). We probably can't leave runc at 1.1 for very long, as it's a security-sensitive bit of our stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a little more context ahead of our sidebar discussion, here's what the relevant parts of my process tree looks like if I run a job with the exec task driver:

  13722 \_ /usr/local/bin/nomad agent -dev -dev-connect -config /etc/nomad.d/dev.hcl
  14482     \_ /usr/local/bin/nomad logmon
  14490     \_ /usr/local/bin/nomad executor {"LogFile":"/var/nomad/data/alloc/f801be2f
  14500         \_ /bin/busybox httpd -vv -f -p 8001 -h /local

But if I use execsnoop (I'm using a slightly hacked version) I can see that we actually end up exec'ing into a series of processes from the executor:

root@nomad0# ./bin/execsnoop.bt
Attaching 3 probes...
TIME            PID     PPID    ARGS
16:55:35.778380 14482   13732   /usr/local/bin/nomad logmon
16:55:36.085446 14490   13731   /usr/local/bin/nomad executor {"LogFile":"/var/nomad/data/alloc/5c03bc61-6bda-56ed-18c4-c2983a10b637/httpd/executor.out","LogLevel":"debug","FSIsolation":true,"Compute":{"tc":2000,"nc":1}}
16:55:36.349924 14498   14495   /usr/local/bin/nomad libcontainer-shim
16:55:36.368309 14498   14495   /usr/local/bin/nomad libcontainer-shim
16:55:36.570185 14500   14495   /bin/busybox httpd -vv -f -p 8001 -h /local

Looking at that, I'm a lot less sure about what the shim is actually doing for us here. Let's dig into this tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From our sidebar discussion: it looks like the only thing the libcontainer-shim does for us today is reconfigures the logging we get from libcontainer. But it also looks like that the logging environment in the init shim has improved in libcontainer quite a bit since this was written. So @pkazmierczak is going to test out removing the shim and verifying we get some reasonable logs out from libcontainer failures.

@pkazmierczak pkazmierczak marked this pull request as draft February 18, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants