Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Rootless Docker #1727

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions images/base/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ RUN echo "Ensuring scripts are executable ..." \
libseccomp2 pigz \
bash ca-certificates curl rsync \
nfs-common \
jq \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how big is this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about 1MB including deps
https://packages.ubuntu.com/focal/jq

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, paying 1MB for rootless seems worthwhile :-)

&& find /lib/systemd/system/sysinit.target.wants/ -name "systemd-tmpfiles-setup.service" -delete \
&& rm -f /lib/systemd/system/multi-user.target.wants/* \
&& rm -f /etc/systemd/system/*.wants/* \
Expand Down
12 changes: 12 additions & 0 deletions images/base/files/usr/local/bin/entrypoint
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,11 @@ fix_mount() {
# https://systemd.io/CONTAINER_INTERFACE/
# however, we need other things from `docker run --privileged` ...
# and this flag also happens to make /sys rw, amongst other things
#
# EACCES on rootless is negligible.
set +o errexit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already detect if we're in rootless or not below, instead detect that early on and save it, and switch on it here?
toggling errexit in scripts leads to bugs, it has unintuitive behavior.

mount -o remount,ro /sys
set -o errexit

echo 'INFO: making mounts shared' >&2
# for mount propagation
Expand Down Expand Up @@ -232,6 +236,13 @@ enable_network_magic(){
fi
}

select_containerd_config_toml() {
if ! egrep -q "0[[:space:]]+0[[:space:]]+4294967295" /proc/1/uid_map; then
echo "INFO: Detected rootless provider. Overriding /etc/containerd/config.toml with /etc/containerd/config-rootless.toml" >&2
cp -f /etc/containerd/config-rootless.toml /etc/containerd/config.toml
fi
}

# run pre-init fixups
fix_kmsg
fix_mount
Expand All @@ -242,6 +253,7 @@ fix_product_uuid
configure_proxy
select_iptables
enable_network_magic
select_containerd_config_toml

# we want the command (expected to be systemd) to be PID1, so exec to it
exec "$@"
40 changes: 40 additions & 0 deletions images/base/files/usr/local/bin/ociwrapper
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#!/bin/bash
#
# A wrapper script to remove .linux.resources.devices, which are meaningless in userns.
# Needs jq.
#
# Workaround until we get proper fixes in containerd and runc
set -eu -o pipefail
RUNTIME="runc"

if egrep -q "0[[:space:]]+0[[:space:]]+4294967295" /proc/self/uid_map; then
# we are not in userns, no need to patch the config
exec $RUNTIME "$@"
exit $?
fi

bundle="."
bundle_flag=""
# FIXME: support `--bundle=STRING` as well
for f in $@; do
if [[ -n $bundle_flag ]]; then
bundle=$f
break
else
case $f in
-b | --bundle)
bundle_flag=$f
;;
esac
fi
done

if [ -f $bundle/config.json ]; then
q="del(.linux.resources.devices) | del(.linux.devices)"
tmp=$(mktemp -d ociwrapper.XXXXXXXX)
jq "$q" <$bundle/config.json >$tmp/config.json
mv $tmp/config.json $bundle/config.json
rm -rf $tmp
fi

exec $RUNTIME "$@"
15 changes: 14 additions & 1 deletion pkg/build/nodeimage/build_impl.go
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,8 @@ func (c *buildContext) buildImage(dir string) error {
return errors.New("failed to find imported pause image")
}
containerdConfig, err := getContainerdConfig(containerdConfigTemplateData{
SandboxImage: pauseImage,
SandboxImage: pauseImage,
DefaultRuntimeName: "runc",
})
if err != nil {
return err
Expand All @@ -196,6 +197,18 @@ func (c *buildContext) buildImage(dir string) error {
if err := createFile(cmder, containerdConfigPath, containerdConfig); err != nil {
return err
}
containerdRootlessConfig, err := getContainerdConfig(containerdConfigTemplateData{
Copy link
Member

@BenTheElder BenTheElder Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to do this without building a special node-image.
the entrypoint can rewrite this at runtime instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we edit TOML in the entrypoint? Is sed robust enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be, since the entrypoint is tied to the config, and at this point user patches have not yet been applied, so we know what the config looks like.

we can sed on default_runtime_name =.* right?

SandboxImage: pauseImage,
DefaultRuntimeName: "ociwrapper",
RestrictOOMScoreAdj: true,
})
if err != nil {
return err
}
const containerdRootlessConfigPath = "/etc/containerd/config-rootless.toml"
if err := createFile(cmder, containerdRootlessConfigPath, containerdRootlessConfig); err != nil {
return err
}

// Save the image changes to a new image
cmd := exec.Command(
Expand Down
13 changes: 11 additions & 2 deletions pkg/build/nodeimage/containerd_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,18 +24,25 @@ import (
)

type containerdConfigTemplateData struct {
SandboxImage string
SandboxImage string
DefaultRuntimeName string
RestrictOOMScoreAdj bool
}

const containerdConfigTemplate = `# explicitly use v2 config format
version = 2

# set default runtime handler to v2, which has a per-pod shim
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
default_runtime_name = "{{.DefaultRuntimeName}}"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ociwrapper]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ociwrapper.options]
BinaryName = "ociwrapper"

# Setup a runtime with the magic name ("test-handler") used for Kubernetes
# runtime class tests ...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler]
Expand All @@ -49,6 +56,8 @@ version = 2
# allow hugepages controller to be missing
# see https://github.com/containerd/cri/pull/1501
tolerate_missing_hugepages_controller = true
# restrict_oom_score_adj is required if we are running in UserNS (i.e. Rootless Docker/Podman),
restrict_oom_score_adj = {{.RestrictOOMScoreAdj}}
`

func getContainerdConfig(data containerdConfigTemplateData) (string, error) {
Expand Down
4 changes: 1 addition & 3 deletions pkg/cluster/internal/providers/podman/provider.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,10 +60,8 @@ func (p *Provider) Provision(status *cli.Status, cfg *config.Cluster) (err error
return err
}

// kind doesn't work with podman rootless, surface an error
if os.Geteuid() != 0 {
p.logger.Errorf("podman provider does not work properly in rootless mode")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should stop failing until this works, actually, this was previous state but it was confusing for users

os.Exit(1)
p.logger.Warn("support for rootless mode is experimental, some features may not work")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR body suggests that it doesn't work, if that's the case then this new message seems misleading.

}

// TODO: validate cfg
Expand Down