This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

Document (and/or debug) launching rkt in rkt #2158

Closed
joeatwork opened this issue Feb 8, 2016 · 34 comments

Comments

@joeatwork

(Posted at the request of @alban): There are several use cases for launching rkt containers inside other rkt containers. While this is possible, doing it successfully involves potential pitfalls and configuration requirements. It would be great if those pitfalls and requirements were documented, or if the project explicitly stated whether running rkt in rkt is supported.

@alban
Member

alban commented Feb 22, 2016

Just tested it:

I start the outer rkt with:

sudo rkt --dns=8.8.8.8 --insecure-options=image  run --interactive docker://debian

Then, I start the inner rkt with:

apt-get update
apt-get install wget
wget https://github.com/coreos/rkt/releases/download/v1.0.0/rkt-v1.0.0.tar.gz
tar xf rkt-v1.0.0.tar.gz
cd rkt-v1.0.0
./rkt run --interactive --insecure-options=image --no-overlay --net=host docker://busybox

It works with the following limitations:

@iaguis iaguis modified the milestones: v1.2.0, v1.1.0 Feb 25, 2016
@iaguis iaguis modified the milestones: v1.4.0, v1.2.0 Mar 18, 2016
@jonboulle jonboulle modified the milestones: v1.5.0, v1.4.0 Apr 13, 2016
@s-urbaniak s-urbaniak modified the milestones: v1.6.0, v1.5.0 Apr 28, 2016
@s-urbaniak
Contributor

didn't make it due to OCI/Fest activity

@s-urbaniak s-urbaniak modified the milestones: v1.7.0, v1.6.0 May 12, 2016
@jonboulle
Contributor

from #2167:

Sooner rather than later we'll run Kubernetes' kubelet with stage1-fly, which will itself spawn rkt pods. Here are my first thoughts on what we need to investigate, mainly which paths to mount into the stage1-fly environment.

Eventually, the kubelet will spawn rkt pods, and I think we need a couple of things for rkt to work properly in rkt fly. Here are some thoughts.
We need /etc/rkt for various rkt and CNI configurations. CNI state may conflict if the kubelet and the host both execute rkt, so we need at least /var/lib/cni for the host-local allocator.
Then there might be issues with GC. If /var/lib/rkt is not mounted, the pods started by the kubelet will not be garbage-collected by the host's GC timer.
Note that it won't be mounted recursively and so will not include the overlayfs mounts etc. from the host's pods, and I'm not 100% sure what the consequences are if rkt GC is called by the kubelet and finds only the empty pod directories. @jonboulle @vcaputo maybe one of you can foresee this?
Where will the rkt API service be running? This is going to be tricky if the kubelet and the host both run pods and don't share the resources examined by the API service. @yifan-gu
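
The paths listed above could be exposed to the outer pod with the same --volume/--mount flag pairs used elsewhere in this thread. The following is only a sketch: the helper names and the volume names are made up for illustration, and whether mounting /var/lib/rkt this way is safe is exactly the open question raised in the comment.

```shell
#!/bin/sh
# Hypothetical helper: print a --volume/--mount flag pair that exposes a host
# path at the same location inside the outer pod.
#   $1 = volume name (illustrative), $2 = host path
host_path_flags() {
    printf '%s %s,kind=host,source=%s %s volume=%s,target=%s\n' \
        --volume "$1" "$2" --mount "$1" "$2"
}

# Flag pairs for the three paths named in the comment above.
# The volume names are assumptions, not names rkt requires.
rkt_state_flags() {
    host_path_flags etc-rkt /etc/rkt
    host_path_flags var-lib-cni /var/lib/cni
    host_path_flags var-lib-rkt /var/lib/rkt
}
```

Calling rkt_state_flags prints one flag pair per line, which can be reviewed and then pasted into an outer `rkt run` invocation.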

@alban
Member

alban commented May 23, 2016

Moving to next milestone

@alban
Member

alban commented May 24, 2016

If we mount /sys without the cgroup fs in /sys/fs/cgroup with #2680, we will need to document how to mount the cgroup fs in this issue.
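
As a starting point for that documentation, here is a dry-run sketch that assumes cgroup v1 and the usual /proc/cgroups column layout. It only prints the mount commands it would suggest and runs nothing. The function name is made up, and controllers that the host co-mounts (for example cpu,cpuacct) would have to be mounted with the same combination, which this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/cgroups-formatted text on stdin and print
# (not run) mount commands for a cgroup v1 hierarchy under /sys/fs/cgroup.
cgroup_mount_cmds() {
    awk '
        BEGIN {
            # A tmpfs to hold one directory per controller.
            print "mount -t tmpfs -o mode=755 tmpfs /sys/fs/cgroup"
        }
        /^#/ { next }      # skip the header line
        $4 == 1 {          # column 4 is the "enabled" flag
            printf "mkdir -p /sys/fs/cgroup/%s && mount -t cgroup -o %s cgroup /sys/fs/cgroup/%s\n", $1, $1, $1
        }
    '
}
```

Inside the container this would be fed with `cgroup_mount_cmds < /proc/cgroups`, and the printed commands reviewed before running any of them.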

@jonboulle jonboulle mentioned this issue May 31, 2016
@s-urbaniak
Contributor

moving to next milestone, didn't make it due to other activities

@s-urbaniak s-urbaniak modified the milestones: v1.9.0, v1.8.0 Jun 9, 2016
@tmrts tmrts added this to the v1.10.0 milestone Jun 23, 2016
@lucab lucab removed this from the v1.16.0 milestone Sep 29, 2016
@s-urbaniak
Contributor

The outstanding issue (having /proc/sys read-write) is tracked in #3245

@s-urbaniak
Contributor

#3245 is still WIP, hence bumping.

@s-urbaniak s-urbaniak modified the milestones: v1.19.0, v1.18.0, v1.20.0 Oct 27, 2016
@s-urbaniak
Contributor

rkt-in-rkt basically works with #3389 and systemd/systemd#4395 applied, when invoked as follows:

$ sudo rkt run \
  --volume rkt,kind=host,source=/usr/bin/rkt \
  --volume stage1,kind=host,source=/usr/lib/rkt/stage1-images/stage1-coreos.aci \
  --volume etc-rkt,kind=host,source=/etc/rkt \
  --mount volume=rkt,target=/usr/bin/rkt \
  --mount volume=stage1,target=/usr/lib/rkt/stage1-images/stage1-coreos.aci \
  --mount volume=etc-rkt,target=/etc/rkt \
  --insecure-options=all-run,image \
  docker://progrium/busybox \
  --net=host \
  --dns=8.8.8.8 \
  --interactive \
  --exec=/bin/sh

/ # opkg-install --force-depends iptables ca-certificates
/ # rkt run --no-overlay --dns=8.8.8.8 --insecure-options=image --interactive docker://progrium/busybox --exec /bin/sh
Downloading sha256:dfda3e01f2b [=============================]   243 KB / 243 KB
Downloading sha256:00cf8b9f3d2 [=============================]     220 B / 220 B
Downloading sha256:3aaade50789 [=============================]     247 B / 247 B
Downloading sha256:7ff999a2256 [=============================] 2.53 KB / 2.53 KB
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:77c6c00e8b6 [=============================] 2.17 MB / 2.17 MB
Downloading sha256:d2ba336f2e4 [=============================] 33.7 KB / 33.7 KB
run: group "rkt" not found, will use default gid when rendering images
stage1: warning: error setting journal ACLs, you'll need root to read the pod journal: group "rkt" not found
/ # ping www.google.de
PING www.google.de (80.149.20.123): 56 data bytes
64 bytes from 80.149.20.123: seq=0 ttl=59 time=29.820 ms

It reveals the following process tree:

bash
 \_ sudo ./build-rkt-1.19.0+git/target/bin/rkt run --debug --stage1-path=/home/sur/src
     \_ stage1/rootfs/lib64/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn 
         \_ /usr/lib/systemd/systemd --default-standard-output=tty
             \_ /usr/lib/systemd/systemd-journald
             \_ /bin/sh
                 \_ stage1/rootfs/lib64/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/sys
                     \_ /usr/lib/systemd/systemd --default-standard-output=tty --log-t
                         \_ /usr/lib/systemd/systemd-journald
                         \_ /bin/sh

Overlay in the inner container doesn't work though (needs investigation).

@s-urbaniak
Contributor

The proc and sys mounts in the container are now as follows:

/ # /bin/mount | grep -e 'on \/sys \|on \/proc'
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /proc/sys/kernel/random/boot_id type tmpfs (ro,nosuid,nodev,mode=755)
tmpfs on /proc/kmsg type tmpfs (rw,nosuid,nodev,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
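
To check the same thing from a script rather than by eye, a small filter over `mount` output can report whether a given mount point is read-write. This is a sketch with a made-up function name. It assumes the "<src> on <target> type <fstype> (<options>)" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "rw" or "ro" for the mount point $1, reading
# mount-style lines from stdin. Prints "not-mounted" if no line matches.
# If several mounts share the target, the last one listed wins.
mount_mode() {
    awk -v target="$1" '
        $2 == "on" && $3 == target {
            opts = $NF                   # "(rw,nosuid,...)"
            gsub(/[()]/, "", opts)
            n = split(opts, o, ",")
            mode = "unknown"
            for (i = 1; i <= n; i++)
                if (o[i] == "rw" || o[i] == "ro") mode = o[i]
        }
        END { if (mode != "") print mode; else print "not-mounted" }
    '
}
```

For example, `/bin/mount | mount_mode /proc` should print rw for the listing above. A result of not-mounted for /proc/sys means that no separate mount covers it, so it inherits the mode of /proc.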

@s-urbaniak
Contributor

s-urbaniak commented Nov 20, 2016

When running the inner container with overlay (i.e. overlayfs over overlayfs), I get the following error, which needs investigation:

$ sudo rkt run \
  --volume rkt,kind=host,source=/usr/bin/rkt \
  --volume stage1,kind=host,source=/usr/lib/rkt/stage1-images/stage1-coreos.aci \
  --volume etc-rkt,kind=host,source=/etc/rkt \
  --mount volume=rkt,target=/usr/bin/rkt \
  --mount volume=stage1,target=/usr/lib/rkt/stage1-images/stage1-coreos.aci \
  --mount volume=etc-rkt,target=/etc/rkt \
  --insecure-options=all-run,image \
  docker://progrium/busybox \
  --net=host \
  --dns=8.8.8.8 \
  --interactive \
  --exec=/bin/sh

/ # opkg-install --force-depends iptables ca-certificates
/ # rkt run --debug --dns=8.8.8.8 --insecure-options=image --interactive docker://progrium/busybox --exec /bin/sh
image: using image from file /usr/bin/stage1-src.aci
image: using image from local store for url docker://progrium/busybox
stage0: Preparing stage1
stage0: Writing image manifest
stage0: Loading image sha512-cb2b26ae3f8bbf71c1380b6a2b6f599dcfce023722812e89e6f6bfb1553b90f4
stage0: Writing image manifest
stage0: Writing pod manifest
run: group "rkt" not found, will use default gid when rendering images
stage0: Setting up stage1
stage0: error setting up stage1
  └─error rendering overlay filesystem
    └─problem mounting overlay filesystem
      └─error mounting overlay with options 'lowerdir=/var/lib/rkt/cas/tree/deps-sha512-c4824b4fb40d4c2b1b5f44e24d2f89adb0a0e01d9ee542255f7fee455734c22a/rootfs,upperdir=/var/lib/rkt/pods/run/b50f6362-600a-400b-b71e-1149fcb0b39b/overlay/deps-sha512-c4824b4fb40d4c2b1b5f44e24d2f89adb0a0e01d9ee542255f7fee455734c22a/upper,workdir=/var/lib/rkt/pods/run/b50f6362-600a-400b-b71e-1149fcb0b39b/overlay/deps-sha512-c4824b4fb40d4c2b1b5f44e24d2f89adb0a0e01d9ee542255f7fee455734c22a/work' and dest '/var/lib/rkt/pods/run/b50f6362-600a-400b-b71e-1149fcb0b39b/stage1/rootfs'
        └─invalid argument

@s-urbaniak
Contributor

Regarding the overlay-over-overlay problem, I see the following kernel log entry:

[78743.001372] overlayfs: filesystem on '/var/lib/rkt/pods/run/3a821735-5c22-42f4-a3cf-b9eaa8586c2b/overlay/deps-sha512-cc290423b58b10408609a9a52bc11036255d945d2b8c7dbfac3ac1a193f747d3/upper' not supported as upperdir

The kernel docs https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt say:

The upper filesystem will normally be writable and if it
is it must support the creation of trusted.* extended attributes, and
must provide valid d_type in readdir responses, so NFS is not suitable.

Since / in the outer container is itself mounted on an overlay filesystem, the upperdir for the inner container would live on overlayfs, and the above constraint is not satisfied.

There are several possibilities which work:

  1. Start the outer container using the --no-overlay option. This causes / to be mounted natively.
  2. Mount a host volume on a non-overlay filesystem as /var/lib/rkt to the outer container.
  3. Start the inner container using the --no-overlay option.
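
Option 3 can be decided at run time by looking at the filesystem that backs /var/lib/rkt in the outer container. The following is a sketch only: the function names are made up, and fs_type_of assumes GNU coreutils `stat`, which reports overlay filesystems as "overlayfs" (BusyBox `stat` may report the type differently).

```shell
#!/bin/sh
# Hypothetical helpers for deciding whether the inner rkt needs --no-overlay.

# Succeed if the filesystem type name in $1 denotes overlayfs.
# GNU stat prints "overlayfs"; /proc/mounts uses "overlay".
is_overlay() {
    case "$1" in
        overlay|overlayfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the extra flag for the inner `rkt run`, given the filesystem type
# backing /var/lib/rkt; print nothing if overlay can be used.
inner_overlay_flag() {
    if is_overlay "$1"; then
        echo "--no-overlay"
    fi
}

# Filesystem type name of the directory $1 (assumes GNU coreutils stat).
fs_type_of() {
    stat -f -c %T "$1"
}
```

Once /var/lib/rkt exists in the outer container, the inner invocation could then be written as `rkt run $(inner_overlay_flag "$(fs_type_of /var/lib/rkt)") ...` with the remaining arguments unchanged.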

@jonboulle
Contributor

@s-urbaniak this is on a modern kernel? thought it was fixed in #1537 (which we call out in README.md)

@s-urbaniak
Contributor

@jonboulle this is kernel 4.8.8 (running Arch)

@s-urbaniak
Contributor

i.e. the test from #1537 (comment) fails with the above message.

@s-urbaniak
Contributor

Note that this is a different error message: "not supported as upperdir" vs. "No such device or address".

@jonboulle
Contributor

OK, I suggest closing this out and adding a caveat + follow-up issue for nested overlay (also README note might need a tweak?)

s-urbaniak pushed a commit to s-urbaniak/rkt that referenced this issue Nov 24, 2016
s-urbaniak pushed a commit to s-urbaniak/rkt that referenced this issue Nov 24, 2016
@blalor

blalor commented Nov 24, 2016

Should running rkt in a rkt container now work in 1.19.0 with --no-overlay? I'm still running into a couple of different problems when trying it on CentOS 7 with kernel 4.8.9 (mlkernel from elrepo). I looked at the issues closed on the v1.20.0 milestone for actual code changes related to running rkt in rkt, but nothing stood out to indicate that this shouldn't work with 1.19.0.

Based on @s-urbaniak's examples above (mainly specifying --no-overlay for both containers):

[root@localhost ~]# rkt run --no-overlay --volume rkt,kind=host,source=/usr/bin/rkt --volume stage1,kind=host,source=/usr/lib/rkt --volume etc-rkt,kind=host,source=/etc/rkt  --mount volume=rkt,target=/usr/bin/rkt   --mount volume=stage1,target=/usr/lib/rkt   --mount volume=etc-rkt,target=/etc/rkt   --insecure-options=all   docker://ubuntu   --net=host   --dns=8.8.8.8   --interactive   --exec=/bin/bash
root@rkt-faa1536d-ff94-44ad-ab3d-9847d87d1ce3:/# rkt run --debug --no-overlay --insecure-options=all   docker://progrium/busybox   --net=host   --dns=8.8.8.8   --interactive   --exec=/bin/sh
image: using image from file /usr/lib/rkt/stage1-images/stage1-coreos.aci
image: remote fetching from URL "docker://progrium/busybox"
image: fetching image from docker://progrium/busybox
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:3aaade50789 [=============================]     247 B / 247 B
Downloading sha256:d2ba336f2e4 [=============================] 33.7 KB / 33.7 KB
Downloading sha256:dfda3e01f2b [=============================]   243 KB / 243 KB
Downloading sha256:00cf8b9f3d2 [=============================]     220 B / 220 B
Downloading sha256:7ff999a2256 [=============================] 2.53 KB / 2.53 KB
Downloading sha256:77c6c00e8b6 [=============================] 2.17 MB / 2.17 MB
stage0: Preparing stage1
stage0: Writing image manifest
stage0: Loading image sha512-e350475af8d32ff7973ce7c4940ed6e0abf4c5c8f3da48c3b72b872e0d8516ae
stage0: Writing image manifest
stage0: Writing pod manifest
run: group "rkt" not found, will use default gid when rendering images
stage0: Setting up stage1
stage0: Wrote filesystem to /var/lib/rkt/pods/run/eade84e6-47ce-4993-a30d-160223543b2d
stage0: Pivoting to filesystem /var/lib/rkt/pods/run/eade84e6-47ce-4993-a30d-160223543b2d
stage0: Execing [/var/lib/rkt/pods/run/eade84e6-47ce-4993-a30d-160223543b2d/stage1/rootfs/init --debug --net=host --interactive --local-config=/etc/rkt --dns-conf-mode=resolv=stage0,hosts=default --disable-capabilities-restriction --disable-paths --disable-seccomp eade84e6-47ce-4993-a30d-160223543b2d]
stage1: warning: error setting journal ACLs, you'll need root to read the pod journal
  └─group "rkt" not found
stage1: error getting container subcgroup
  └─could not determine if we're running from a unit file
    └─error calling sd_pid_get_owner_uid: no medium found
root@rkt-faa1536d-ff94-44ad-ab3d-9847d87d1ce3:/# echo $?
254

If I use progrium/busybox as the outer container I get a different error:

[root@localhost ~]# rkt run --no-overlay --volume rkt,kind=host,source=/usr/bin/rkt --volume stage1,kind=host,source=/usr/lib/rkt --volume etc-rkt,kind=host,source=/etc/rkt  --mount volume=rkt,target=/usr/bin/rkt   --mount volume=stage1,target=/usr/lib/rkt   --mount volume=etc-rkt,target=/etc/rkt   --insecure-options=all   docker://progrium/busybox   --net=host   --dns=8.8.8.8   --interactive   --exec=/bin/sh
/ # rkt run --debug --no-overlay --insecure-options=all   docker://progrium/busybox   --net=host   --dns=8.8.8.8   --interactive   --exec=/bin/sh
image: using image from file /usr/lib/rkt/stage1-images/stage1-coreos.aci
image: remote fetching from URL "docker://progrium/busybox"
image: fetching image from docker://progrium/busybox
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:a3ed95caeb0 [=============================]       32 B / 32 B
Downloading sha256:dfda3e01f2b [=============================]   243 KB / 243 KB
Downloading sha256:3aaade50789 [=============================]     247 B / 247 B
Downloading sha256:d2ba336f2e4 [=============================] 33.7 KB / 33.7 KB
Downloading sha256:7ff999a2256 [=============================] 2.53 KB / 2.53 KB
Downloading sha256:77c6c00e8b6 [=============================] 2.17 MB / 2.17 MB
Downloading sha256:00cf8b9f3d2 [=============================]     220 B / 220 B
stage0: Preparing stage1
stage0: Writing image manifest
stage0: Loading image sha512-e350475af8d32ff7973ce7c4940ed6e0abf4c5c8f3da48c3b72b872e0d8516ae
stage0: Writing image manifest
stage0: Writing pod manifest
run: group "rkt" not found, will use default gid when rendering images
stage0: Setting up stage1
stage0: Wrote filesystem to /var/lib/rkt/pods/run/c385b98a-6f4c-4adb-96e4-8e2a06fc5956
stage0: Pivoting to filesystem /var/lib/rkt/pods/run/c385b98a-6f4c-4adb-96e4-8e2a06fc5956
stage0: Execing [/var/lib/rkt/pods/run/c385b98a-6f4c-4adb-96e4-8e2a06fc5956/stage1/rootfs/init --debug --net=host --interactive --local-config=/etc/rkt --dns-conf-mode=resolv=stage0,hosts=default --disable-capabilities-restriction --disable-paths --disable-seccomp c385b98a-6f4c-4adb-96e4-8e2a06fc5956]
stage1: warning: error setting journal ACLs, you'll need root to read the pod journal
  └─group "rkt" not found
stage1: error getting container subcgroup
  └─could not determine if we're running from a unit file
    └─unable to open a handle to the library
/ # echo $?
254

@s-urbaniak
Contributor

@blalor The corresponding PR hasn't landed in master yet; it will land in v1.21.0.

@jonboulle
Contributor

@s-urbaniak but this issue is against 1.20.0?

@s-urbaniak
Contributor

@jonboulle whoops, you are right. This issue is still pointing to the initial target release, so let me reassign it to v1.21.0. I will also create a follow-up issue regarding overlay.


10 participants