"./etc/group doesn't have a proper root mount" while checkpointing singularity #841

Closed
ShunyuYao515 opened this issue Nov 10, 2019 · 9 comments

Comments

@ShunyuYao515

ShunyuYao515 commented Nov 10, 2019

I'm running a simple HPC program in Singularity, and my goal is to checkpoint/restart/migrate the whole container using CRIU. Note that I need to run the Singularity container without sudo privileges.

Since Singularity hasn't integrated CRIU, I guess I have to dump it manually (sorry for my stupidity), but I don't really know the proper way to do it. Below is the command I use to dump it:

sudo criu dump -v4 --tree 16076 --images-dir /home/CRIU/exit_dir --external mnt[]:m --leave-stopped --shell-job

And then criu spits out the following to me:

(00.001884) mnt: autodetected external mount /run/systemd/resolve/stub-resolv.conf for ./etc/resolv.conf
(00.001888) mnt: autodetected external mount //var/tmp for ./var/tmp
(00.001891) mnt: autodetected external mount //tmp for ./tmp
(00.001894) mnt: autodetected external mount /dev/mqueue/ for ./dev/mqueue
(00.001897) mnt: autodetected external mount /dev/hugepages/ for ./dev/hugepages
(00.001900) mnt: autodetected external mount /dev/shm/ for ./dev/shm
(00.001903) mnt: autodetected external mount /dev/pts/ for ./dev/pts
(00.001906) mnt: autodetected external mount /dev/ for ./dev
(00.001909) mnt: autodetected external mount /sys/ for ./sys
(00.001912) mnt: autodetected external mount /proc/sys/fs/binfmt_misc/ for ./proc/sys/fs/binfmt_misc
(00.001914) mnt: autodetected external mount /proc/ for ./proc
(00.001917) mnt: Inspecting sharing on 1306 shared_id 0 master_id 5 (@./etc/resolv.conf)
(00.001920) mnt: Inspecting sharing on 1305 shared_id 0 master_id 0 (@./etc/group)
(00.001922) mnt: The mount 1304 is bind for 1305 (@./etc/passwd -> @./etc/group)
(00.001924) mnt: Inspecting sharing on 1304 shared_id 0 master_id 0 (@./etc/passwd)
(00.001927) mnt: Inspecting sharing on 1303 shared_id 0 master_id 1 (@./var/tmp)
(00.001929) mnt: Inspecting sharing on 1302 shared_id 0 master_id 1 (@./tmp)
(00.001931) mnt: Inspecting sharing on 1301 shared_id 0 master_id 27 (@./dev/mqueue)
(00.001934) mnt: Inspecting sharing on 1300 shared_id 0 master_id 26 (@./dev/hugepages)
(00.001936) mnt: Inspecting sharing on 1299 shared_id 0 master_id 4 (@./dev/shm)
(00.001938) mnt: Inspecting sharing on 1298 shared_id 0 master_id 3 (@./dev/pts)
(00.001940) mnt: Inspecting sharing on 1297 shared_id 0 master_id 2 (@./dev)
(00.001943) mnt: Inspecting sharing on 1296 shared_id 0 master_id 0 (@./sys)
(00.001945) mnt: Inspecting sharing on 1295 shared_id 0 master_id 28 (@./proc/sys/fs/binfmt_misc)
(00.001947) mnt: Inspecting sharing on 1294 shared_id 0 master_id 14 (@./proc)
(00.001949) mnt: Inspecting sharing on 1291 shared_id 0 master_id 0 (@./)
(00.001952) Error (criu/mount.c:670): mnt: 1305:./etc/group doesn't have a proper root mount
(00.001956) Unlock network
(00.001959) Unfreezing tasks into 1
(00.001961) Unseizing 16076 into 1
(00.001968) Error (criu/cr-dump.c:1763): Dumping FAILED.

The full test log is here:
test.log

What did I do wrong here? Appreciate your time & efforts!

@adrianreber
Member

There have been other attempts to checkpoint/restore singularity containers. Maybe something there helps you:

As you are trying to run rootless containers with Singularity, it would be good to know how Singularity creates the containers. As far as I know, there are two versions of Singularity around. To work on older kernels, the old version did not use namespaces much and was mainly based on setuid and chroot(). The newer version, as far as I know, actually uses namespaces like many other container engines. Which Singularity version are you using?

Not sure how useful the following suggestion is. Podman can also be used to run containers rootless in HPC environments (https://podman.io/blogs/2019/09/26/podman-in-hpc.html). Podman's checkpoint/restore support would not help you as it requires running the containers as root, but we know that Podman can checkpoint/restore its root containers. Maybe that makes it easier for rootless containers. Not sure.

@ShunyuYao515
Author

Thank you for your reply!

I'm using version 2.5.2 of Singularity, released on July 3rd, 2018. Singularity has already been deployed on our cluster for a while, so it might be hard to switch container tools, but thanks for the advice!

@avagin
Member

avagin commented Nov 12, 2019

(00.001952) Error (criu/mount.c:670): mnt: 1305:./etc/group doesn't have a proper root mount

/etc/group has to be restored as an external mount.

You can look at test/zdtm/static/mnt_ext_manual.desc as an example of using the --external option for mounts.
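
To illustrate which mounts are likely to need this treatment, here is a rough Python sketch. It assumes that CRIU complains about a bind mount when its root field in /proc/<pid>/mountinfo is not "/" and no other mount of the same device in the namespace has root "/"; this is a simplification of the real check. The sample mountinfo lines, the device numbers, the session paths and the generated names (etc_group and so on) are all made up for illustration. The mnt[MOUNTPOINT]:NAME form follows my reading of CRIU's external bind mounts documentation, so please verify it against your CRIU version.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    mnt_id: int
    dev: str         # major:minor of the superblock
    root: str        # path inside the filesystem that is mounted
    mountpoint: str  # where it is mounted, relative to the process root


# Hypothetical mountinfo excerpt; NOT taken from the real container.
SAMPLE_MOUNTINFO = """
1291 1290 0:50 / / rw - overlay overlay rw
1294 1291 0:4 / /proc rw - proc proc rw
1304 1291 8:1 /session/etc/passwd /etc/passwd rw - ext4 /dev/sda1 rw
1305 1291 8:1 /session/etc/group /etc/group rw - ext4 /dev/sda1 rw
"""


def parse_mountinfo(text: str) -> List[Mount]:
    """Parse the fixed leading fields of each mountinfo line."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        # A valid line has at least 10 fields and a "-" separator.
        if len(fields) < 10 or "-" not in fields:
            continue
        mounts.append(Mount(int(fields[0]), fields[2], fields[3], fields[4]))
    return mounts


def mounts_without_root(mounts: List[Mount]) -> List[Mount]:
    """Bind mounts of a sub-path whose filesystem root is not visible."""
    devs_with_root = {m.dev for m in mounts if m.root == "/"}
    return [m for m in mounts if m.root != "/" and m.dev not in devs_with_root]


def external_args(mounts: List[Mount]) -> List[str]:
    """Build candidate --external options (names are arbitrary labels)."""
    args = []
    for m in mounts_without_root(mounts):
        name = m.mountpoint.strip("/").replace("/", "_") or "root"
        args += ["--external", f"mnt[{m.mountpoint}]:{name}"]
    return args


if __name__ == "__main__":
    print(" ".join(external_args(parse_mountinfo(SAMPLE_MOUNTINFO))))
```

For a real container you would feed in the contents of /proc/<pid>/mountinfo of the root task instead of the sample text. As far as I understand, on restore each name then has to be mapped to a path on the host with --external mnt[NAME]:PATH.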

@ShunyuYao515
Author

@avagin Thank you! After adding the proper "--external" option for /etc/group according to the error message, the dump proceeded, but now there is another error message:

(00.006210) ========================================
(00.006213) Dumping task (pid: 22824)
(00.006215) ========================================
(00.006217) Obtaining task stat ...
(00.006236)
(00.006239) Collecting mappings (pid: 22824)
(00.006241) ----------------------------------------
(00.006902) Error (criu/mount.c:3466): mnt: The root task has another root than mntns: /usr/local/var/singularity/mnt/final
(00.006909) Error (criu/mount.c:153): mnt: Unable to get the root file descriptor of pid 22824
(00.006913) Error (criu/cr-dump.c:1243): Collect mappings (pid: 22824) failed with -1
(00.006973) Unlock network
(00.006979) Unfreezing tasks into 1
(00.006981) Unseizing 22824 into 1
(00.006989) Error (criu/cr-dump.c:1763): Dumping FAILED.

I've checked all the related issues about "The root task has another root than mntns", but they don't help in my case.

Below is the full log. Again, thank you so much!
test.log

@avagin
Member

avagin commented Nov 14, 2019

(00.006902) Error (criu/mount.c:3466): mnt: The root task has another root than mntns: /usr/local/var/singularity/mnt/final

If I am not mistaken, this means that the root task changed its root by calling chroot. CRIU doesn't support this case. All modern container runtimes use pivot_root to change the root file system.
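
A quick way to check this yourself before dumping is sketched below. It assumes the error corresponds to the /proc/<pid>/root symlink of the root task resolving to something other than "/", which is what the path in the log line suggests; the function names are hypothetical helpers, not part of CRIU.

```python
import os


def task_root(pid: int) -> str:
    """Return where /proc/<pid>/root points (needs permission to read it)."""
    return os.readlink(f"/proc/{pid}/root")


def root_differs_from_mntns(root_link: str) -> bool:
    """True if the task's root is not the root of its mount namespace.

    After pivot_root inside its own mount namespace the link reads "/".
    After a plain chroot it reads the chroot directory, e.g. the
    /usr/local/var/singularity/mnt/final path from the log above.
    """
    return root_link != "/"
```

Calling root_differs_from_mntns(task_root(22824)) as root on the host should return True for the container from the log if the assumption above holds.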

@avagin
Member

avagin commented Nov 14, 2019

Cc: @Snorch

@adrianreber
Member

If I am not mistaken, this means that the root task changed its root by calling chroot. CRIU doesn't support this case. All modern container runtimes use pivot_root to change the root file system.

That is the reason I was asking for the version of Singularity. I think the 2.x versions of Singularity are based on chroot and setuid from what I have seen.

@ShunyuYao515
Author

Now the problem has reduced to the same one as #600

@ShunyuYao515
Author

The error now is a completely different problem. I posted another issue #855. Thank you guys so much for the help!
