Deleted bind-mount points fail restore #9

stlalpha · 2015-07-27T13:31:49Z

Checkpoints now complete successfully after enabling evasive devices for this container. However, I am getting the following error on restore (checkpointing and restoring on the same host). Any ideas? Thanks!

(00.106760) 1: Found fd 1 (id pipe:[546719]) in inherit fd list (caller inherit_fd_resolve_clash)
(00.106765) 1: Inherit fd 1 moved to 5 to resolve clash
(00.106767) 1: Going to dup 0 into 1
(00.106770) 1: Found fd 2 (id pipe:[546720]) in inherit fd list (caller inherit_fd_resolve_clash)
(00.106773) 1: Inherit fd 2 moved to 6 to resolve clash
(00.106775) 1: Going to dup 0 into 2
(00.106782) 1: Restoring fd 1 (state -> create)
(00.106784) 1: Restoring fd 2 (state -> create)
(00.106787) 1: Restoring fd 3 (state -> create)
(00.106789) 1: Creating pipe pipe_id=0x857e7 id=0x2
(00.106796) 1: Restoring size 0x10000 for 0x857e7
(00.106803) 1: Wait fdinfo pid=1 fd=4
(00.106806) 1: Send fd 7 to /crtools-fd-1-4
(00.106823) 1: Create fd for 3
(00.106827) 1: Restoring fd 4 (state -> create)
(00.106829) 1: Creating pipe pipe_id=0x857e7 id=0x3
(00.106832) 1: Waiting fd for 4
(00.106842) 1: Create fd for 4
(00.106846) 1: Restoring fd 5 (state -> create)
(00.106853) 1: fsnotify: Restore 1 wd for 0x00000000
(00.106856) 1: fsnotify: Opening fhandle 2e:41...
(00.106871) 1: Path `/' resolved to `./' mountpoint
(00.106971) 1: Error (fsnotify.c:131): fsnotify: Can't open file handle for 0x0000002e:0x000000000000000e: Stale file handle
(00.114893) Error (cr-restore.c:1234): 28246 exited, status=1
(00.115123) Error (cr-restore.c:1927): Restoring FAILED.


xemul · 2015-07-27T13:42:15Z

Can you show the dump log please?

fl0yd · 2015-07-27T14:56:08Z

I'm working with stlalpha on this, here's a gist with a dump and restore log.

https://gist.github.com/fl0yd/4a4f88ee78e53b9977bd

cyrillos · 2015-07-27T16:13:37Z

It looks like the file being watched has vanished after the dump, which is only possible if the underlying filesystem does something weird. I suspect it's AUFS? Could you please show the process tree and the output of ls -l /proc/28382/fd?

fl0yd · 2015-07-27T18:32:23Z

The old process was vaporized after the checkpoint, so I couldn't get the info from 28382. For cleanliness I bounced the box, and restarted a new container with the command.

https://gist.github.com/fl0yd/47ff63636a06df1a1f0c

cyrillos · 2015-07-27T18:47:55Z

So it's docker watching for something on its own root.
├─docker(1016)─┬─init(1222)
...dump...
(00.013734) type aufs source none mnt_id 157 s_dev 0x26 / @ ./ flags 0x200000 options si=91208b708a91129b,dio,dirperm1
...
(00.175493) fsnotify: wd: wd 0x00000001 s_dev 0x00000026 i_ino 0x e mask 0x0800abc8
(00.175505) fsnotify: [fhandle] bytes 0x00000028 type 0x00000063 __handle 0x0000000000000041:0x000000000000000e
(00.175515) fsnotify: Opening fhandle 26:41...
(00.175543) Path `/' resolved to `./' mountpoint
(00.175618) fsnotify: Handle 26:e is openable
...restore...
(00.093582) 1: fsnotify: Restore 1 wd for 0x00000000
(00.093586) 1: fsnotify: Opening fhandle 26:41...
(00.093600) 1: Path `/' resolved to `./' mountpoint
(00.093664) 1: Error (fsnotify.c:131): fsnotify: Can't open file handle for 0x00000026:0x000000000000000e: Stale file handle

We might need some help from the AUFS camp, because this problem definitely lies somewhere inside the AUFS code: on dump we tested that the watchee is openable by file handle, but on restore the open by file handle failed.

Is there a chance to run the same configuration on native VFS or something?

avagin · 2015-07-27T18:50:34Z

Cc: @SaiedKazemi

SaiedKazemi · 2015-07-28T00:09:59Z

@cyrillos @avagin
Docker 1.5 (https://github.com/SaiedKazemi/docker/releases) does support VFS (docker -d -D -s vfs) and I have successfully tested C/R'ing simple workloads. Worth a try with this workload.

fl0yd · 2015-07-28T00:35:39Z

In v1.8 the syntax isn’t the same, but when I pass the --volume-driver=vfs flag, the results are the same (or at least appear that way to me).

Docker command: docker run -d --volume-driver=vfs ubuntu /sbin/init
Gist: https://gist.github.com/fl0yd/9629cf9cd773ab18a461

Mark


SaiedKazemi · 2015-07-28T00:47:32Z

@fl0yd
The root of the container should be bind mounted before calling criu restore. This is implicitly done in the case of AUFS and OverlayFS when the container's file system is set up before calling criu. For VFS, we need to bind mount the root. I don't think this code exists in 1.8, hence the failure. Can you just wget docker-1.5.0 from the above URL and try it? It's already prebuilt, so the test should be really quick. Or send me your script and I will try it.

fl0yd · 2015-07-28T00:57:15Z

It’s getting pretty late here, but I’ll give 1.5 a shot. We depend on some of the 1.8 driver features of docker, so while it will be interesting to see whether the problem is vfs or aufs related, it will be just that. However, I would like to say that I am extremely thankful for the time that you’ve spent on this issue already and for the great work the team has done on CRIU.

As of right now, the docker script is as simple as:

docker run -d ubuntu /sbin/init
docker checkpoint
docker restore <same ID from step 2>

Aside from having started the container, I haven’t started anything within the container, so it should work as is with 1.5 or 1.7/8.

Mark


SaiedKazemi · 2015-07-28T01:32:00Z

My pleasure to help where I can. I just tried your container with both the VFS and OverlayFS storage drivers on a 3.19.8 (Ubuntu Vivid) kernel patched to fix OverlayFS bugs. Both restores failed with the same error (see below). I think the "deleted" string confuses things, but I remember that CRIU has code to handle it. Unfortunately, I don't have time to take a closer look at the moment, but I will definitely come back to it the first chance I get.

VFS
(00.021356) 1: 71:./dev/mqueue private 1 shared 0 slave 0
(00.021359) 1: Mounting tmpfs @./proc/kcore (0)
(00.021363) 1: Bind ./dev/null//deleted to ./proc/kcore
(00.021378) 1: Error (mount.c:1921): Can't mount at ./proc/kcore: Not a directory
(00.035149) Error (cr-restore.c:1912): Restoring FAILED.

OverlayFS
(00.021356) 1: 71:./dev/mqueue private 1 shared 0 slave 0
(00.021359) 1: Mounting tmpfs @./proc/kcore (0)
(00.021363) 1: Bind ./dev/null//deleted to ./proc/kcore
(00.021378) 1: Error (mount.c:1921): Can't mount at ./proc/kcore: Not a directory
(00.035149) Error (cr-restore.c:1912): Restoring FAILED.

cyrillos · 2015-07-28T07:11:16Z

This doesn't look like a file handle problem anymore, though :) Still, it seems that we're covering only one case of the "deleted" name postfix. I'll re-check.

cyrillos · 2015-07-28T08:11:31Z

@SaiedKazemi Saied, I sent out a patch for handling the "deleted" postfix in names. Could you please give it a shot once time permits? (I sent it to the mailing list with you CC'ed.)

fl0yd · 2015-07-28T17:42:01Z

I applied the diff from the patch Cyrillos mentioned above and confirm the "deleted" fix works for me.

I am still having issues with the restore using aufs and vfs.

with root mounted at "/" I get:

(00.113321) 1: fsnotify: Restore 0x1 wd for 0x00000000
(00.113325) 1: fsnotify: Opening fhandle 27:41...
(00.113339) 1: Path `/' resolved to `./' mountpoint
(00.113409) 1: Error (fsnotify.c:131): fsnotify: Can't open file handle for 0x00000027:0x000000000000000e: Stale file handle
(00.120768) Error (cr-restore.c:1927): Restoring FAILED.

with root mounted at /mnt/tmproot via -v /:/mnt/tmproot I get:

(00.010633) Error (mount.c:597): FS mnt ./mnt/tmproot/boot dev 0x800001 root / unsupported id 209
(00.010643) Unlock network
(00.010648) Running network-unlock scripts
(00.010659) RPC
(00.011258) Unfreezing tasks into 1
(00.011534) Unseizing 3840 into 1
(00.011736) Error (cr-dump.c:1996): Dumping FAILED.

Thoughts?

cyrillos · 2015-07-28T17:48:51Z

@fl0yd Mark, I fear I have no clue how docker operates with AUFS (nor how it works with VFS). What I know for sure is that at dump time we explicitly check that the file handle for the watchee is openable, which allows us to be certain that we will be able to open it again on restore.

In turn, your error message shows that the underlying filesystem has changed its own contents and the watchee is no longer available.
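
For readers unfamiliar with file handles, the check described above can be sketched with the Linux name_to_handle_at() and open_by_handle_at() system calls. This is only an illustration, not CRIU's actual code: the function name handle_is_openable() is made up for this example, and it re-opens the handle immediately, whereas in this thread the handle was encoded at dump time and failed with ESTALE only later, at restore time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: encode `path` into a file handle, then try to
 * reopen it by that handle.
 * Returns 1 if the handle opens, 0 if the kernel reports ESTALE,
 * and -1 on any other error. Note that open_by_handle_at() requires
 * the CAP_DAC_READ_SEARCH capability.
 */
int handle_is_openable(const char *path)
{
	struct file_handle *fh;
	int mount_id, mount_fd, fd, ret = -1;

	assert(path != NULL);

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return -1;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* Step 1: turn the path into an opaque, filesystem-specific handle. */
	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0)
		goto out;

	/* Step 2: any fd on the same filesystem can serve as mount_fd. */
	mount_fd = open(path, O_RDONLY);
	if (mount_fd < 0)
		goto out;

	/* Step 3: reopen by handle; ESTALE means the object is gone. */
	fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
	if (fd >= 0) {
		close(fd);
		ret = 1;
	} else if (errno == ESTALE) {
		ret = 0;
	}
	close(mount_fd);
out:
	free(fh);
	return ret;
}
```

Calling handle_is_openable("/") as root inside the container would exercise the same kind of lookup that the fsnotify restore path reports as failing in the logs above.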

SaiedKazemi · 2015-07-28T18:12:06Z

@cyrillos @fl0yd
Good news! I was able to successfully checkpoint and restore Mark's container using Docker 1.5 with both VFS and OverlayFS. I applied Cyrill's patch but noticed that "//deleted" was not deleted from new->root. So, as a quick test, I patched parse_mountinfo_ent() to get rid of it and C/R worked.

    int rootlen = strlen(new->root);
    if (rootlen > 9) {
            if (!strcmp(&new->root[rootlen - 9], "//deleted"))
                    new->root[rootlen - 9] = '\0';
    }

Cyrill, the code path doesn't make it to strip_deleted() for new->root. I didn't look into why, but I am sure it's not hard to find out.

cyrillos · 2015-07-28T18:21:55Z

@SaiedKazemi Stripping of the deleted part is called in two cases:

- for opened files which have zero links associated
- for files living on a devpts filesystem

As far as I know, we don't call this helper on mount point paths. Saied, could you please show the mount tree which contains such a thing? It looks like we should handle it too.

SaiedKazemi · 2015-07-28T18:42:13Z

@cyrillos
Please see https://gist.github.com/SaiedKazemi/347a2e04a1ba78bf2113 for two dump.log files: a6_dump.log is from before my patch and 36_dump.log is from after my patch.

cyrillos · 2015-07-28T19:59:44Z

So as far as I understand, it's a bind mount of a deleted entry. Is that a common situation for docker?

cyrillos · 2015-07-28T20:00:36Z

When you strip the //deleted part it restores fine, correct?

SaiedKazemi · 2015-07-28T20:19:02Z

@xemul
Yes, it restores fine when "//deleted" is stripped from new->root.

Re how we end up with it, I don't know. I had seen the " (deleted)" suffix before; Pavel can explain much better why and when it shows up. But this is the first time I am seeing "//deleted". It's added by dentry_path() in fs/dcache.c when it calls prepend(), although it's actually appended!

cyrillos · 2015-07-28T20:48:55Z

@SaiedKazemi We end up this way when the source of a bind mount is removed:

136 130 0:52 /m1//deleted /home/cyrill/m/m2 rw,relatime shared:69 - tmpfs none rw

It is easy to simply strip off this postfix; I just need some time to think about whether any caveats remain. That said, once it's ready I'll send out the patch.
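
As a rough illustration of the stripping discussed here, the sketch below pulls the root field (the fourth field) out of one mountinfo line and drops a trailing "//deleted" marker. It is not the patch that was eventually merged: the function name mountinfo_root_strip_deleted() and the fixed buffer size are assumptions made for this example, and it ignores whatever caveats the real fix had to consider.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DELETED_SUFFIX "//deleted"

/*
 * Hypothetical helper: copy the root field of one /proc/<pid>/mountinfo
 * line into `root`, dropping a trailing "//deleted" marker if present.
 * Returns 1 if the marker was stripped, 0 if there was none,
 * and -1 if the line could not be parsed or `root` is too small.
 */
int mountinfo_root_strip_deleted(const char *line, char *root, size_t size)
{
	char buf[4096];
	size_t len, slen = strlen(DELETED_SUFFIX);

	assert(line != NULL && root != NULL && size > 0);

	/* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
	if (sscanf(line, "%*d %*d %*u:%*u %4095s", buf) != 1)
		return -1;

	len = strlen(buf);
	if (len >= size)
		return -1;
	strcpy(root, buf);

	/* Same test as the quick patch above, without the hard-coded 9. */
	if (len > slen && strcmp(root + len - slen, DELETED_SUFFIX) == 0) {
		root[len - slen] = '\0';
		return 1;
	}
	return 0;
}
```

Fed the mountinfo line quoted in this comment, the helper returns 1 and leaves "/m1" in root.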

cyrillos · 2015-07-29T08:58:13Z

@xemul Pavel, please rename the issue to "Can't restore containers with deleted bind mount points"

xemul · 2015-07-29T10:50:34Z

Done, thanks :)

stlalpha · 2015-08-06T16:40:23Z

Greetings - is there anything we can do to help move this forward? Thanks for the great software!

cyrillos · 2015-08-06T16:45:25Z

The best would be to send us a patch with this problem fixed ;) Just kidding. Sorry, I'm out of time at the moment. Once time permits I'll do the patch.

xemul · 2015-08-07T10:31:59Z

@cyrillos, does your recent patch titled "[PATCH] files-reg: Rework strip_deleted" help with this? Or should something else be done?

cyrillos · 2015-08-07T10:48:33Z

The patch itself covers only part of the problem: we need to call it when parsing mount point paths. While this does the trick for docker as far as I know, the general solution should be restoring deleted mount points. Thus, as I said, we can merge the patch since it doesn't hurt, but we need to do more work on top.

xemul · 2015-08-07T12:04:45Z

Hm... Just curious -- why does this mountpoint appear at all?

cyrillos · 2015-08-07T12:09:49Z

It's a bind mount of a deleted mount point. You bind-mount some dir, then remove the source, and you get the //deleted name on the target.

xemul · 2015-08-07T12:14:04Z

Yes, this is understood. Why does this happen in our case? Who deletes the target, and what for?

cyrillos · 2015-08-07T12:15:30Z

I don't know, it comes from docker log. CC'ing @SaiedKazemi

SaiedKazemi · 2015-08-07T19:11:36Z

@cyrillos @xemul @fl0yd
I am not sure either. Based on what Mark said, this happens when you run the following docker command:
$ docker run -d ubuntu:latest /sbin/init
I don't know if it's related to the storage driver, docker version, or kernel version. But I verified that it had to do with //deleted. Once I removed //deleted from the pathname, I could successfully checkpoint and restore.

fl0yd · 2015-08-07T19:15:25Z

It appears to work with external CRIU checkpoint/restore, but using the internal 1.8 experimental checkpoint and restore, I still get the error about a stale file handle.

(00.092761) 1: Restoring fd 5 (state -> create)
(00.092767) 1: fsnotify: Restore 0x1 wd for 0x00000000
(00.092772) 1: fsnotify: Opening fhandle 28:41...
(00.092786) 1: Path `/' resolved to `./' mountpoint
(00.092856) 1: Error (fsnotify.c:131): fsnotify: Can't open file handle for 0x00000028:0x000000000000000e: Stale file handle

Mark


avagin · 2015-08-07T19:16:46Z

Probably /sbin/init deletes /dev/null and creates it again.

SaiedKazemi · 2015-08-07T21:52:48Z

You're right, it must be /sbin/init. We can easily reproduce it manually:

[Terminal A] $ docker run -it ubuntu:latest bash -i
[Terminal B] $ sudo grep deleted /proc/5274/mountinfo
$
[Terminal A] # rm /dev/null
[Terminal B] $ sudo grep deleted /proc/5274/mountinfo
99 155 0:61 /null//deleted /proc/kcore rw,nosuid - tmpfs tmpfs rw,mode=755
100 155 0:61 /null//deleted /proc/timer_stats rw,nosuid - tmpfs tmpfs rw,mode=755

Notice we cannot remove hosts (or hostname):

[Terminal A] # rm /etc/hosts
rm: cannot remove '/etc/hosts': Device or resource busy
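
The grep in the reproduction above matches the word "deleted" anywhere on a line. A slightly stricter check, sketched below, looks only at the root field of each mountinfo entry. The function name count_deleted_roots() is an assumption for this example and is not part of CRIU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the entries in a mountinfo-formatted stream
 * whose root field (fourth field) ends in "//deleted", i.e. bind mounts
 * whose source has been removed. Lines that do not parse are skipped.
 */
int count_deleted_roots(FILE *f)
{
	char line[8192], root[4096];
	const char *suffix = "//deleted";
	size_t len, slen = strlen(suffix);
	int n = 0;

	assert(f != NULL);

	while (fgets(line, sizeof(line), f)) {
		/* Fields: mount ID, parent ID, major:minor, root, ... */
		if (sscanf(line, "%*d %*d %*u:%*u %4095s", root) != 1)
			continue;
		len = strlen(root);
		if (len > slen && strcmp(root + len - slen, suffix) == 0)
			n++;
	}
	return n;
}
```

Passing it a stream opened with fopen() on /proc/<pid>/mountinfo for the container's init process should report 2 for the state shown in Terminal B after /dev/null is removed.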

xemul · 2015-09-02T12:37:32Z

Should have been fixed by 80ef8fd and d27f539


xemul changed the title ~~Restoring container - failure~~ Deleted bind-mount points fail restore Jul 29, 2015

xemul added the bug label Jul 29, 2015

xemul closed this as completed Sep 2, 2015

kikosha mentioned this issue May 29, 2016

Segmentation fault in libnl-3.so when dumping a process #169

Closed

This was referenced Mar 17, 2021

When name is null, it will coredump #1411

Closed

When name is null, it will coredump #1412

Open

throwbear mentioned this issue May 15, 2021

[Python Supervisord] criu restore is successful, But the supervisord program coredump #1477

Open
