Alter cpusets mems/cpus to allow restoring on nodes with different amt of resources #33

Open
eliams opened this issue Sep 23, 2015 · 9 comments

Comments

eliams commented Sep 23, 2015

Hi,

I want to checkpoint a container on one host and restore it on another by copying the image directory and using the --force option to restore into a different container.
I created and checkpointed a container on host B, but restoring it on host A failed (checkpoint and restore on the same host work on both of them).
What I did:
Host B:
docker run -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
docker checkpoint --work-dir=< work dir > --image-dir=< image dir >

Host A:
docker create busybox
docker restore --work-dir=< work dir > --image-dir=< image dir copied from host b > --force=true < created container id >

restore logs: http://pastebin.com/jpnhuePH
I don't understand whether the /bin/sh mentioned in the error is the container's file or the host's.

My environment on both hosts:
Debian based with custom 4.0.0 kernel.
Docker version 1.9.0-dev, build 2919249, experimental
(I am using aufs as storage driver)
CRIU Version: 1.7 GitID: v1.7-45-g7ae72c6

Thanks for the help.

eliams changed the title from "Restore container from another host fail." to "Restore container to another host fail." on Sep 23, 2015
Member

xemul commented Sep 23, 2015

The /bin/sh mentioned there is container's file.

xemul added the docker label on Sep 23, 2015
Author

eliams commented Sep 23, 2015

So if I understand correctly, the file /bin/sh does not have the same size in the containers created on hosts A and B: the size was saved during checkpoint on host B, and when restoring into a newly created container on host A, the restore finds a different size for that file? If so, how is that possible, given that both containers are from the same busybox image?

I also tried running and checkpointing the container on host A and restoring it on host B, but got a different error:
http://pastebin.com/6ELkxnX3

Member

xemul commented Sep 23, 2015

As for /bin/sh: can you manually check its size on A and B?

As for the second error: it's ERANGE, meaning something that is not a valid mask is being written into cpuset.mems. What's in your cgroup.img file?
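
To illustrate what "not a valid mask" means here, the following is a minimal sketch that compares a saved cpuset list against the list the target host offers. The helper names `parse_cpu_list` and `invalid_ids` are hypothetical, and it is an assumption that the allowed list is read from a file such as /sys/devices/system/cpu/online on the target host; the sketch only shows the set comparison, not what the kernel itself does.

```python
def parse_cpu_list(text):
    """Expand a kernel-style list such as "0-3,6" into a set of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare number like "6" has no upper bound, so reuse the lower one.
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def invalid_ids(saved, allowed):
    """Return the IDs in `saved` that are missing from `allowed`."""
    return sorted(parse_cpu_list(saved) - parse_cpu_list(allowed))


# Hypothetical values: the image recorded "0-7", the target host offers "0-3".
print(invalid_ids("0-7", "0-3"))  # prints [4, 5, 6, 7]
```

An empty result would suggest the saved value fits the target host; any remaining IDs are candidates for the failed write.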

Author

eliams commented Sep 24, 2015

I was using different versions of the busybox image on the two hosts. Using the same version fixed the first error; sorry for missing that.

Here's the cgroup.img file: https://github.com/eliams/tmp/raw/master/criu_image_01/cgroup.img
The whole CRIU image directory is here: https://github.com/eliams/tmp/tree/master/criu_image_01

Member

xemul commented Sep 24, 2015

                            {
                                "name": "cpuset.cpus", 
                                "value": "0-7"
                            }, 

I guess your other box doesn't accept THAT many CPUs, right?

Well, yes. This is a live migration issue, I would say: restoring an image on a host with a different amount of resources would require some modification of the images.

You can try to work around this issue by

  1. decoding cgroup.img file into json format (using crit)
  2. manually fixing the cpuset.cpus value to be OK for your target machine
  3. encoding the json back into cgroup.img (using crit)

After this the restore should work OK.
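
Step 2 above can be sketched as a small script. This assumes the image has already been decoded to JSON (for example with something like `crit decode -i cgroup.img --pretty -o cgroup.json`) and will be encoded back afterwards (`crit encode -i cgroup.json -o cgroup.img`). The helper names and the `"properties"` key in the demo are hypothetical; only the `{"name": "cpuset.cpus", "value": "0-7"}` entry comes from the image shown above, so the walker searches the whole document rather than assuming a layout.

```python
import json


def parse_cpu_list(text):
    """Expand a list such as "0-3,6" into a set of ints."""
    ids = set()
    for part in text.strip().split(","):
        if part:
            lo, _, hi = part.partition("-")
            ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def format_cpu_list(ids):
    """Compress a set of ints back into list syntax, e.g. {0,1,2,3} -> "0-3"."""
    ids = sorted(ids)
    parts, i = [], 0
    while i < len(ids):
        j = i
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


def clamp_cpusets(node, allowed, name="cpuset.cpus"):
    """Recursively restrict every `name` entry to `allowed`; return the number changed."""
    changed = 0
    if isinstance(node, dict):
        value = node.get("value")
        if node.get("name") == name and isinstance(value, str) and value.strip():
            kept = parse_cpu_list(value) & allowed
            if not kept:
                raise ValueError(f"{name}={value!r} has no CPU available on the target")
            new_value = format_cpu_list(kept)
            if new_value != value:
                node["value"] = new_value
                changed += 1
        for child in node.values():
            changed += clamp_cpusets(child, allowed, name)
    elif isinstance(node, list):
        for child in node:
            changed += clamp_cpusets(child, allowed, name)
    return changed


# Demo on a hypothetical fragment; the target host is assumed to have CPUs 0-3.
image = {"properties": [{"name": "cpuset.cpus", "value": "0-7"},
                        {"name": "cpuset.mems", "value": "0"}]}
clamp_cpusets(image, allowed=set(range(4)))
print(json.dumps(image["properties"][0]))
```

Passing `name="cpuset.mems"` with the target's memory nodes would apply the same restriction to the memory mask. Intersecting with the allowed set is only one possible policy; a container pinned to CPUs that do not exist on the target raises an error here so the value can be chosen by hand.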

xemul changed the title from "Restore container to another host fail." to "Restore cpusets on node with differne amt of mem/cpus fails" on Sep 24, 2015
xemul changed the title from "Restore cpusets on node with differne amt of mem/cpus fails" to "Alter cpusets mems/cpus to allow restoring on nodes with different amt of resources" on Sep 24, 2015
Author

eliams commented Sep 24, 2015

You are right, I was checkpointing on a machine with 8 CPUs and restoring on a machine with 4.
Manually fixing the cpuset.cpus value worked.

Thanks !

xemul removed the docker label on Sep 24, 2015
Member

xemul commented Sep 24, 2015

You're welcome!

github-actions bot commented Apr 7, 2021

A friendly reminder that this issue had no activity for 30 days.
