Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch/x86: fpu_state->fpu_state_ia32.xsave hast to be 64-byte aligned #1

Open
wants to merge 250 commits into
base: criu-dev
Choose a base branch
from

Conversation

criupatchwork
Copy link
Owner

No description provided.

xemul and others added 30 commits May 16, 2019 03:24
So, here's the enhanced version of the first try.

Changes are:

1. The wrapper name is criu-ns instead of crns.py
2. The CLI is absolutely the same as for criu, since the script
   re-execl-s criu binary. E.g.
	   scripts/criu-ns dump -t 1234 ...
   just works
3. Caller doesn't need to care about substituting CLI options,
   instead, the scripts analyzes the command line and
   a) replaces -t|--tree argument with virtual pid __if__ the
      target task lives in another pidns
   b) keeps the current cwd (and root) __if__ switches to another
      mntns. A limitation applies here -- cwd path should be the
      same in target ns, no "smart path mapping" is performed. So
      this script is for now only useful for mntns clones (which
      is our main goal at the moment).

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Looks-good-to: Andrey Vagin <avagin@openvz.org>
This is the case when the in/out files are image cache/proxy sockets.

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
This patch introduces the --remote option and the necessary code changes to
support it. This leaves user the option to decide if the checkpoint data is to
be stored on disk or sent through the network (through the image-proxy).
The latter forwards the data to the destination node where image-cache
receives it.

The overall communication is performed as follows:
src_node CRIU dump -> (sends images through UNIX sockets) ->      image-proxy
								       |
								       V
dst_node: CRIU restore <- (receives images through UNIX sockets)<- image-cache

Communication between image-proxy and image-cache is done through a single
TCP connection.

Running criu with --remote option is like this:

dst_node# criu image-cache -d --port <port> -o /tmp/image-cache.log
dst_node# criu restore --remote -o /tmp/image-cache.log
src_node# criu image-proxy -d --port <port> --address <dst_node> -o /tmp/image-proxy.log
src_node# criu dump -t <pid> --remote -o /tmp/dump.log

    [ xemul:
here's the list of what should be done with the cache/proxy
in order to have them merged into master.

0. Document the whole thing :)
   Please, add articles for newly introduced actions and options to
   https://criu.org/CLI page.
   Also, it would be good to have an article describing the protocols
   involved.

1. Make the unix sockets reside in work-dir.
   The good thing is that we've get rid of the socket name option :)
   But looking at do_open_remote_image() I see that it fchdir-s to
   image dir before connecting to proxy/cache. Better solution is to
   put the socket into workdir.

   1a. After this the option -D|--images-dir should become optional.
       Provided the --remote is given CRIU should work purely on the
       work-dir and not generate anything in the images-dir.

2. Tune up the image_cache and image_proxy commands to accept the
   --status-fd and --pidfile options.
   Presumably the very cr_daemon() call should be equipped with
   everything that should be done for daemonizing and proxy/cache
   tasks should just call it :)

3. Fix local connections not to generate per-image threads. There
   can be many images and it's not nice to stress the system with
   such amount of threads. Please, look at how criu/uffd.c manages
   multiple descriptors with page-faults using the epoll stuff.

   3a. The accept_remote_image_connections() seem not to work well
       with opts.ps_socket scenario as the former just calls accept()
       on whatever socket is passed there, while the opts.ps_socket
       is already an established socket for data transfer.

4. No strings in protocol. Now the hard-coded "RESTORE_FINISH" string
   (and DUMP_FINISHED one) is used to terminate the communication.
   Need to tune up the protobuf objects to send boolean (or integer)
   EOF sign rather that the string.

5. Check how proxy/cache works with incremental dumps. Looking at the
   skip_remote_bytes() I think that image-cache and -proxy still do not
   work well with stacked pages images. Probably for those we'll need
   the page-server or lazy-pages -like protocol that would request the
   needed regions and receive it back rather than read bytes from
   sockets simply to skip those.

6. Add support for cache/proxy into go-phaul code. I haven't yet finished
   with the prototype, but plan to do it soon, so once the above steps
   are done we'll be able to proceed with this one.

]

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
The current patch brings the implementation of the image proxy and image cache.
These components are necessary to perform in-memory live migration of processes
using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through
UNIX sockets) and forwards them to the image cache (through a TCP socket). The
image cache caches image in memory and sends them to CRIU Restore (through
UNIX sockets) when requested.

Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
To suppress protobuf's warning:
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546]
> No syntax specified for the proto file: remote-image.proto.
> Please use 'syntax = "proto2";' or 'syntax = "proto3";'
> to specify a syntax version. (Defaulted to proto2 syntax.)

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: rodrigo-bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I see no need to do dynamic init here.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
OK, so we have pr_perror() for cases where errno is set (and it makes
sense to show it), and pr_err() for other errors. A correct function
is to be used, depending on the context.

1. pthread_mutex_*() functions don't set errno, therefore pr_perror()
   should not be used.

2. accept() sets errno => makes sense to use pr_perror().

3. read_header() arguably sets errno => use pr_err().

4. open_proc_rw() already prints an error message, there is no need
   for yet another one.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
1. Use xmalloc() where possible.

2. There is no need to print an error message, as xmalloc()
   has already printed it for you.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
In those error paths where we don't have errno set,
don't use pr_perror(), use pr_err() instead.

Cc: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When --remote option is specified, read_local_page tries to pread from a
socket, and fails with "Illegal seek" error.
Restore single pread call for regular image files case and introduce
maybe_read_page_img_cache version of maybe_read_page method.

Generally-approved-by: Rodrigo Bruno <rbruno@gsd.inesc-id.pt>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There is no real need to have both.

Signed-off-by: Omri Kramer <omri.kramer@gmail.com>
Singed-off-by: Lior Fisch <fischlior@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It's simply impossible (yet), so emit a warning.

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The opts.remote is always false in this code.

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We have two places to check for parent via page server -- as
a part of _OPEN req and explicit req. Make the latter code
be in-sync with the opening one.

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Those may not support sendfiles, so use read/write-s instead

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Drop the constants for default cache host/port and page size because
they are not used anywhere.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
1) fix sfle memory leak on get_fle_for_scm error
2) fix gfd open descriptor leak on get_fle_for_scm error
3-6) fix buf memory leak on read and pwrite errors

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
avagin and others added 3 commits September 10, 2019 19:25
Right now we use pushq, but it pushes sign-extended value, so if the
parasite code is placed higher that 2Gb, we will see something like
this:

   0xf7efd5b0:	pushq  $0x23
   0xf7efd5b2:	pushq  $0xfffffffff7efd5b9
=> 0xf7efd5b7:	lretq

Actually we want to push 0xf7efd5b9 instead of 0xfffffffff7efd5b9.

Fixes: checkpoint-restore#398

Cc: Dmitry Safonov <dima@arista.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
`pushq` sign-extends the value. Which is a bummer as the label's address
may be higher that 2Gb, which means that the sign-bit will be set.

As it long-jumps with ia32 selector, %r11 can be scratched.
Use %r11 register as a temporary to push the 32-bit address.

Complements: a9a7602 ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave has to be
64-byte aligned. But staring with the 5.2 kernel, the same is required
for pu_state->fpu_state_ia32.xsave.

The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")

Signed-off-by: Andrei Vagin <avagin@gmail.com>
criupatchwork pushed a commit that referenced this pull request Mar 21, 2020
Segmentation fault was raised while trying to restore a process with
tty. Coredump file says this is caused by uninitialized tty_mutex:
        (gdb) where
        #0  0x00000000004d7270 in atomic_add_return (i=1, v=0x0) at
        include/common/asm/atomic.h:34
        #1  0x00000000004d7398 in mutex_lock (m=0x0) at
        include/common/lock.h:151
        #2  0x00000000004d840c in __pty_open_ptmx_index (index=3, flags=2,
        cb=0x4dce50 <open_pty>, arg=0x11, path=0x5562e0 "ptmx") at
        criu/tty.c:603
        checkpoint-restore#3  0x00000000004dced8 in pty_create_ptmx_index (dfd=17, index=3,
        flags=2) at criu/tty.c:2384

since init_tty_mutex() is reentrantable, just calling it before
mutex_lock()

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
criupatchwork pushed a commit that referenced this pull request Mar 21, 2020
CID 190778 (#1 of 1): Read from pointer after free (USE_AFTER_FREE)
7. deref_after_free: Dereferencing freed pointer rop.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
criupatchwork pushed a commit that referenced this pull request May 18, 2020
CID 190778 (#1 of 1): Read from pointer after free (USE_AFTER_FREE)
7. deref_after_free: Dereferencing freed pointer rop.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302719 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)

 img_raw_fd(img) is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302718 (#1 of 1): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle sockfd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302715 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 fd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302714 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable dirnew going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302712 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable build_id going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302711 (#1 of 1): Logically dead code (DEADCODE)
 Execution cannot reach the expression pr->io_complete inside this statement: if (ret == 0 && pr->io_comp....

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226486 (#1 of 2): Resource leak (RESOURCE_LEAK)
 Variable mi going out of scope leaks the storage it points to.

CID 226486 (#2 of 2): Resource leak (RESOURCE_LEAK)
 Variable mi going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226485 (#1 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

CID 226485 (#2 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

CID 226485 (checkpoint-restore#3 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

Also changed epoll_prepare() to check return value of epoll_create()
against '< 0' instead if '== -1' to make coverity happy.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226484 (#1 of 1): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226483 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable p going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226482 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable path going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226480 (#1 of 1): Copy into fixed size buffer (STRING_OVERFLOW)
 You might overrun the 4096-character fixed-size string root_link.name by copying new->root without checking the length.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226478 (#1 of 2): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

CID 226478 (#2 of 2): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 226477 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable fd_dir going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 192963 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 dup(sk) is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 192961 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sockfd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 178391 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 92720 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 pfd is passed to a parameter that cannot be negative.

CID 92747 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 pfd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 73378 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 192968 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 dup(fd) is passed to a parameter that cannot be negative. [show details]

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 181217 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
 Passing null pointer mntns to mntns_get_root_fd, which dereferences it.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 73358 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
avagin pushed a commit that referenced this pull request Dec 20, 2020
CID 302713 (#1 of 1): Missing varargs init or cleanup (VARARGS)
 va_end was not called for argptr.

Signed-off-by: Adrian Reber <areber@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.