[LibOS] Protect file descriptors and mmap'ed memory areas during checkpointing #1601

Open
wants to merge 12 commits into
base: master

stefanberger
Contributor

@stefanberger stefanberger commented Oct 11, 2023

Description of the changes

Three test cases are provided. Each one runs close(), munmap(), or mprotect() (changing the protection flags of mmap'ed memory areas) concurrently with the checkpointing stage of a fork. To protect file descriptors and mmap'ed memory during checkpointing, a readers-writer lock is introduced: checkpointing takes the writer lock, while close(), munmap(), and mprotect() take the reader lock. This prevents file descriptors from being freed, and mmap'ed memory areas from being unmapped or having their protection flags changed, while checkpointing is in progress. All three test cases fail until the fix patches are applied to LibOS.

Fixes: #1156

How to test this PR?

cd gramine/lib/test/regression
gramine-test build
gramine-direct fork_and_close
gramine-direct fork_and_munmap
gramine-direct fork_and_mprotect
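
The test sources themselves are not shown in this description. As a rough, hypothetical sketch of the kind of race a test like `fork_and_mprotect` is described as exercising, the following standalone C code toggles the protection flags of an anonymous mapping in one thread while the main thread forks repeatedly. The names `struct demo_area`, `toggle_protection()`, and `run_fork_and_mprotect()` are invented here and are not taken from the PR.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper type: one mapping plus a stop flag for the worker. */
struct demo_area {
    void* addr;
    size_t len;
    atomic_bool stop;
};

/* Worker thread: keeps flipping the mapping between read-only and
 * read-write until asked to stop. */
static void* toggle_protection(void* arg) {
    struct demo_area* a = arg;
    while (!atomic_load(&a->stop)) {
        mprotect(a->addr, a->len, PROT_READ);
        mprotect(a->addr, a->len, PROT_READ | PROT_WRITE);
    }
    return NULL;
}

/* Forks `iterations` children while the worker runs; returns the number of
 * forks or children that failed, or -1 if the setup itself failed. */
int run_fork_and_mprotect(int iterations) {
    assert(iterations > 0);
    struct demo_area a;
    a.len = (size_t)sysconf(_SC_PAGESIZE);
    a.addr = mmap(NULL, a.len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a.addr == MAP_FAILED)
        return -1;
    atomic_init(&a.stop, false);

    pthread_t thr;
    if (pthread_create(&thr, NULL, toggle_protection, &a) != 0) {
        munmap(a.addr, a.len);
        return -1;
    }

    int failures = 0;
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0); /* child: nothing to do, just exit cleanly */
        if (pid < 0) {
            failures++;
            continue;
        }
        int status;
        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }

    atomic_store(&a.stop, true);
    pthread_join(thr, NULL);
    munmap(a.addr, a.len);
    return failures;
}
```

On a native Linux kernel this is expected to report no failures; the point of running such a program under Gramine is that fork() there is implemented by checkpointing, which is where the race appears.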


@stefanberger stefanberger force-pushed the stefanberger/checkpoint_lock.v2 branch 4 times, most recently from 0548ef1 to d1efced Compare October 17, 2023 12:32
@stefanberger stefanberger force-pushed the stefanberger/checkpoint_lock.v2 branch 2 times, most recently from a33e11a to d121287 Compare January 20, 2024 02:41
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Checkpointing may fail if it assumes memory protection flags that
have not yet been applied to the memory. Therefore, make the code
sequence of

  bkeep_mprotect()

  ...

  PalVirtualMemoryProtect()

atomic relative to checkpointing, so that the tracked memory protection
flags have also been applied to the memory.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
This test case will occasionally show the following error:

> while :; do gramine-direct fork_and_mmap; done
...
[P2:T3:fork_and_mmap] error: Sending IPC process-exit notification \
   failed: Connection reset by peer (ECONNRESET)

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
When mmap() runs concurrently with a fork() and its checkpoint, mmap() may
add a memory area to the process's list of mapped areas before the memory
is actually mapped. If the checkpoint code then tries to transfer such an
area to the destination, this leads to memory access errors. Therefore,
make the addition of the memory area and the actual memory mapping atomic
relative to the checkpointing.

In particular, make code sequences like the following one atomic
relative to checkpointing:

  bkeep_mmap_xyz()

  ...

  PalVirtualMemoryAlloc()

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
@stefanberger stefanberger force-pushed the stefanberger/checkpoint_lock.v2 branch from d121287 to 1acd0fb Compare February 27, 2024 02:09
Development

Successfully merging this pull request may close these issues.

[LibOS] fork in multi-threaded app may fail