-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Update mmapped regions when writing to encrypted files #1523
[LibOS] Update mmapped regions when writing to encrypted files #1523
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)
-- commits
line 2 at r1:
Why do you only concentrate on encrypted files? I think the same problem happens on chroot
and tmpfs
files. Please double-check these other files, and if the problem is there -- also update their code.
libos/include/libos_vma.h
line 134 at r1 (raw file):
/* Call `read` for file mappings in given range (should be page-aligned) */ int read_range(uintptr_t begin, uintptr_t end);
These names are way too generic, and they are global to the whole codebase.
So let's change to something more specific. I suggest reload_mmaped_from_file_range()
and reload_mmaped_from_file_handle()
libos/src/bookkeep/libos_vma.c
line 516 at r1 (raw file):
if (vma->file) { put_handle(vma->file); (void)__atomic_sub_fetch(&vma->file->num_mmapped, 1, __ATOMIC_RELEASE);
It can be simply __ATOMIC_RELAXED
libos/src/bookkeep/libos_vma.c
line 804 at r1 (raw file):
if (new_vma->file) { get_handle(new_vma->file); (void)__atomic_add_fetch(&new_vma->file->num_mmapped, 1, __ATOMIC_RELEASE);
ditto
libos/src/bookkeep/libos_vma.c
line 1052 at r1 (raw file):
if (new_vma->file) { get_handle(new_vma->file); (void)__atomic_add_fetch(&new_vma->file->num_mmapped, 1, __ATOMIC_RELEASE);
ditto
libos/src/bookkeep/libos_vma.c
line 1368 at r1 (raw file):
struct libos_handle* hdl = arg; if (vma->flags & (VMA_UNMAPPED | VMA_INTERNAL | MAP_ANONYMOUS))
I think you should also add | MAP_PRIVATE
.
IIUC, privately-mapped pages are populated from file only once (on the first mmap), and then they are completely decoupled from a file.
Only shared-mapped pages track the changes in the file. Also, the case we're trying to fix has MAP_SHARED
: https://github.com/haraldh/shmem-test/blob/d7ee6880a2c156bd8e73fd287a3f491962d6a1bf/main.c#L89
libos/src/fs/chroot/encrypted.c
line 445 at r1 (raw file):
if (ret < 0) { unlock(&hdl->inode->lock); goto out;
Because out
now is only one line, I prefer here to simply return ret
libos/src/fs/chroot/encrypted.c
line 459 at r1 (raw file):
ret = read_handle(hdl); out: return ret < 0 ? ret : (ssize_t)actual_count;
Here I would prefer to remove out:
and simply have return (ssize_t)actual_count
libos/test/fs/read_write_mmap.c
line 3 at r1 (raw file):
#include "common.h" static void read_write(const char* file_path) {
Better name read_write_mmap()
libos/test/fs/read_write_mmap.c
line 16 at r1 (raw file):
void* buf_read = alloc_buffer(size); void* m = mmap_fd(file_path, fd, PROT_READ | PROT_WRITE, 0, size);
MAP_SHARED
must be added. UPDATE: It is already implicitly added:
gramine/libos/test/fs/common.c
Line 101 in 10c2668
void* address = mmap(NULL, size, protection, MAP_SHARED, fd, offset); |
libos/test/fs/read_write_mmap.c
line 26 at r1 (raw file):
read_fd(file_path, fd, buf_read, size); printf("read(%s) RW (mmap) OK\n", file_path);
You have the exact same string repeating twice. Maybe add numbers to make strings unique, and check in test_enc.py
with these numbers?
Like this:
...
printf("read(%s) 1 RW (mmap) OK\n", file_path);
...
printf("read(%s) 2 RW (mmap) OK\n", file_path);
libos/test/fs/read_write_mmap.c
line 28 at r1 (raw file):
printf("read(%s) RW (mmap) OK\n", file_path); if (memcmp(m, buf_read, size) != 0) fatal_error("Read data is different from what was written in the mapping\n");
Read data via read()
libos/test/fs/read_write_mmap.c
line 32 at r1 (raw file):
fill_random(buf_write, size); seek_fd(file_path, fd, 0, SEEK_SET); printf("seek(%s) RW (mmap) OK\n", file_path);
ditto
libos/test/fs/read_write_mmap.c
line 40 at r1 (raw file):
printf("read(%s) RW (mmap) OK\n", file_path); if (memcmp(buf_write, buf_read, size) != 0) fatal_error("Read data is different from what was written\n");
Read data via read() is different from what was written via write()
libos/test/fs/read_write_mmap.c
line 42 at r1 (raw file):
fatal_error("Read data is different from what was written\n"); if (memcmp(buf_write, m, size) != 0) fatal_error("Read data in the mapping is different from what was written\n");
... was written via write()
libos/test/fs/test_enc.py
line 108 at r1 (raw file):
self.assertIn('munmap_fd(' + size + ') OK', stdout) self.assertIn('close(' + file_path + ') RW (mmap) OK', stdout)
Why only test_enc.py
? Why not make it generic for all: normal, enc, and tmpfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 12 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why do you only concentrate on encrypted files? I think the same problem happens on
chroot
andtmpfs
files. Please double-check these other files, and if the problem is there -- also update their code.
Done.
libos/include/libos_vma.h
line 134 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
These names are way too generic, and they are global to the whole codebase.
So let's change to something more specific. I suggest
reload_mmaped_from_file_range()
andreload_mmaped_from_file_handle()
Done.
libos/src/bookkeep/libos_vma.c
line 516 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
It can be simply
__ATOMIC_RELAXED
Done.
libos/src/bookkeep/libos_vma.c
line 804 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto
Done.
libos/src/bookkeep/libos_vma.c
line 1052 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto
Done.
libos/src/bookkeep/libos_vma.c
line 1368 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I think you should also add
| MAP_PRIVATE
.IIUC, privately-mapped pages are populated from file only once (on the first mmap), and then they are completely decoupled from a file.
Only shared-mapped pages track the changes in the file. Also, the case we're trying to fix has
MAP_SHARED
: https://github.com/haraldh/shmem-test/blob/d7ee6880a2c156bd8e73fd287a3f491962d6a1bf/main.c#L89
Correct, fixed.
libos/src/fs/chroot/encrypted.c
line 445 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Because
out
now is only one line, I prefer here to simplyreturn ret
Done.
libos/src/fs/chroot/encrypted.c
line 459 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Here I would prefer to remove
out:
and simply havereturn (ssize_t)actual_count
OK, now I explicitly ignore the return of reload_mmaped_from_file_handle()
.
libos/test/fs/read_write_mmap.c
line 3 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Better name
read_write_mmap()
Done.
libos/test/fs/read_write_mmap.c
line 26 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
You have the exact same string repeating twice. Maybe add numbers to make strings unique, and check in
test_enc.py
with these numbers?Like this:
... printf("read(%s) 1 RW (mmap) OK\n", file_path); ... printf("read(%s) 2 RW (mmap) OK\n", file_path);
Done.
libos/test/fs/read_write_mmap.c
line 28 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Read data via read()
Done.
libos/test/fs/read_write_mmap.c
line 32 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto
Done.
libos/test/fs/read_write_mmap.c
line 40 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Read data via read() is different from what was written via write()
Done.
libos/test/fs/read_write_mmap.c
line 42 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
... was written via write()
Done.
libos/test/fs/test_enc.py
line 108 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why only
test_enc.py
? Why not make it generic for all: normal, enc, and tmpfs?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 12 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "squash! " found in commit messages' one-liners (waiting on @dimakuv)
Previously, kailun-qin (Kailun Qin) wrote…
Done.
One more note is that for chroot
, there's no such issue w/ Gramine-direct while w/ Gramine-SGX, shared memory-mapped files with write permission (e.g., mmap(PROT_WRITE, MAP_SHARED, fd)
) is not allowed. But I updated it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "squash! " found in commit messages' one-liners (waiting on @kailun-qin)
a discussion (no related file):
PR looks good to me. Just to be sure, can you run your newly added test on chroot
, enc
and tmpfs
in direct and sgx modes and make sure that all 6 runs fail without your fix?
Previously, kailun-qin (Kailun Qin) wrote…
One more note is that for
chroot
, there's no such issue w/ Gramine-direct while w/ Gramine-SGX, shared memory-mapped files with write permission (e.g.,mmap(PROT_WRITE, MAP_SHARED, fd)
) is not allowed. But I updated it anyway.
Yes, you're right, that's correct. Thanks for updating it, that's exactly what I wanted.
libos/src/bookkeep/libos_vma.c
line 1385 at r3 (raw file):
} static int read_all(uintptr_t begin, uintptr_t end, struct libos_handle* hdl) {
Please also change this name, to reload_mmaped_from_file_all()
libos/test/fs/test_fs.py
line 99 at r3 (raw file):
# Gramine's implementation of file_map doesn't currently support shared memory-mapped # files with write permission in PAL/Linux-SGX (like mmap(PROT_WRITE, MAP_SHARED, fd)).
files
-> regular chroot files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 12 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
PR looks good to me. Just to be sure, can you run your newly added test on
chroot
,enc
andtmpfs
in direct and sgx modes and make sure that all 6 runs fail without your fix?
Yes (actually 5 tests, 1 skipped), pls see below outputs tested based on commit
gramine-direct:
==================================================== test session starts =====================================================
...
test_enc.py::TC_50_EncryptedFiles::test_111_read_write_mmap FAILED [ 33%]
test_fs.py::TC_00_FileSystem::test_111_read_write_mmap PASSED [ 66%]
test_tmpfs.py::TC_10_Tmpfs::test_111_read_write_mmap FAILED [100%]
...
========================================================== FAILURES ==========================================================
_______________________________________ TC_50_EncryptedFiles.test_111_read_write_mmap ________________________________________
...
---------------------------------------------------- Captured stdout call ----------------------------------------------------
[0.024] Random seed: 1693377998
[0.024] open(tmp/enc_output/test_111) RW (mmap) OK
[0.051] mmap_fd(1048576) OK
[0.109] read(tmp/enc_output/test_111) 1 RW (mmap) OK
[0.135] seek(tmp/enc_output/test_111) 1 RW (mmap) OK
[0.151] write(tmp/enc_output/test_111) RW (mmap) OK
[0.151] seek(tmp/enc_output/test_111) 2 RW (mmap) OK
[0.163] read(tmp/enc_output/test_111) 2 RW (mmap) OK
[0.163] ERROR: Read data in the mapping is different from what was written via write()
____________________________________________ TC_10_Tmpfs.test_111_read_write_mmap ____________________________________________
...
---------------------------------------------------- Captured stdout call ----------------------------------------------------
[0.018] Random seed: 1693377998
[0.018] open(/mnt-tmpfs/test_111) RW (mmap) OK
[0.020] mmap_fd(1048576) OK
[0.046] read(/mnt-tmpfs/test_111) 1 RW (mmap) OK
[0.071] seek(/mnt-tmpfs/test_111) 1 RW (mmap) OK
[0.071] write(/mnt-tmpfs/test_111) RW (mmap) OK
[0.071] seek(/mnt-tmpfs/test_111) 2 RW (mmap) OK
[0.071] read(/mnt-tmpfs/test_111) 2 RW (mmap) OK
[0.072] ERROR: Read data in the mapping is different from what was written via write()
===================================== 2 failed, 1 passed, 58 deselected in 0.87 seconds
gramine-sgx:
==================================================== test session starts =====================================================
...
test_enc.py::TC_50_EncryptedFiles::test_111_read_write_mmap FAILED [ 33%]
test_fs.py::TC_00_FileSystem::test_111_read_write_mmap SKIPPED [ 66%]
test_tmpfs.py::TC_10_Tmpfs::test_111_read_write_mmap FAILED [100%]
========================================================== FAILURES ==========================================================
_______________________________________ TC_50_EncryptedFiles.test_111_read_write_mmap ________________________________________
...
---------------------------------------------------- Captured stdout call ----------------------------------------------------
[0.607] Random seed: 1693378128
[0.607] open(tmp/enc_output/test_111) RW (mmap) OK
[0.633] mmap_fd(1048576) OK
[0.683] read(tmp/enc_output/test_111) 1 RW (mmap) OK
[0.707] seek(tmp/enc_output/test_111) 1 RW (mmap) OK
[0.723] write(tmp/enc_output/test_111) RW (mmap) OK
[0.723] seek(tmp/enc_output/test_111) 2 RW (mmap) OK
[0.732] read(tmp/enc_output/test_111) 2 RW (mmap) OK
[0.732] ERROR: Read data in the mapping is different from what was written via write()
...
____________________________________________ TC_10_Tmpfs.test_111_read_write_mmap ____________________________________________
...
---------------------------------------------------- Captured stdout call ----------------------------------------------------
[0.591] Random seed: 1693378129
[0.591] open(/mnt-tmpfs/test_111) RW (mmap) OK
[0.598] mmap_fd(1048576) OK
[0.623] read(/mnt-tmpfs/test_111) 1 RW (mmap) OK
[0.648] seek(/mnt-tmpfs/test_111) 1 RW (mmap) OK
[0.648] write(/mnt-tmpfs/test_111) RW (mmap) OK
[0.648] seek(/mnt-tmpfs/test_111) 2 RW (mmap) OK
[0.648] read(/mnt-tmpfs/test_111) 2 RW (mmap) OK
[0.648] ERROR: Read data in the mapping is different from what was written via write()
===================================== 2 failed, 1 skipped, 58 deselected in 2.09 seconds =====================================
libos/src/bookkeep/libos_vma.c
line 1385 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Please also change this name, to
reload_mmaped_from_file_all()
Done.
libos/test/fs/test_fs.py
line 99 at r3 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
files
->regular chroot files
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 12 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
pls see below outputs tested based on commit
commit 02f7a85, sorry about missing the ID somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
a discussion (no related file):
Previously, kailun-qin (Kailun Qin) wrote…
pls see below outputs tested based on commit
commit 02f7a85, sorry about missing the ID somehow
Awesome, thanks for these logs! Everything looks good.
Hi @kailun-qin @dimakuv what is your plan with regard to this PR? When do you plan to merge it? I am asking because I have the same exact issue and this PR helps :) Thanks a lot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dzobbe Thanks for the information and great to know that it works!
We have at least two requests now :). @mkow: would you mind taking a look when you get a chance? Thanks!
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but probably not this week, I'm sick :(
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
Thanks @mkow much appreciated |
Hi @mkow this is just a reminder on this PR. If you could have a look, that would be great. Thanks a lot in advance ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 8 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/include/libos_handle.h
line 173 at r4 (raw file):
We hence simply check on this field to determine whether to perform the updates to make the overhead negligible if none mmapped.
What if someone wrote to the same file using a different handle than it was mapped from?
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
/* If there are any MAP_SHARED mappings for the file, this will read data from `enc`. */ if (__atomic_load_n(&hdl->num_mmapped, __ATOMIC_ACQUIRE) != 0) (void)reload_mmaped_from_file_handle(hdl);
Shouldn't we fail here? (but probably without updating file position, in case the caller wants to retry? not sure)
The host on SGX can fail this single call, forcing the memory mappings to be outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/include/libos_handle.h
line 173 at r4 (raw file):
What if someone wrote to the same file using a different handle than it was mapped from?
Good question! Well, I think we can support it, albeit with certain complexities and overheads. Do we really want to do this?
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
Shouldn't we fail here? (but probably without updating file position, in case the caller wants to retry? not sure)
Not sure +1. But I'm fine w/ checking failure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
libos/include/libos_handle.h
line 173 at r4 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
What if someone wrote to the same file using a different handle than it was mapped from?
Good question! Well, I think we can support it, albeit with certain complexities and overheads. Do we really want to do this?
Actually, very good point.
Can't we move this field to the inode
? All the FSes that we care about correctly support inodes now (tmpfs, chroot, encrypted).
Then we can do things like:
static bool vma_filter_needs_read(struct libos_vma* vma, void* arg) {
struct libos_handle* hdl = arg;
if (vma->flags & (VMA_UNMAPPED | VMA_INTERNAL | MAP_ANONYMOUS | MAP_PRIVATE))
return false;
assert(vma->file);
if (hdl && hdl->inode && vma->file->inode && vma->file->inode != hdl->inode)
return false;
...
}
...
/* If there are any MAP_SHARED mappings for the file, this will read data from `enc`. */
if (__atomic_load_n(&hdl->inode->num_mmapped, __ATOMIC_ACQUIRE) != 0)
(void)reload_mmaped_from_file_handle(hdl);
...
This should be a very simple change. And given that hdl->inode
is an immutable field, we don't need to do any additional locking.
@kailun-qin Please check whether it works (I think it should) and update the PR and the test correspondingly (make two sub-tests, one as you have currently, the other one with two opens of the same file).
libos/src/bookkeep/libos_vma.c
line 1410 at r4 (raw file):
ret = file->fs->fs_ops->read(file, (void*)read_begin, read_end - read_begin, (file_off_t*)&vma_info->file_offset);
I just realized that we can be interrupted here (this func can return a benign PAL_ERROR_INTERRUPTED).
I think we should do a while loop like this:
gramine/libos/src/libos_utils.c
Lines 10 to 26 in d052d18
int read_exact(PAL_HANDLE handle, void* buf, size_t size) { | |
size_t read = 0; | |
while (read < size) { | |
size_t tmp_read = size - read; | |
int ret = PalStreamRead(handle, /*offset=*/0, &tmp_read, (char*)buf + read); | |
if (ret < 0) { | |
if (ret == -PAL_ERROR_INTERRUPTED || ret == -PAL_ERROR_TRYAGAIN) { | |
continue; | |
} | |
return pal_to_unix_errno(ret); | |
} else if (tmp_read == 0) { | |
return -ENODATA; | |
} | |
read += tmp_read; | |
} | |
return 0; | |
} |
(Unfortunately can't reuse that read_exact()
function as it doesn't support the offset, so need to write our own loop.)
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Shouldn't we fail here? (but probably without updating file position, in case the caller wants to retry? not sure)
Not sure +1. But I'm fine w/ checking failure here.
+1. But I'm not sure if we want to restore the file position... After all, the underlying file operation already happened, and there's no way to rewind this operation. So e.g. the underlying file size was already extended, and what do we do with hdl->inode->size
in this case? We can't restore hdl->inode->size
to the before-the-write value, because this will lead to discrepancy between the real on-the-host size and our in-gramine size...
So I currently vote for writing a log_warning()
in the error case here, but keep return actual_count
(i.e., the success of the write function itself).
We can even put a BUG()
here, as I don't think there are legitimate benign cases where this reload could fail. @mkow WDYT?
libos/src/fs/chroot/fs.c
line 325 at r4 (raw file):
/* If there are any MAP_SHARED mappings for the file, this will read data from `hdl`. */ if (__atomic_load_n(&hdl->num_mmapped, __ATOMIC_ACQUIRE) != 0) (void)reload_mmaped_from_file_handle(hdl);
ditto (restore pos
and hdl->inode->size
on error)
libos/src/fs/tmpfs/fs.c
line 272 at r4 (raw file):
/* If there are any MAP_SHARED mappings for the file, this will read data from `hdl`. */ if (__atomic_load_n(&hdl->num_mmapped, __ATOMIC_ACQUIRE) != 0) (void)reload_mmaped_from_file_handle(hdl);
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/include/libos_handle.h
line 173 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Actually, very good point.
Can't we move this field to the
inode
? All the FSes that we care about correctly support inodes now (tmpfs, chroot, encrypted).Then we can do things like:
static bool vma_filter_needs_read(struct libos_vma* vma, void* arg) { struct libos_handle* hdl = arg; if (vma->flags & (VMA_UNMAPPED | VMA_INTERNAL | MAP_ANONYMOUS | MAP_PRIVATE)) return false; assert(vma->file); if (hdl && hdl->inode && vma->file->inode && vma->file->inode != hdl->inode) return false; ... } ... /* If there are any MAP_SHARED mappings for the file, this will read data from `enc`. */ if (__atomic_load_n(&hdl->inode->num_mmapped, __ATOMIC_ACQUIRE) != 0) (void)reload_mmaped_from_file_handle(hdl); ...
This should be a very simple change. And given that
hdl->inode
is an immutable field, we don't need to do any additional locking.@kailun-qin Please check whether it works (I think it should) and update the PR and the test correspondingly (make two sub-tests, one as you have currently, the other one with two opens of the same file).
Thanks for the great suggestion! I was overcomplicating things. Done.
libos/src/bookkeep/libos_vma.c
line 1410 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I just realized that we can be interrupted here (this func can return a benign PAL_ERROR_INTERRUPTED).
I think we should do a while loop like this:
gramine/libos/src/libos_utils.c
Lines 10 to 26 in d052d18
int read_exact(PAL_HANDLE handle, void* buf, size_t size) { size_t read = 0; while (read < size) { size_t tmp_read = size - read; int ret = PalStreamRead(handle, /*offset=*/0, &tmp_read, (char*)buf + read); if (ret < 0) { if (ret == -PAL_ERROR_INTERRUPTED || ret == -PAL_ERROR_TRYAGAIN) { continue; } return pal_to_unix_errno(ret); } else if (tmp_read == 0) { return -ENODATA; } read += tmp_read; } return 0; } (Unfortunately can't reuse that
read_exact()
function as it doesn't support the offset, so need to write our own loop.)
Done.
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
After all, the underlying file operation already happened, and there's no way to rewind this operation.
I'm do doing a warning log for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r5, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "squash! " and "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
libos/src/bookkeep/libos_vma.c
line 1414 at r5 (raw file):
size_t to_read = size - read; ret = file->fs->fs_ops->read(file, (void*)(read_begin + read), to_read, (file_off_t*)&vma_info->file_offset);
Wait, this looks like a bug. The last argument is updated by the read()
callback, but that is not what we want. This helper function must not update the current file offset.
I think you need to do smth like this:
size_t size = read_end - read_begin;
size_t read = 0;
file_off_t file_offset = (file_off_t)vma_info->file_offset;
while (read < size) {
size_t to_read = size - read;
ret = file->fs->fs_ops->read(file, (void*)(read_begin + read), to_read,
&file_offset);
...
This way you'll start reading with the vma_info->file_offset
and on each iteration of the while loop, the new helper variable file_offset
will be updated to reflect from what file offset to read the next chunk.
libos/src/bookkeep/libos_vma.c
line 1419 at r5 (raw file):
continue; } goto out;
You also need to translate the PAL error code into the UNIX error code before going out:
ret = pal_to_unix_errno(ret);
goto out;
libos/test/fs/read_write_mmap.c
line 64 at r5 (raw file):
int fd2 = open_output_fd(file_path, /*rdwr=*/true); printf("open(%s) RW fd2 (mmap) OK\n", file_path);
Why do you say (mmap)
? This fd2 is used for write()
.
libos/test/fs/read_write_mmap.c
line 69 at r5 (raw file):
close(fd2); fatal_error("ftruncate fd2\n"); }
Why do you need this "extend to 1MB size" operation on fd2? It will call write()
which will automatically extend the file, so looks this code is redundant?
libos/test/fs/read_write_mmap.c
line 77 at r5 (raw file):
fill_random(buf_write, size); write_fd(file_path, fd2, buf_write, size); printf("write(%s) RW fd2 (mmap) OK\n", file_path);
ditto (why (mmap)
)
libos/test/fs/read_write_mmap.c
line 88 at r5 (raw file):
printf("close(%s) RW fd1 (mmap) OK\n", file_path); close_fd(file_path, fd2); printf("close(%s) RW fd2 (mmap) OK\n", file_path);
ditto (why (mmap)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 13 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/bookkeep/libos_vma.c
line 1414 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Wait, this looks like a bug. The last argument is updated by the
read()
callback, but that is not what we want. This helper function must not update the current file offset.I think you need to do smth like this:
size_t size = read_end - read_begin; size_t read = 0; file_off_t file_offset = (file_off_t)vma_info->file_offset; while (read < size) { size_t to_read = size - read; ret = file->fs->fs_ops->read(file, (void*)(read_begin + read), to_read, &file_offset); ...
This way you'll start reading with the
vma_info->file_offset
and on each iteration of the while loop, the new helper variablefile_offset
will be updated to reflect from what file offset to read the next chunk.
Good catch, done.
libos/src/bookkeep/libos_vma.c
line 1419 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
You also need to translate the PAL error code into the UNIX error code before going out:
ret = pal_to_unix_errno(ret); goto out;
It's actually returning unix error codes (the above is actually wrong). Done.
libos/test/fs/read_write_mmap.c
line 64 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why do you say
(mmap)
? This fd2 is used forwrite()
.
Done.
libos/test/fs/read_write_mmap.c
line 69 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why do you need this "extend to 1MB size" operation on fd2? It will call
write()
which will automatically extend the file, so looks this code is redundant?
Done.
libos/test/fs/read_write_mmap.c
line 77 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto (why
(mmap)
)
Done.
libos/test/fs/read_write_mmap.c
line 88 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto (why
(mmap)
)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
We can even put a
BUG()
here, as I don't think there are legitimate benign cases where this reload could fail. @mkow WDYT?
+1, I think this is a good idea. The main problem here is that the untrusted host can force it too fail, but in benign cases this shouldn't happen. And in case of the failure we end up with inconsistent state visible to the app.
libos/include/libos_handle.h
line 173 at r4 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Thanks for the great suggestion! I was overcomplicating things. Done.
Yeah, moving to inode looks good. Will continue reviewing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/fs/chroot/encrypted.c
line 457 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
We can even put a
BUG()
here, as I don't think there are legitimate benign cases where this reload could fail. @mkow WDYT?+1, I think this is a good idea. The main problem here is that the untrusted host can force it too fail, but in benign cases this shouldn't happen. And in case of the failure we end up with inconsistent state visible to the app.
Done.
libos/src/fs/chroot/fs.c
line 325 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto (restore
pos
andhdl->inode->size
on error)
Done.
libos/src/fs/tmpfs/fs.c
line 272 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
ditto
Done.
ae315f3
to
85fd0fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 8 files at r1, 1 of 9 files at r2, 2 of 10 files at r5, 4 of 5 files at r6, 3 of 4 files at r7, all commit messages.
Reviewable status: 12 of 13 files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)
libos/include/libos_vma.h
line 133 at r8 (raw file):
Call
read
This IMO doesn't describe well what it does, the read itself is not the essence, but reading and writing the data to the mapped region? I.e. reloading.
libos/include/libos_vma.h
line 136 at r8 (raw file):
int reload_mmaped_from_file_range(uintptr_t begin, uintptr_t end); /* Call `read` for file mappings of `hdl` */
ditto
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
if (vma->file) { put_handle(vma->file); (void)__atomic_sub_fetch(&vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
inode
field is optional, I think you should IF on it?
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
if (vma->file) { put_handle(vma->file); (void)__atomic_sub_fetch(&vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
vma->file
may have already been freed by this line.
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
if (vma->file) { put_handle(vma->file); (void)__atomic_sub_fetch(&vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
Can we assert here that it doesn't go below 0? (or rather: doesn't underflow)
libos/src/bookkeep/libos_vma.c
line 804 at r8 (raw file):
if (new_vma->file) { get_handle(new_vma->file); (void)__atomic_add_fetch(&new_vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
ditto (inode
)
libos/src/bookkeep/libos_vma.c
line 1052 at r8 (raw file):
if (new_vma->file) { get_handle(new_vma->file); (void)__atomic_add_fetch(&new_vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
ditto (inode
)
libos/src/bookkeep/libos_vma.c
line 1365 at r8 (raw file):
} static bool vma_filter_needs_read(struct libos_vma* vma, void* arg) {
Suggestion:
vma_filter_needs_reload
libos/src/bookkeep/libos_vma.c
line 1454 at r8 (raw file):
} int reload_mmaped_from_file_range(uintptr_t begin, uintptr_t end) {
This looks unused? I guess we can simplify this whole code then?
libos/test/fs/read_write_mmap.c
line 19 at r8 (raw file):
printf("mmap_fd(%zu) OK\n", size); fill_random(m, size); int ret = msync(m, size, MS_SYNC);
Could you explain how does this case actually work with your PR? I could have missed something, but you didn't change anything in msync flow, and writing to memory is not triggering anything, so how is the writeback triggered? Was msync already doing that before the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 13 files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/include/libos_vma.h
line 133 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Call
read
This IMO doesn't describe well what it does, the read itself is not the essence, but reading and writing the data to the mapped region? I.e. reloading.
Not relevant any more. Done.
libos/include/libos_vma.h
line 136 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
inode
field is optional, I think you should IF on it?
Done.
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
vma->file
may have already been freed by this line.
Good catch! Done.
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Can we assert here that it doesn't go below 0? (or rather: doesn't underflow)
Done.
libos/src/bookkeep/libos_vma.c
line 804 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (
inode
)
Done.
libos/src/bookkeep/libos_vma.c
line 1052 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (
inode
)
Done.
libos/src/bookkeep/libos_vma.c
line 1365 at r8 (raw file):
} static bool vma_filter_needs_read(struct libos_vma* vma, void* arg) {
Done.
libos/src/bookkeep/libos_vma.c
line 1454 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This looks unused? I guess we can simplify this whole code then?
Done.
libos/test/fs/read_write_mmap.c
line 19 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Could you explain how does this case actually work with your PR? I could have missed something, but you didn't change anything in msync flow, and writing to memory is not triggering anything, so how is the writeback triggered? Was msync already doing that before the PR?
msync()
will eventually go to the wirte()
callback:
gramine/libos/src/fs/libos_fs_util.c
Line 209 in d052d18
ssize_t count = hdl->fs->fs_ops->write(hdl, write_addr, write_size, &pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r7, 2 of 2 files at r9, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
a discussion (no related file):
Something goes wrong:
04:08:29 ________________________ TC_01_Bootstrap.test_200_exec _________________________
04:08:29 test_libos.py:201: in test_200_exec
04:08:29 stdout, _ = self.run_binary(['exec'])
04:08:29 ../../../usr/lib/python3.8/site-packages/graminelibos/regression.py:224: in run_binary
04:08:29 _returncode, stdout, stderr = run_command(cmd, timeout=timeout, **kwds)
04:08:29 ../../../usr/lib/python3.8/site-packages/graminelibos/regression.py:162: in run_command
04:08:29 raise subprocess.CalledProcessError(proc.returncode, cmd, raw_stdout, raw_stderr)
04:08:29 E subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-direct-sanitizers/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-direct-sanitizers/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'exec']' returned non-zero exit status 1.
04:08:29 ----------------------------- Captured stderr call -----------------------------
04:08:29 [0.012] [P1:T1:exec_victim] assert failed ../libos/src/bookkeep/libos_vma.c:518 num_mmapped < UINT64_MAX
Please debug and fix.
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Done.
It would be easier to understand your assert if you would change to __atomic_fetch_sub()
(then you would compare that "the previous value was greater than 0" in the assert).
libos/src/bookkeep/libos_vma.c
line 1454 at r8 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Done.
For the context: we initially mimicked the msync_all()
functions (see below), but yes, technically reload_mmaped_from_file_range()
was unused. I'm fine with removing it and simplifying a bit, though we lost a nice readability uniformity here.
libos/src/bookkeep/libos_vma.c
line 1425 at r9 (raw file):
if (ret < 0) { ret = pal_to_unix_errno(ret); break;
Why not goto out
? Then you won't need a weird && !ret
in the for loop.
libos/src/bookkeep/libos_vma.c
line 1439 at r9 (raw file):
break; } else if (count == 0) { break;
Why did you remove ret = -ENODATA
?
libos/test/fs/read_write_mmap.c
line 19 at r8 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
msync()
will eventually go to thewirte()
callback:. As we added the reloading logic into them, they should be triggered as expected.gramine/libos/src/fs/libos_fs_util.c
Line 209 in d052d18
ssize_t count = hdl->fs->fs_ops->write(hdl, write_addr, write_size, &pos);
Yep, as @kailun-qin explained. msync()
-> write()
callback -> this PR updates this callback with reload_mmaped_from_file_handle()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 13 files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/bookkeep/libos_vma.c
line 516 at r8 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
It would be easier to understand your assert if you would change to
__atomic_fetch_sub()
(then you would compare that "the previous value was greater than 0" in the assert).
Done again.
libos/src/bookkeep/libos_vma.c
line 1425 at r9 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why not
goto out
? Then you won't need a weird&& !ret
in the for loop.
hmm... && !ret
servees for both the break
here (outer loop) and the break
below (inner loop). Note that here when mprotect fails we need to break the outer loop, while below when the read callback fails we need to break the inner loop. These two break
s need to goto different places -- with the first goto free_vma_info_array()
directly while the latter goto restore permission w/ mprotect
(which is still within the outer loop). I just don't want to introduce two gotos, so I used this approach.
libos/src/bookkeep/libos_vma.c
line 1439 at r9 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why did you remove
ret = -ENODATA
?
This will cause tests like test_204_copy_dir_mmap_whole
to fail when reloading on munmap()
. The test copies size
length of data (less than size of one page, say 256 bytes):
gramine/libos/test/fs/copy_mmap_whole.c
Lines 10 to 14 in d052d18
// copy data | |
if (ftruncate(fo, size) != 0) | |
fatal_error("ftruncate(%s, %zu) failed: %s\n", output_path, size, strerror(errno)); | |
printf("ftruncate(%zu) output OK\n", size); | |
memcpy(out, in, size); |
munmap()
: gramine/libos/src/sys/libos_mmap.c
Line 350 in d052d18
length = ALLOC_ALIGN_UP(length); |
-ENODATA
when nothing to read, then munmap()
fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 13 files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Something goes wrong:
04:08:29 ________________________ TC_01_Bootstrap.test_200_exec _________________________ 04:08:29 test_libos.py:201: in test_200_exec 04:08:29 stdout, _ = self.run_binary(['exec']) 04:08:29 ../../../usr/lib/python3.8/site-packages/graminelibos/regression.py:224: in run_binary 04:08:29 _returncode, stdout, stderr = run_command(cmd, timeout=timeout, **kwds) 04:08:29 ../../../usr/lib/python3.8/site-packages/graminelibos/regression.py:162: in run_command 04:08:29 raise subprocess.CalledProcessError(proc.returncode, cmd, raw_stdout, raw_stderr) 04:08:29 E subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-direct-sanitizers/usr/lib/x86_64-linux-gnu/gramine/direct/loader', '/home/jenkins/workspace/graphene-direct-sanitizers/usr/lib/x86_64-linux-gnu/gramine/direct/libpal.so', 'init', 'exec']' returned non-zero exit status 1. 04:08:29 ----------------------------- Captured stderr call ----------------------------- 04:08:29 [0.012] [P1:T1:exec_victim] assert failed ../libos/src/bookkeep/libos_vma.c:518 num_mmapped < UINT64_MAX
Please debug and fix.
Trying to fix by adding the missing ref counter increase on split_vma()
, let's give it another try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
libos/src/bookkeep/libos_vma.c
line 1425 at r9 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
hmm...
&& !ret
servees for both thebreak
here (outer loop) and thebreak
below (inner loop). Note that here when mprotect fails we need to break the outer loop, while below when the read callback fails we need to break the inner loop. These twobreak
s need to goto different places -- with the first gotofree_vma_info_array()
directly while the latter goto restore permission w/mprotect
(which is still within the outer loop). I just don't want to introduce two gotos, so I used this approach.
I'm sorry, but this is unreadable to me. Could you please split the body of this while loop (i.e., reloading a single VMA) into a helper static function? In ~2 years, when somebody will need to modify this code, they will definitely introduce a bug here...
libos/src/bookkeep/libos_vma.c
line 1439 at r9 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
This will cause tests like
test_204_copy_dir_mmap_whole
to fail when reloading onmunmap()
. The test copiessize
length of data (less than size of one page, say 256 bytes):wheras we align up to page-aligned (4K) ongramine/libos/test/fs/copy_mmap_whole.c
Lines 10 to 14 in d052d18
// copy data if (ftruncate(fo, size) != 0) fatal_error("ftruncate(%s, %zu) failed: %s\n", output_path, size, strerror(errno)); printf("ftruncate(%zu) output OK\n", size); memcpy(out, in, size); munmap()
:. If we returngramine/libos/src/sys/libos_mmap.c
Line 350 in d052d18
length = ALLOC_ALIGN_UP(length); -ENODATA
when nothing to read, thenmunmap()
fails.
Thanks, it makes sense. We previously copy-pasted that code from read_exact()
, but you're right, here we don't read the exact size
-- it can be less than that.
libos/src/bookkeep/libos_vma.c
line 214 at r10 (raw file):
if (new_vma->file) { if (new_vma->file->inode) (void)__atomic_add_fetch(&new_vma->file->inode->num_mmapped, 1, __ATOMIC_RELAXED);
Woudn't it make sense to move this inside copy_vma()
?
libos/src/bookkeep/libos_vma.c
line 1441 at r10 (raw file):
break; } else if (count == 0) { break;
Ouch, @kailun-qin's comment made me realize that we have totally redundant paths like this:
munmap()
->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
mmap(MAP_FIXED_NOREPLACE)
->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
msync()
-> ->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
In other words, when VMA contents are being msynced, it calls the write
callback which in turn blindly reloads the VMA contents. Which is totally bogus.
@kailun-qin What if we add some boolean flag to struct libos_vma_info
, and we set this flag inside msync_all()
and then unset it? And we'll check this flag here -- if it is set, then we skip all this reloading logic for the VMA.
Otherwise we are doomed to have these meaningless read/write operations on each munmap, which is a frequent op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)
libos/include/libos_vma.h
line 133 at r10 (raw file):
int msync_handle(struct libos_handle* hdl); /* Call `reload` for file mappings of `hdl` */
There's no "reload" function or anything like that, so it shouldn't be backticked. Just use it as an English verb.
Suggestion:
/* Reload file mappings of `hdl` */
libos/src/bookkeep/libos_vma.c
line 1441 at r10 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Ouch, @kailun-qin's comment made me realize that we have totally redundant paths like this:
munmap()
->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
mmap(MAP_FIXED_NOREPLACE)
->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
msync()
-> ->msync_all()
->fs_ops->msync()
->fs_ops->write()
->reload_mmaped_from_file_handle()
In other words, when VMA contents are being msynced, it calls the
write
callback which in turn blindly reloads the VMA contents. Which is totally bogus.@kailun-qin What if we add some boolean flag to
struct libos_vma_info
, and we set this flag insidemsync_all()
and then unset it? And we'll check this flag here -- if it is set, then we skip all this reloading logic for the VMA.Otherwise we are doomed to have these meaningless read/write operations on each munmap, which is a frequent op.
But this idea doesn't seem to be correct against race conditions? What in the meantime someone does something which actually should reload a mapping here? The msync-in-progress will turn the reloading into a no-op?
libos/test/fs/read_write_mmap.c
line 19 at r8 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Yep, as @kailun-qin explained.
msync()
->write()
callback -> this PR updates this callback withreload_mmaped_from_file_handle()
.
I see, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 13 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/include/libos_vma.h
line 133 at r10 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
There's no "reload" function or anything like that, so it shouldn't be backticked. Just use it as an English verb.
Done.
libos/src/bookkeep/libos_vma.c
line 1425 at r9 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I'm sorry, but this is unreadable to me. Could you please split the body of this while loop (i.e., reloading a single VMA) into a helper static function? In ~2 years, when somebody will need to modify this code, they will definitely introduce a bug here...
Done.
libos/src/bookkeep/libos_vma.c
line 214 at r10 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Woudn't it make sense to move this inside
copy_vma()
?
Done.
libos/src/bookkeep/libos_vma.c
line 1441 at r10 (raw file):
But this idea doesn't seem to be correct against race conditions? What in the meantime someone does something which actually should reload a mapping here?
Yes, it can be racy.
The msync-in-progress will turn the reloading into a no-op?
Yes, I think this was what @dimakuv would like to propose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
a discussion (no related file):
Previously, kailun-qin (Kailun Qin) wrote…
Trying to fix by adding the missing ref counter increase on
split_vma()
, let's give it another try.
Looks like problem solved, thanks.
libos/src/bookkeep/libos_vma.c
line 1441 at r10 (raw file):
But this idea doesn't seem to be correct against race conditions? What in the meantime someone does something which actually should reload a mapping here? The msync-in-progress will turn the reloading into a no-op?
Yes, there is a possibility of data races. We even discussed it very long time ago with Pawel (when he implemented msync()
, which structure Kailun took for this "reload VMAs"). Here's a relevant link: #550 (review)
Note that:
- We already have possibility for data races in this func anyway, see the NOTE in the code.
- It's impossible to take a global VMA lock here, because this lock is very widely used (e.g., if a
fs_ops->read()
callback wants to malloc, it will probably land into the VMA code and will try to re-acquire the VMA lock). In other words, we can't blindly protect with a global VMA lock because we'll have deadlocks due to nested locking. - An argument that it is fine: this
reload_mmaped_from_file_handle()
is always called from an operation on an active VMA (e.g.write()
ormunmap()
syscalls). So at least the current VMA will not be freed/remaped while we're in this func. There is a possibility that another thread of the app will suddenly do something with other VMAs that are backed by this file handle, but it means a buggy application.
Otherwise we are doomed to have these meaningless read/write operations on each munmap, which is a frequent op.
On a second thought, this munmap-of-file-backed-region shouldn't be a frequent operation. Typically apps load e.g. a binary image inside process memory and just keep it there forever, or load e.g. a database file and keep it there also forever. So munmap should be rare actually.
Moreover, if we imagine the same file mmapped twice (I don't know why anyone would do it), then the current munmap logic starts to make sense -- yes, the first mmapped region (the one that is being munmapped) will be redundantly reloaded, but the second mmapped region will correct be reloaded. So this looks reasonable after all.
@kailun-qin Can we add a NOTE
comment describing this inefficiency?
libos/src/bookkeep/libos_vma.c
line 1429 at r12 (raw file):
goto out; } else if (count == 0) { goto out;
Aren't you supposed to break
here? Or at least specify ret = 0
before.
Otherwise looks like you have undefined value of ret
in this case.
Also, you could add a quick comment for this case, that it's possible that the underlying file contents do not cover the whole VMA region, and thus count == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/bookkeep/libos_vma.c
line 1441 at r10 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
But this idea doesn't seem to be correct against race conditions? What in the meantime someone does something which actually should reload a mapping here? The msync-in-progress will turn the reloading into a no-op?
Yes, there is a possibility of data races. We even discussed it very long time ago with Pawel (when he implemented
msync()
, which structure Kailun took for this "reload VMAs"). Here's a relevant link: #550 (review)Note that:
- We already have possibility for data races in this func anyway, see the NOTE in the code.
- It's impossible to take a global VMA lock here, because this lock is very widely used (e.g., if a
fs_ops->read()
callback wants to malloc, it will probably land into the VMA code and will try to re-acquire the VMA lock). In other words, we can't blindly protect with a global VMA lock because we'll have deadlocks due to nested locking.- An argument that it is fine: this
reload_mmaped_from_file_handle()
is always called from an operation on an active VMA (e.g.write()
ormunmap()
syscalls). So at least the current VMA will not be freed/remaped while we're in this func. There is a possibility that another thread of the app will suddenly do something with other VMAs that are backed by this file handle, but it means a buggy application.
Otherwise we are doomed to have these meaningless read/write operations on each munmap, which is a frequent op.
On a second thought, this munmap-of-file-backed-region shouldn't be a frequent operation. Typically apps load e.g. a binary image inside process memory and just keep it there forever, or load e.g. a database file and keep it there also forever. So munmap should be rare actually.
Moreover, if we imagine the same file mmapped twice (I don't know why anyone would do it), then the current munmap logic starts to make sense -- yes, the first mmapped region (the one that is being munmapped) will be redundantly reloaded, but the second mmapped region will correct be reloaded. So this looks reasonable after all.
@kailun-qin Can we add a
NOTE
comment describing this inefficiency?
Done.
libos/src/bookkeep/libos_vma.c
line 1429 at r12 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Aren't you supposed to
break
here? Or at least specifyret = 0
before.Otherwise looks like you have undefined value of
ret
in this case.Also, you could add a quick comment for this case, that it's possible that the underlying file contents do not cover the whole VMA region, and thus
count == 0
.
Done.
342cd13
to
fd57351
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r11, 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/bookkeep/libos_vma.c
line 1380 at r13 (raw file):
return false; assert(vma->file);
Actually, why is this true? Aren't we iterating over all VMAs, including the ones without a backing file?
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
assert(vma->file); if (hdl && hdl->inode && vma->file->inode && vma->file->inode != hdl->inode)
Doesn't this check work a bit unexpectedly? I think that now, if you pass a handle without a file (hdl->inode == NULL
), which is clearly expected/allowed by this function, it will reload all the mappings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/src/bookkeep/libos_vma.c
line 1380 at r13 (raw file):
Aren't we iterating over all VMAs, including the ones without a backing file?
For those VMAs w/o a backing file, shouldn't they contain MAP_ANONYMOUS
and would return earlier? (Or am I missing sth?)
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Doesn't this check work a bit unexpectedly? I think that now, if you pass a handle without a file (
hdl->inode == NULL
), which is clearly expected/allowed by this function, it will reload all the mappings?
But reload is invoked from the write
callback where it's actually guaranteed to have hdl->inode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/bookkeep/libos_vma.c
line 1380 at r13 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Aren't we iterating over all VMAs, including the ones without a backing file?
For those VMAs w/o a backing file, shouldn't they contain
MAP_ANONYMOUS
and would return earlier? (Or am I missing sth?)
Ah, right.
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
But reload is invoked from the
write
callback where it's actually guaranteed to havehdl->inode
?
Ok, but then why checking it in runtime? It looks like this was a valid and possible case. If it's impossible then it should be an assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Ok, but then why checking it in runtime? It looks like this was a valid and possible case. If it's impossible then it should be an assert.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r14, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Done.
@kailun-qin I think it's still a bit misleading.
I agree that hdl->inode
must always exist (because this func is ultimately invoked from the write()
callbacks that indeed have hdl->inode
). The assert assert(hdl && hdl->inode)
is fine with me.
However, I don't like the assert on assert(vma->file->inode)
. In this code, we're just browsing through all file-backed VMA mappings, and since our documentation says that libos_handle::inode
can be optional -- even though I think that's not true for files that can be mmapped -- we must not blindly assert on vma->file->inode
. This will also be similar to how you check vma->file->inode
in other places in this PR.
So in all, that's my suggestion:
struct libos_handle* hdl = arg;
assert(hdl && hdl->inode); /* guaranteed to have inode because invoked from `write` callback */
if (vma->flags & (VMA_UNMAPPED | VMA_INTERNAL | MAP_ANONYMOUS | MAP_PRIVATE))
return false;
assert(vma->file); /* check above filtered out non-file-backed mappings */
if (!vma->file->inode || vma->file->inode != hdl->inode)
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/src/bookkeep/libos_vma.c
line 1382 at r13 (raw file):
Thanks for the suggestion!
even though I think that's not true for files that can be mmapped
Yes, this was my previous assumption.
since our documentation says that
libos_handle::inode
can be optional, ...,we must not blindly assert onvma->file->inode
Yes, I'm updating to the proposed version -- which makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r15, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r15, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
fc9f497
to
2426d53
Compare
Previously, Gramine didn't propagate changes to a mmaped file back to memory. This is unexpected for apps that use both mmap and read/write on the same file (e.g., etcd-io/bbolt has such implementation to write data via `write()` and read it via the mmaped memory). This commit introduces the reloading of affected mmaped regions on `chroot_encrypted`, `chroot` and `tmpfs` file writes. To reduce performance penalty in normal cases (where apps use either `write()` or `mmap()`), this commit also creates a new field in the inode to record the number of mmaped regions of this file. We hence simply check on this field to determine whether to perform the reloading to make the overhead negligible if none mmaped. Signed-off-by: Kailun Qin <kailun.qin@intel.com>
2426d53
to
1ea3e60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r16, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r16, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
Previously, Gramine didn't propagate changes to a mmaped encrypted file back to the memory. This is unexpected for apps that use both mmap and read/write on the same file (e.g., etcd-io/bbolt has such implementation to write data via
write()
and read it via the mmaped memory).This commit introduces overwriting of affected mmapped regions on encrypted file writes. To reduce performance penalty in normal cases (where apps use either
write()
ormmap()
), this commit also creates a new field in the file handle to record the number of mmaped regions of this file. We hence simply check on this field to determine whether to perform the updates to make the overhead negligible if none mmapped.Fixes #1432 and #1591.
How to test this PR?
Newly introduced test + CI.
This change is