-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Add flock
syscall
#1416
Conversation
45e4953
to
77d6627
Compare
77d6627
to
7a9d4dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/include/libos_fs_lock.h
line 142 at r1 (raw file):
* subsequent behavior is undefined). */ bool has_flock_locks(struct libos_dentry* dent);
In review of #1212, @mkow raised a point whether it's ok to have UB when fcntl and flock locks are mixed. Linux actually allows the mix, by completely separating these two types of locks (i.e., one type of locks doesn't interact with the other type of locks in any way -- which is a split-brain behavior, but that's what Linux does).
If we'll go with the weird Linux behavior, it will be pretty simple (all conflict-finding logic should just have a clear split: iterate only on fcntl locks or iterate only on flock locks). Then we can remove this has_flock_locks()
function and we'll also be bug-for-bug compatible with Linux.
It will also probably require a new LibOS test that mixes the two locks.
I'm not a big fan of introducing the weird Linux behavior... But what we currently have is also pretty bad -- we just UB in such case, with confusing semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
a discussion (no related file):
I found a bug in this code, currently fixing it...
For info, I see this bogus output in fcntl_lock
test:
1: fcntl(fd, F_SETLK, {F_UNLCK, SEEK_SET, 5, 10}) = 0
file_lock: fcntl (POSIX): pid=1: r[15..19]
file_lock: flock (BSD): handle id=14757395258967641292: r[25..39]
file_lock: fcntl (POSIX): pid=1: r[1000..end]
I.e., after unsetting the lock, I somehow get a stray BSD lock. This is clearly wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I found a bug in this code, currently fixing it...
For info, I see this bogus output in
fcntl_lock
test:1: fcntl(fd, F_SETLK, {F_UNLCK, SEEK_SET, 5, 10}) = 0 file_lock: fcntl (POSIX): pid=1: r[15..19] file_lock: flock (BSD): handle id=14757395258967641292: r[25..39] file_lock: fcntl (POSIX): pid=1: r[1000..end]
I.e., after unsetting the lock, I somehow get a stray BSD lock. This is clearly wrong.
Done, fixed (see the first fixup commit).
-- commits
line 2 at r2:
My bad, I forgot [LibOS]
here. Will do at final rebase.
ba48485
to
649fd3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners
libos/include/libos_fs_lock.h
line 142 at r1 (raw file):
I'm not a big fan of introducing the weird Linux behavior... But what we currently have is also pretty bad -- we just UB in such case, with confusing semantics.
By the way, another simple idea is to disallow POSIX and BSD locks to be used in one app. Basically, introduce two global booleans, one for POSIX and one for BSD, and make sure they are never true together. This will be a very quick fix to the problem, while we could work on the proper mixing of POSIX plus BSD locks in a future PR.
d099391
to
ece36fe
Compare
649fd3b
to
fcc5937
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
My bad, I forgot
[LibOS]
here. Will do at final rebase.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
struct libos_file_lock::handle_id
that serves to distinguish flock locks from fcntl locks.
This is quite obscure. How would the code look like if we had an enum LOCK_TYPE
+ a union with the lock-type-specific fields? The checks between the locks needs IFs anyways.
How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.
libos/include/libos_fs_lock.h
line 142 at r1 (raw file):
If we'll go with the weird Linux behavior, it will be pretty simple (all conflict-finding logic should just have a clear split: iterate only on fcntl locks or iterate only on flock locks).
I'd go with this if it's actually simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.
Yes, the implementation was actually inspired by Linux.
Let me make it more explicit with enum LOCK_TYPE
. I can also use a union, though Linux doesn't do it. In case anyone wonders how it is implemented in Linux, start here: https://elixir.bootlin.com/linux/v6.4/source/include/linux/filelock.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.
Yes, the implementation was actually inspired by Linux.
Let me make it more explicit with
enum LOCK_TYPE
. I can also use a union, though Linux doesn't do it. In case anyone wonders how it is implemented in Linux, start here: https://elixir.bootlin.com/linux/v6.4/source/include/linux/filelock.h
UPDATE: I tried with a union, but it touches a lot of unrelated code. I will probably create a branch or a DONTMERGE PR with this change alone, just to show how ugly it looks.
So for this PR, I will not introduce a union.
Also, because type
is already used for RLCK
, WLCK
, UNLCK
, I am going with the word family
(i.e. enum LOCK_FAMILY
). This is currently the best I can come up with, without modifying 80% of the file-lock code. In the future, I plan to have a "rename" PR that will reshape all objects in file-locking code.
libos/src/sys/libos_fcntl.c
line 63 at r4 (raw file):
* positive overflow. */ static int flock_to_file_lock(struct flock* fl, struct libos_handle* hdl, uint64_t handle_id,
I should remove this change -- this func will be used purely for POSIX locks.
libos/src/sys/libos_fcntl.c
line 318 at r4 (raw file):
struct libos_file_lock file_lock; ret = flock_to_file_lock(&fl, hdl, hdl->id, &file_lock);
I will remove call to this function -- I can just create file_lock
object directly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
-- commits
line 10 at r4:
Done (family
field).
How much logic will be shared between these two types, if we make them independent?
Looks like I didn't answer this question before. A lot of logic is shared between the two types, I would say around 80% of the code is common for POSIX and BSD locks.
-- commits
line 12 at r4:
TODO: modify the commit message to mention:
- flock locks and fcntl locks are distinguished via
struct libos_file_lock::family
- currently flock and fcntl locks cannot be mixed -- Gramine will loudly fail if it detects such attempts
libos/src/sys/libos_fcntl.c
line 63 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I should remove this change -- this func will be used purely for POSIX locks.
Done
libos/src/sys/libos_fcntl.c
line 318 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I will remove call to this function -- I can just create
file_lock
object directly here.
Done
Previously, mkow (Michał Kowalczyk) wrote…
So conforming to the Linux behavior turned out to be not so simple. The main obstacle is the fact that "list of locks" is shared between POSIX and BSD locks, and this list is supposed to be ordered in a special way (for POSIX locks). See:
This is not that hard to fix, there are two ways:
Given the above, I currently decided to go in another direction: disallow POSIX and BSD locks to be used in the same app. This is much easier to implement, and doesn't regress Gramine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/include/libos_fs_lock.h
line 142 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
So conforming to the Linux behavior turned out to be not so simple. The main obstacle is the fact that "list of locks" is shared between POSIX and BSD locks, and this list is supposed to be ordered in a special way (for POSIX locks). See:
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/fs/libos_fs_lock.c#L53-L55?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/fs/libos_fs_lock.c#L225?plain=1 (this whole func assumes the ordering)
This is not that hard to fix, there are two ways:
- Keep this single "list of locks", and just modify the POSIX ordering code to "jump over" BSD locks. Sounds a bit ugly.
- Split "list of locks" into two lists, one for POSIX locks (and ordered) and one for BSD locks (doesn't need to be ordered). Requires a lot of changes.
Given the above, I currently decided to go in another direction: disallow POSIX and BSD locks to be used in the same app. This is much easier to implement, and doesn't regress Gramine.
Done. This also lets us get rid of has_flock_locks()
func.
libos/include/libos_handle.h
line 152 at r6 (raw file):
* tracking of handle FDs. On the other hand, Gramine anyway doesn't support the * "daemonization" case described above, so the assumption of "last" FD works fine. */
FYI: This problem is a bit similar to "re-parenting":
- IPC leader does not wait for all processes to finish #21
- [LibOS] Allow ipc worker thread of leader process to exit after the other processes #1168
The thing is: we don't have a "master" process (like an init
process) that would collect info from all processes and make decisions based on this system-wide knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r6.
Reviewable status: 1 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/include/libos_fs_lock.h
line 36 at r6 (raw file):
* - The main process has to be able to look up the same file, so locking will not work for files in * local-process-only filesystems (tmpfs). * - There is no deadlock detection (EDEADLK).
This is only for fcntl-locks, flock doesn't have such thing.
libos/include/libos_fs_lock.h
line 79 at r6 (raw file):
remove all locks held by
All BSD locks? Or also POSIX?
(also for the sentence above)
libos/include/libos_fs_lock.h
line 86 at r6 (raw file):
* - For POSIX (fcntl) locks, for the given PID and range, replace the existing locks held by the * given PID for that range. * - for BSD (flock) locks, replace the existing locks held by the given handle ID.
ditto
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
* If the lock could be placed, `out_file_lock->type` is set to `F_UNLCK`. Otherwise, * `out_file_lock` fields (`type`, `start, `end`, `pid`, `handle_id`) are set to details of a * conflicting lock.
It was like this before, but how does this work? It sounds inherently racy - this function only checks whether you can place the lock, but what prevents the state from changing between the return form here and the setting of the lock?
libos/src/fs/libos_fs_lock.c
line 20 at r6 (raw file):
/* Used to disallow mixing of POSIX and BSD locks. Protected by `g_fs_lock_lock`. */ static bool g_file_lock_posix_used = false; static bool g_file_lock_flock_used = false;
Oh, so this is global, not per-file? Why?
Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?
libos/src/fs/libos_fs_lock.c
line 185 at r6 (raw file):
otherwise it may exhibit unexpected behavior
This is outdated, now we fail loudly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
libos/include/libos_fs_lock.h
line 36 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This is only for fcntl-locks, flock doesn't have such thing.
Done.
libos/include/libos_fs_lock.h
line 79 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
remove all locks held by
All BSD locks? Or also POSIX?
(also for the sentence above)
Done. POSIX and BSD locks cannot (currently) coexist in the same Gramine instance, so I thought it didn't need an explicit mention. But Ok.
libos/include/libos_fs_lock.h
line 86 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
This method is inherently racy, yes.
Please read this snippet from man 2 fcntl
:
**F\_GETLK** (_struct flock \*_) On input to this call, _lock_ describes a lock we would like to place on the file. If the lock could be placed, **fcntl**() does not actually place it, but returns **F\_UNLCK** in the _l\_type_ field of _lock_ and leaves the other fields of the structure unchanged. If one or more incompatible locks would prevent this lock being placed, then **fcntl**() returns details about one of those locks in the _l\_type_, _l\_whence_, _l\_start_, and _l\_len_ fields of _lock_. If the conflicting lock is a traditional (process-associated) record lock, then the _l\_pid_ field is set to the PID of the process holding that lock. If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1. **Note that the returned information may already be out of date by the time the caller inspects it.**
Note the very last sentence. We adhere to the same semantics.
libos/src/fs/libos_fs_lock.c
line 20 at r6 (raw file):
Oh, so this is global, not per-file? Why?
I can do it per-file (per dentry). Ok, actually, let me just do it.
Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?
Yes. Recall that all actual file-locking operations are done in the master (leader) process. And these variables are checked in the function file_lock_set_or_add_request()
which is only called in the master process. So even if process A (leader) creates an flock and process B (child) creates an fcntl lock, the fcntl lock request will be sent to process A and processed in process A anyway.
I will add a comment and an additional test case for this.
libos/src/fs/libos_fs_lock.c
line 185 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
otherwise it may exhibit unexpected behavior
This is outdated, now we fail loudly.
Done.
c23cfca
to
520a019
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
TODO: modify the commit message to mention:
- flock locks and fcntl locks are distinguished via
struct libos_file_lock::family
- currently flock and fcntl locks cannot be mixed -- Gramine will loudly fail if it detects such attempts
...cannot be mixed on the same file
libos/src/fs/libos_fs_lock.c
line 20 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Oh, so this is global, not per-file? Why?
I can do it per-file (per dentry). Ok, actually, let me just do it.
Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?
Yes. Recall that all actual file-locking operations are done in the master (leader) process. And these variables are checked in the function
file_lock_set_or_add_request()
which is only called in the master process. So even if process A (leader) creates an flock and process B (child) creates an fcntl lock, the fcntl lock request will be sent to process A and processed in process A anyway.I will add a comment and an additional test case for this.
Done 2x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r5, 2 of 5 files at r6, 2 of 4 files at r7.
Reviewable status: 7 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/include/libos_fs_lock.h
line 79 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Done. POSIX and BSD locks cannot (currently) coexist in the same Gramine instance, so I thought it didn't need an explicit mention. But Ok.
Hmm, but in the discussion above you said "...cannot be mixed on the same file"?
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This method is inherently racy, yes.
Please read this snippet from
man 2 fcntl
:**F\_GETLK** (_struct flock \*_) On input to this call, _lock_ describes a lock we would like to place on the file. If the lock could be placed, **fcntl**() does not actually place it, but returns **F\_UNLCK** in the _l\_type_ field of _lock_ and leaves the other fields of the structure unchanged. If one or more incompatible locks would prevent this lock being placed, then **fcntl**() returns details about one of those locks in the _l\_type_, _l\_whence_, _l\_start_, and _l\_len_ fields of _lock_. If the conflicting lock is a traditional (process-associated) record lock, then the _l\_pid_ field is set to the PID of the process holding that lock. If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1. **Note that the returned information may already be out of date by the time the caller inspects it.**
Note the very last sentence. We adhere to the same semantics.
Ugh, weird but ok.
Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...
libos/include/libos_handle.h
line 152 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
FYI: This problem is a bit similar to "re-parenting":
- IPC leader does not wait for all processes to finish #21
- [LibOS] Allow ipc worker thread of leader process to exit after the other processes #1168
The thing is: we don't have a "master" process (like an
init
process) that would collect info from all processes and make decisions based on this system-wide knowledge.
Ouch. This sounds bad, it's not just a leak, but we can incorrectly unlock the lock while it still should be locked?
I think I would prefer to leak resources or deadlock in this case than incorrectly unlock the lock.
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
static uint64_t local_counter = 0; new_handle->id = ((uint64_t)g_process.pid << 32) | __atomic_add_fetch(&local_counter, 1, __ATOMIC_RELAXED);
Can we fail if the counter overflows?
Also, is it possible to overflow the PID in Gramine?
In any of these two cases it would lead to 2 different locks being aliased, which is potentially security-relevant.
libos/src/fs/libos_fs_lock.c
line 56 at r7 (raw file):
/* Used to disallow mixing of POSIX and BSD locks on the same file (dentry). Note that all file * locking requests are processed by the leader process, so even if POSIX and BSD locks are * created in different processes, they will end up in leader and will update these fields. */
Suggestion:
they will end up in the leader and it will update these fields
libos/src/fs/libos_fs_lock.c
line 180 at r7 (raw file):
Migrating the discussion from the old PR:
mkow:
at least one of them is a write lock
There are no write locks in flock? At least not in the part implemented in this PR.
dimakuv:
Why do you say it? Flock locks implement writes, by piggy-backing on already-implemented
F_WRLCK
logic.See handling of
LOCK_EX
(aka write lock aka exclusive lock) inlibos_syscall_flock()
.
But the flock API doesn't have write-locks, and this sentence is talking about flocks (not how we mapped it into the common struct inside LibOS). Flock API calls this "exclusive locks".
libos/src/ipc/libos_ipc_fs_lock.c
line 155 at r7 (raw file):
.end = file_lock2.end, .pid = file_lock2.pid, .handle_id = file_lock2.handle_id,
TODO for myself after we have the final version of this PR: Check if all uses of these modified structures handle this new field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/include/libos_fs_lock.h
line 79 at r6 (raw file):
Hmm, but in the discussion above you said "...cannot be mixed on the same file"?
Yes, true. I wrote my comment before I modified from "cannot be mixed in the same instance" to "cannot be mixed on the same file".
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...
No, there are several "families" of locks. BSD (flock) locks have nothing to do with this man-page description.
I will only mention three families, not to confuse you even further:
- POSIX (fcntl) locks aka "traditional (process-associated) record locks"
- BSD (flock) locks
- OFD (Open File Description) locks aka File-private POSIX locks (old name)
Gramine currently implements the first two and does not implement the third. Here are several good references on OFDs:
- https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html
- https://lwn.net/Articles/586904/
- https://gavv.net/articles/file-locks/#open-file-description-locks-fcntl
In the second (LWN) article, the author of this family of locks describes what OFDs are very well. (In the original patches, OFD locks were called "File-private POSIX locks".) Several snippets:
File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API.
The only real problem with BSD locks is that they are whole-file locks. POSIX locks, on the other hand can operate on arbitrary byte ranges within a file. While whole-file locks are useful (and indeed, many applications just lock entire files even with POSIX locks), they are not sufficient for many cases. Applications such as databases need granular locking in order to allow for better parallelism.
I will assert that what is needed is a new type of lock that is a hybrid of the two — a byte-range lock that has BSD-like semantics for inheritance across fork() and on close().
Maybe one day we'll implement OFD locks in Gramine. Then the If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1.
will become relevant to us. For now, just ignore this.
libos/include/libos_handle.h
line 152 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Ouch. This sounds bad, it's not just a leak, but we can incorrectly unlock the lock while it still should be locked?
I think I would prefer to leak resources or deadlock in this case than incorrectly unlock the lock.
Done.
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
Can we fail if the counter overflows?
Done.
Also, is it possible to overflow the PID in Gramine?
I guess you mean "can PIDs be reused"? The answer is yes. The code is a bit complicated, but here it is:
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/include/libos_types.h#L30-L31?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L59-L102?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L112-L117?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/bookkeep/libos_pid.c#L78?plain=1
In short, what happens with PIDs/TIDs is the following:
- Gramine's IPC leader maintains a tree of ID ranges. Initially this tree is empty, and each new request (from the IPC leader process itself or from its child processes) creates a new ID range and adds it to the table. Each ID range is capped at
MAX_RANGE_SIZE
(32) IDs. - ID ranges start at ID=0 and grow in 32-ID chunks.
- ID ranges grow until
PID_MAX
which is2^22 - 1
(~4 million). Then Gramine tries to reuse them.- Even if some child processes released their ID ranges, the IPC leader does not "jump back" and re-use those ID ranges immediately.
- Instead, the IPC leader waits until the ID rolling integer (
g_last_id
) exceedsPID_MAX
and only then starts reusing ID ranges. - This logic is encapsulated in these snippets:
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L66?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L114?plain=1
To achieve this max limit, we need to:
- either create ~4 million threads,
- or (the worst case) create
4194304 / 32 = 131072
single-thread processes.
After this limit, Gramine starts reusing PIDs.
This was like this before. Do we care about fixing it in this PR?
libos/src/fs/libos_fs_lock.c
line 56 at r7 (raw file):
/* Used to disallow mixing of POSIX and BSD locks on the same file (dentry). Note that all file * locking requests are processed by the leader process, so even if POSIX and BSD locks are * created in different processes, they will end up in leader and will update these fields. */
Done.
libos/src/fs/libos_fs_lock.c
line 180 at r7 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Migrating the discussion from the old PR:
mkow:
at least one of them is a write lock
There are no write locks in flock? At least not in the part implemented in this PR.
dimakuv:
Why do you say it? Flock locks implement writes, by piggy-backing on already-implemented
F_WRLCK
logic.See handling of
LOCK_EX
(aka write lock aka exclusive lock) inlibos_syscall_flock()
.But the flock API doesn't have write-locks, and this sentence is talking about flocks (not how we mapped it into the common struct inside LibOS). Flock API calls this "exclusive locks".
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r7, 3 of 3 files at r8.
Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...
No, there are several "families" of locks. BSD (flock) locks have nothing to do with this man-page description.
I will only mention three families, not to confuse you even further:
- POSIX (fcntl) locks aka "traditional (process-associated) record locks"
- BSD (flock) locks
- OFD (Open File Description) locks aka File-private POSIX locks (old name)
Gramine currently implements the first two and does not implement the third. Here are several good references on OFDs:
- https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks.html
- https://lwn.net/Articles/586904/
- https://gavv.net/articles/file-locks/#open-file-description-locks-fcntl
In the second (LWN) article, the author of this family of locks describes what OFDs are very well. (In the original patches, OFD locks were called "File-private POSIX locks".) Several snippets:
File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API.
The only real problem with BSD locks is that they are whole-file locks. POSIX locks, on the other hand can operate on arbitrary byte ranges within a file. While whole-file locks are useful (and indeed, many applications just lock entire files even with POSIX locks), they are not sufficient for many cases. Applications such as databases need granular locking in order to allow for better parallelism.
I will assert that what is needed is a new type of lock that is a hybrid of the two — a byte-range lock that has BSD-like semantics for inheritance across fork() and on close().
Maybe one day we'll implement OFD locks in Gramine. Then the
If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1.
will become relevant to us. For now, just ignore this.
Uhhh, third file locking mechanism? I'll just pretend I've never learned about this.
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Can we fail if the counter overflows?
Done.
Also, is it possible to overflow the PID in Gramine?
I guess you mean "can PIDs be reused"? The answer is yes. The code is a bit complicated, but here it is:
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/include/libos_types.h#L30-L31?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L59-L102?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L112-L117?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/bookkeep/libos_pid.c#L78?plain=1
In short, what happens with PIDs/TIDs is the following:
- Gramine's IPC leader maintains a tree of ID ranges. Initially this tree is empty, and each new request (from the IPC leader process itself or from its child processes) creates a new ID range and adds it to the table. Each ID range is capped at
MAX_RANGE_SIZE
(32) IDs.- ID ranges start at ID=0 and grow in 32-ID chunks.
- ID ranges grow until
PID_MAX
which is2^22 - 1
(~4 million). Then Gramine tries to reuse them.
- Even if some child processes released their ID ranges, the IPC leader does not "jump back" and re-use those ID ranges immediately.
- Instead, the IPC leader waits until the ID rolling integer (
g_last_id
) exceedsPID_MAX
and only then starts reusing ID ranges.- This logic is encapsulated in these snippets:
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L66?plain=1
- https://github.com/gramineproject/gramine/blob/fc242aa79dd3ca30dbd6274367b8921fa875b6fc/libos/src/ipc/libos_ipc_pid.c#L114?plain=1
To achieve this max limit, we need to:
- either create ~4 million threads,
- or (the worst case) create
4194304 / 32 = 131072
single-thread processes.After this limit, Gramine starts reusing PIDs.
This was like this before. Do we care about fixing it in this PR?
I think we should fix it, because this PR makes crossing this limit dangerous.
libos/src/bookkeep/libos_handle.c
line 358 at r8 (raw file):
new_handle->id = ((uint64_t)g_process.pid << 32) | __atomic_add_fetch(&local_id_counter, 1, __ATOMIC_RELAXED); if ((new_handle->id & 0xFFFFFFFF) == 0) {
I think it would be simpler if you assigned the result of __atomic_add_fetch()
above to a variable?
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
/* Do not perform an unlock operation if this process has children. This may leak * resources or deadlock, but protects against unsafe behavior. See comments near * `created_by_process` in libos_handle.h header. */
Does this also works if we had A->B->C, but process B died?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Uhhh, third file locking mechanism? I'll just pretend I've never learned about this.
Well, then I won't suggest to search for F_SETLEASE
:)
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
I think we should fix it, because this PR makes crossing this limit dangerous.
Quick clarification: do you want us to put a hard limit on the number of created IDs? I.e., we must fail instead of wrapping around? This is a regression from previous Gramine behavior...
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Does this also works if we had A->B->C, but process B died?
No, it doesn't. I don't know how to handle this case without a central authority that manages handles/objects.
I don't know what to do in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/include/libos_fs_lock.h
line 109 at r6 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Well, then I won't suggest to search for
F_SETLEASE
:)
😱
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I think we should fix it, because this PR makes crossing this limit dangerous.
Quick clarification: do you want us to put a hard limit on the number of created IDs? I.e., we must fail instead of wrapping around? This is a regression from previous Gramine behavior...
Yes, that seems to be the safest option. At least assuming that normal workloads can't really hit it accidentally.
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
No, it doesn't. I don't know how to handle this case without a central authority that manages handles/objects.
I don't know what to do in this case.
Uhh, sounds bad. Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 14 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/src/bookkeep/libos_handle.c
line 357 at r7 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Yes, that seems to be the safest option. At least assuming that normal workloads can't really hit it accidentally.
Done.
libos/src/bookkeep/libos_handle.c
line 358 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I think it would be simpler if you assigned the result of
__atomic_add_fetch()
above to a variable?
Done.
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?
Which condition exactly? If we would implement in the central process, then we should implement the correct solution: which is to keep track of the number of references to each opened handle, and only fire some event (like LOCK_UN
) when the refcount goes down to zero.
Basically, this will require system-wide refcounting. Which means a lot of IPC scaffolding: an IPC request when the ref to the handle is created in any process, an IPC request when the ref to the handle is closed in any process, an IPC request when the refs of handles are increased during fork (because it serves as a dup). This is a huge task.
Sorry, I don't think there is any simple fix to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r9.
Reviewable status: 8 of 14 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?
Which condition exactly? If we would implement in the central process, then we should implement the correct solution: which is to keep track of the number of references to each opened handle, and only fire some event (like
LOCK_UN
) when the refcount goes down to zero.Basically, this will require system-wide refcounting. Which means a lot of IPC scaffolding: an IPC request when the ref to the handle is created in any process, an IPC request when the ref to the handle is closed in any process, an IPC request when the refs of handles are increased during fork (because it serves as a dup). This is a huge task.
Sorry, I don't think there is any simple fix to this.
Maybe @kailun-qin has any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 13 files at r1, 2 of 6 files at r5, 2 of 4 files at r7, 1 of 3 files at r8, 1 of 2 files at r9, all commit messages.
Reviewable status: 13 of 14 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
-- commits
line 12 at r9:
interruptible
Code quote:
interruptable
libos/include/libos_fs_lock.h
line 86 at r9 (raw file):
* - For POSIX (fcntl) locks, for the given PID and range, replace the existing POSIX locks held by * the given PID for that range. * - for BSD (flock) locks, replace the existing BSD locks held by the given handle ID.
For
Code quote:
for
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Maybe @kailun-qin has any ideas?
I agree that it sounds bad. But sorry that I don't see a simple fix that can be correct at the same time (as we've decided that we'd never want to incorrectly unlock the lock). :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
Previously, kailun-qin (Kailun Qin) wrote…
interruptible
Will do at final rebase
libos/include/libos_fs_lock.h
line 86 at r9 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
For
Done.
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
I agree that it sounds bad. But sorry that I don't see a simple fix that can be correct at the same time (as we've decided that we'd never want to incorrectly unlock the lock). :(
Done, kinda. I tried different ideas, but I always end up with "system-wide handle refcounting". Which is not easy to implement, and can't be crammed into this PR for sure.
So I introduced sys.experimental__enable_flock
and this feature becomes experimental. I don't know what else we can with this PR.
libos/src/bookkeep/libos_handle.c
line 502 at r10 (raw file):
return -EBUSY; } unlock(&g_process.children_lock);
Btw, if we decide to go with this sys.experimental__enable_flock
thing, then I can remove this whole hunk about g_process.children
check. It works only in the simple cases anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 8 files at r10.
Reviewable status: 13 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)
Documentation/manifest-syntax.rst
line 607 at r10 (raw file):
.. warning:: This syscall is still under development and may contain security vulnerabilities. This is temporary; the syscall will be enabled in future
Suggestion:
the syscall will be enabled by default in the future
libos/src/bookkeep/libos_handle.c
line 499 at r8 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Done, kinda. I tried different ideas, but I always end up with "system-wide handle refcounting". Which is not easy to implement, and can't be crammed into this PR for sure.
So I introduced
sys.experimental__enable_flock
and this feature becomes experimental. I don't know what else we can with this PR.
Sounds like the only reasonable solution for now, I completely forgot we have such an option.
libos/src/bookkeep/libos_handle.c
line 502 at r10 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Btw, if we decide to go with this
sys.experimental__enable_flock
thing, then I can remove this whole hunk aboutg_process.children
check. It works only in the simple cases anyway.
I'd remove it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)
Documentation/manifest-syntax.rst
line 607 at r10 (raw file):
.. warning:: This syscall is still under development and may contain security vulnerabilities. This is temporary; the syscall will be enabled in future
Done.
libos/src/bookkeep/libos_handle.c
line 502 at r10 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'd remove it then.
Done.
92f104b
to
6d084a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 17 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
a discussion (no related file):
I did a tentative rebase (squashed all fixup commits and updated to master)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
...cannot be mixed on the same file
Done
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Will do at final rebase
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 17 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
a discussion (no related file):
Dump of notes and TODOs for the future:
- Allow mixing POSIX and BSD locks, like Linux does. The two families should basically ignore each other (as if they are acquired on different files). This should be simple to implement, but need to be careful with a "list of locks" -- it will have both, and at least POSIX locks are ordered in that list, so the ordering code should be complicated. Alternatively, we can have two "lists of locks", one for POSIX and one for BSD.
- Fix the problem described in
libos_handle.h
. This will require some sort of system-wide handle refcounting. There's also an interesting asymmetry, which was probably a bad idea after all:handle_id
is easy to make unique and to inherit, but it's very hard to undo (in the sense of releasing locks implicitly, when handles are closed and processes terminated). clear_flock_locks()
is called on every file-related handle close. This is very expensive, as it sends an IPC message to the leader. There should be optimizations to not send the IPC message. Just imagine a Python workload which opens and closes 1,000s of script files during its execution, each such close sending an IPC message to the leader (which will be most probably useless as there are no flocks on that file).- Both commands ("acquire shared lock", "acquire exclusive lock", "release lock") and states ("this is an exclusive lock", "this is a shared lock") are represented by the same
libos_file_lock
struct. This is wrong -- they should actually be two different structs. Consider for exampleF_UNLCK
-- it is a command only and never a state; this is easily confusing for readers. Also, because we decided to reuse fcntl'sflock
struct and its macros, we end up with bad field names -- the type of the lock is calledfamily
, whereas the command/state of the lock is calledtype
. - The currently enabled tests are trivial. We need 3-4 powerful tests that will cover complex flows: e.g. a test with 3 processes, where an intermediate process terminates. Or e.g. a test with a complex tree of processes, grabbing and releasing locks on different files. Maybe even a randomized version of such test.
- We may want to revert changes to
libos_ipc_pid.c
which we made to guarantee uniqueness ofhandle_id
. This is in case we will replacehandle_id
with something that doesn't require such uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 13 files at r1, 1 of 4 files at r7, 4 of 8 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @kailun-qin)
libos/test/regression/flock_lock.c
line 50 at r12 (raw file):
} static void try_flock(int fd, int operation, int expect_ret) {
ditto in all places below
Suggestion:
expected_ret
libos/test/regression/flock_lock.c
line 60 at r12 (raw file):
static void try_fcntl(int fd, int operation, int expect_ret, int expect_errno) { struct flock fl = { .l_type = F_RDLCK,
This should be a parameter of this function, it's a bit weird now.
Code quote:
.l_type = F_RDLCK,
libos/test/regression/flock_lock.c
line 103 at r12 (raw file):
do { ret = read(pipe[0], &c, sizeof(c)); } while (ret == -1 && errno == EINTR);
What's actually causing these EINTRs in these tests? What are this function and the one above actually doing conceptually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r5, 1 of 3 files at r8, 1 of 2 files at r9, 6 of 8 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
libos/include/libos_handle.h
line 151 at r12 (raw file):
* P2. Unfortunately, Gramine lacks system-wide tracking of handle FDs. We currently err * on the side of caution and do *not* perform an operation if the process has children; * this may leak resources or deadlock but at least we don't introduce unsafe behavior.
Why do we still keep this?
Code quote:
* on the side of caution and do *not* perform an operation if the process has children;
* this may leak resources or deadlock but at least we don't introduce unsafe behavior.
The `flock` implementation introduces flock (aka BSD) locks, in addition to already-existing fcntl (aka POSIX) locks. The flock heavily reuses the current logic of `libos_file_lock` and `dent_file_locks`. Because flock is still under development, it is disabled by default and can be switched on via `sys.experimental__enable_flock` manifest option. Since flock locks are tied to the handle (aka open file description), we introduce a new field `struct libos_file_lock::handle_id` that uniquely identifies the handle across all Gramine processes from the same instance. `handle_id` is composed of a PID and an ever-increasing local counter; to guarantee uniqueness, re-use and integer wraparound of PIDs and local counters are disallowed. Currently flock and fcntl locks cannot be mixed on the same file. Gramine will loudly fail if it detects such attempts. This is different from Linux which allows such mixing. Similarly to fcntl locks, interruptible flock locks are not supported. Co-authored-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com> Signed-off-by: Liang Ma <liang3.ma@intel.com> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
6d084a2
to
0665da5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 15 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I did a tentative rebase (squashed all fixup commits and updated to master)
And rebased again with small fixes.
libos/include/libos_handle.h
line 151 at r12 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Why do we still keep this?
Done. Thanks, this was outdated.
libos/test/regression/fcntl_lock.c
line 24 at r13 (raw file):
#include <unistd.h> #include "common.h"
FYI: modifications to this file are purely for uniformity with flock_lock.c
libos/test/regression/flock_lock.c
line 50 at r12 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto in all places below
Done.
libos/test/regression/flock_lock.c
line 60 at r12 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This should be a parameter of this function, it's a bit weird now.
Done.
libos/test/regression/flock_lock.c
line 103 at r12 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
What's actually causing these EINTRs in these tests? What are this function and the one above actually doing conceptually?
Done. You're right, there is no EINTR possible in this test (there are no signal handlers).
I simplified these helper functions. These functions are needed to synchronize several threads or several processes -- e.g. second thread waits on read_pipe()
until the first thread finishes its stuff, then first thread performs write_pipe()
to unblock the second thread. So a simple sync primitive.
This was taken from fcntl_lock.c
test as-is. I changed that other test too, for uniformity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r13, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r13, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Dump of notes and TODOs for the future:
- Allow mixing POSIX and BSD locks, like Linux does. The two families should basically ignore each other (as if they are acquired on different files). This should be simple to implement, but need to be careful with a "list of locks" -- it will have both, and at least POSIX locks are ordered in that list, so the ordering code should be complicated. Alternatively, we can have two "lists of locks", one for POSIX and one for BSD.
- Fix the problem described in
libos_handle.h
. This will require some sort of system-wide handle refcounting. There's also an interesting asymmetry, which was probably a bad idea after all:handle_id
is easy to make unique and to inherit, but it's very hard to undo (in the sense of releasing locks implicitly, when handles are closed and processes terminated).clear_flock_locks()
is called on every file-related handle close. This is very expensive, as it sends an IPC message to the leader. There should be optimizations to not send the IPC message. Just imagine a Python workload which opens and closes 1,000s of script files during its execution, each such close sending an IPC message to the leader (which will be most probably useless as there are no flocks on that file).- Both commands ("acquire shared lock", "acquire exclusive lock", "release lock") and states ("this is an exclusive lock", "this is a shared lock") are represented by the same
libos_file_lock
struct. This is wrong -- they should actually be two different structs. Consider for exampleF_UNLCK
-- it is a command only and never a state; this is easily confusing for readers. Also, because we decided to reuse fcntl'sflock
struct and its macros, we end up with bad field names -- the type of the lock is calledfamily
, whereas the command/state of the lock is calledtype
.- The currently enabled tests are trivial. We need 3-4 powerful tests that will cover complex flows: e.g. a test with 3 processes, where an intermediate process terminates. Or e.g. a test with a complex tree of processes, grabbing and releasing locks on different files. Maybe even a randomized version of such test.
- We may want to revert changes to
libos_ipc_pid.c
which we made to guarantee uniqueness ofhandle_id
. This is in case we will replacehandle_id
with something that doesn't require such uniqueness.
Moved these items to a proper GitHub issue: #1433
Description of the changes
The
flock
implementation introduces flock (aka BSD) locks, in addition to already-existing fcntl (aka POSIX) locks. The flock heavily reuses the current logic oflibos_file_lock
anddent_file_locks
.Since flock locks are tied to the handle, we introduce a new field
struct libos_file_lock::handle_id
that serves to distinguish flock locks from fcntl locks. Similarly to fcntl locks, interruptable flock locks are not supported.This is a reincarnation of #1212. It was easier to make it a new PR, then to continue working on the old PR.
Closes #1212.
How to test this PR?
New tests are added.
This change is