Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LibOS] Add flock syscall #1416

Merged
merged 1 commit into from
Jun 30, 2023
Merged

[LibOS] Add flock syscall #1416

merged 1 commit into from
Jun 30, 2023

Conversation

dimakuv
Copy link

@dimakuv dimakuv commented Jun 23, 2023

Description of the changes

The flock implementation introduces flock (aka BSD) locks, in addition to already-existing fcntl (aka POSIX) locks. The flock heavily reuses the current logic of libos_file_lock and dent_file_locks.

Since flock locks are tied to the handle, we introduce a new field struct libos_file_lock::handle_id that serves to distinguish flock locks from fcntl locks. Similarly to fcntl locks, interruptable flock locks are not supported.

This is a reincarnation of #1212. It was easier to make it a new PR, then to continue working on the old PR.

Closes #1212.

How to test this PR?

New tests are added.


This change is Reviewable

@dimakuv dimakuv marked this pull request as draft June 23, 2023 09:47
@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from 45e4953 to 77d6627 Compare June 23, 2023 09:56
@dimakuv dimakuv changed the base branch from master to dimakuv/rename-fs-and-posix-locks June 23, 2023 09:57
@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from 77d6627 to 7a9d4dd Compare June 23, 2023 10:04
@dimakuv dimakuv marked this pull request as ready for review June 23, 2023 10:50
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


libos/include/libos_fs_lock.h line 142 at r1 (raw file):

 * subsequent behavior is undefined).
 */
bool has_flock_locks(struct libos_dentry* dent);

In review of #1212, @mkow raised a point whether it's ok to have UB when fcntl and flock locks are mixed. Linux actually allows the mix, by completely separating these two types of locks (i.e., one type of locks doesn't interact with the other type of locks in any way -- which is a split-brain behavior, but that's what Linux does).

If we'll go with the weird Linux behavior, it will be pretty simple (all conflict-finding logic should just have a clear split: iterate only on fcntl locks or iterate only on flock locks). Then we can remove this has_flock_locks() function and we'll also be bug-for-bug compatible with Linux.

It will also probably require a new LibOS test that mixes the two locks.

I'm not a big fan of introducing the weird Linux behavior... But what we currently have is also pretty bad -- we just UB in such case, with confusing semantics.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

a discussion (no related file):
I found a bug in this code, currently fixing it...

For info, I see this bogus output in fcntl_lock test:

1: fcntl(fd, F_SETLK, {F_UNLCK, SEEK_SET,    5,   10}) = 0

file_lock: fcntl (POSIX): pid=1: r[15..19]
file_lock:  flock (BSD): handle id=14757395258967641292: r[25..39]
file_lock: fcntl (POSIX): pid=1: r[1000..end]

I.e., after unsetting the lock, I somehow get a stray BSD lock. This is clearly wrong.


Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I found a bug in this code, currently fixing it...

For info, I see this bogus output in fcntl_lock test:

1: fcntl(fd, F_SETLK, {F_UNLCK, SEEK_SET,    5,   10}) = 0

file_lock: fcntl (POSIX): pid=1: r[15..19]
file_lock:  flock (BSD): handle id=14757395258967641292: r[25..39]
file_lock: fcntl (POSIX): pid=1: r[1000..end]

I.e., after unsetting the lock, I somehow get a stray BSD lock. This is clearly wrong.

Done, fixed (see the first fixup commit).



-- commits line 2 at r2:
My bad, I forgot [LibOS] here. Will do at final rebase.

@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from ba48485 to 649fd3b Compare June 26, 2023 17:07
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners


libos/include/libos_fs_lock.h line 142 at r1 (raw file):

I'm not a big fan of introducing the weird Linux behavior... But what we currently have is also pretty bad -- we just UB in such case, with confusing semantics.

By the way, another simple idea is to disallow POSIX and BSD locks to be used in one app. Basically, introduce two global booleans, one for POSIX and one for BSD, and make sure they are never true together. This will be a very quick fix to the problem, while we could work on the proper mixing of POSIX plus BSD locks in a future PR.

@dimakuv dimakuv force-pushed the dimakuv/rename-fs-and-posix-locks branch from d099391 to ece36fe Compare June 27, 2023 11:36
Base automatically changed from dimakuv/rename-fs-and-posix-locks to master June 27, 2023 12:35
@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from 649fd3b to fcc5937 Compare June 27, 2023 12:45
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


-- commits line 2 at r2:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

My bad, I forgot [LibOS] here. Will do at final rebase.

Done

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


-- commits line 10 at r4:

struct libos_file_lock::handle_id that serves to distinguish flock locks from fcntl locks.

This is quite obscure. How would the code look like if we had an enum LOCK_TYPE + a union with the lock-type-specific fields? The checks between the locks needs IFs anyways.

How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.


libos/include/libos_fs_lock.h line 142 at r1 (raw file):

If we'll go with the weird Linux behavior, it will be pretty simple (all conflict-finding logic should just have a clear split: iterate only on fcntl locks or iterate only on flock locks).

I'd go with this if it's actually simple.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


-- commits line 10 at r4:

How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.

Yes, the implementation was actually inspired by Linux.

Let me make it more explicit with enum LOCK_TYPE. I can also use a union, though Linux doesn't do it. In case anyone wonders how it is implemented in Linux, start here: https://elixir.bootlin.com/linux/v6.4/source/include/linux/filelock.h

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


-- commits line 10 at r4:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

How much logic will be shared between these two types, if we make them independent? I see that Linux also shares the code between them, but has an explicit flag for the lock type.

Yes, the implementation was actually inspired by Linux.

Let me make it more explicit with enum LOCK_TYPE. I can also use a union, though Linux doesn't do it. In case anyone wonders how it is implemented in Linux, start here: https://elixir.bootlin.com/linux/v6.4/source/include/linux/filelock.h

UPDATE: I tried with a union, but it touches a lot of unrelated code. I will probably create a branch or a DONTMERGE PR with this change alone, just to show how ugly it looks.

So for this PR, I will not introduce a union.

Also, because type is already used for RLCK, WLCK, UNLCK, I am going with the word family (i.e. enum LOCK_FAMILY). This is currently the best I can come up with, without modifying 80% of the file-lock code. In the future, I plan to have a "rename" PR that will reshape all objects in file-locking code.


libos/src/sys/libos_fcntl.c line 63 at r4 (raw file):

 * positive overflow.
 */
static int flock_to_file_lock(struct flock* fl, struct libos_handle* hdl, uint64_t handle_id,

I should remove this change -- this func will be used purely for POSIX locks.


libos/src/sys/libos_fcntl.c line 318 at r4 (raw file):

    struct libos_file_lock file_lock;
    ret = flock_to_file_lock(&fl, hdl, hdl->id, &file_lock);

I will remove call to this function -- I can just create file_lock object directly here.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


-- commits line 10 at r4:
Done (family field).

How much logic will be shared between these two types, if we make them independent?

Looks like I didn't answer this question before. A lot of logic is shared between the two types, I would say around 80% of the code is common for POSIX and BSD locks.


-- commits line 12 at r4:
TODO: modify the commit message to mention:

  • flock locks and fcntl locks are distinguished via struct libos_file_lock::family
  • currently flock and fcntl locks cannot be mixed -- Gramine will loudly fail if it detects such attempts

libos/src/sys/libos_fcntl.c line 63 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I should remove this change -- this func will be used purely for POSIX locks.

Done


libos/src/sys/libos_fcntl.c line 318 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I will remove call to this function -- I can just create file_lock object directly here.

Done

@dimakuv
Copy link
Author

dimakuv commented Jun 28, 2023

libos/include/libos_fs_lock.h line 142 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

If we'll go with the weird Linux behavior, it will be pretty simple (all conflict-finding logic should just have a clear split: iterate only on fcntl locks or iterate only on flock locks).

I'd go with this if it's actually simple.

So conforming to the Linux behavior turned out to be not so simple. The main obstacle is the fact that "list of locks" is shared between POSIX and BSD locks, and this list is supposed to be ordered in a special way (for POSIX locks). See:

This is not that hard to fix, there are two ways:

  • Keep this single "list of locks", and just modify the POSIX ordering code to "jump over" BSD locks. Sounds a bit ugly.
  • Split "list of locks" into two lists, one for POSIX locks (and ordered) and one for BSD locks (doesn't need to be ordered). Requires a lot of changes.

Given the above, I currently decided to go in another direction: disallow POSIX and BSD locks to be used in the same app. This is much easier to implement, and doesn't regress Gramine.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


libos/include/libos_fs_lock.h line 142 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

So conforming to the Linux behavior turned out to be not so simple. The main obstacle is the fact that "list of locks" is shared between POSIX and BSD locks, and this list is supposed to be ordered in a special way (for POSIX locks). See:

This is not that hard to fix, there are two ways:

  • Keep this single "list of locks", and just modify the POSIX ordering code to "jump over" BSD locks. Sounds a bit ugly.
  • Split "list of locks" into two lists, one for POSIX locks (and ordered) and one for BSD locks (doesn't need to be ordered). Requires a lot of changes.

Given the above, I currently decided to go in another direction: disallow POSIX and BSD locks to be used in the same app. This is much easier to implement, and doesn't regress Gramine.

Done. This also lets us get rid of has_flock_locks() func.


libos/include/libos_handle.h line 152 at r6 (raw file):

     *        tracking of handle FDs. On the other hand, Gramine anyway doesn't support the
     *        "daemonization" case described above, so the assumption of "last" FD works fine.
     */

FYI: This problem is a bit similar to "re-parenting":

The thing is: we don't have a "master" process (like an init process) that would collect info from all processes and make decisions based on this system-wide knowledge.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r6.
Reviewable status: 1 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/include/libos_fs_lock.h line 36 at r6 (raw file):

 * - The main process has to be able to look up the same file, so locking will not work for files in
 *   local-process-only filesystems (tmpfs).
 * - There is no deadlock detection (EDEADLK).

This is only for fcntl-locks, flock doesn't have such thing.


libos/include/libos_fs_lock.h line 79 at r6 (raw file):

remove all locks held by

All BSD locks? Or also POSIX?

(also for the sentence above)


libos/include/libos_fs_lock.h line 86 at r6 (raw file):

 * - For POSIX (fcntl) locks, for the given PID and range, replace the existing locks held by the
 *   given PID for that range.
 * - for BSD (flock) locks, replace the existing locks held by the given handle ID.

ditto


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

 * If the lock could be placed, `out_file_lock->type` is set to `F_UNLCK`. Otherwise,
 * `out_file_lock` fields (`type`, `start, `end`, `pid`, `handle_id`) are set to details of a
 * conflicting lock.

It was like this before, but how does this work? It sounds inherently racy - this function only checks whether you can place the lock, but what prevents the state from changing between the return form here and the setting of the lock?


libos/src/fs/libos_fs_lock.c line 20 at r6 (raw file):

/* Used to disallow mixing of POSIX and BSD locks. Protected by `g_fs_lock_lock`. */
static bool g_file_lock_posix_used = false;
static bool g_file_lock_flock_used = false;

Oh, so this is global, not per-file? Why?
Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?


libos/src/fs/libos_fs_lock.c line 185 at r6 (raw file):

otherwise it may exhibit unexpected behavior

This is outdated, now we fail loudly.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


libos/include/libos_fs_lock.h line 36 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is only for fcntl-locks, flock doesn't have such thing.

Done.


libos/include/libos_fs_lock.h line 79 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

remove all locks held by

All BSD locks? Or also POSIX?

(also for the sentence above)

Done. POSIX and BSD locks cannot (currently) coexist in the same Gramine instance, so I thought it didn't need an explicit mention. But Ok.


libos/include/libos_fs_lock.h line 86 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

This method is inherently racy, yes.

Please read this snippet from man 2 fcntl:

  **F\_GETLK** (_struct flock \*_)
         On input to this call, _lock_ describes a lock we would like
         to place on the file.  If the lock could be placed,
         **fcntl**() does not actually place it, but returns **F\_UNLCK** in
         the _l\_type_ field of _lock_ and leaves the other fields of
         the structure unchanged.

         If one or more incompatible locks would prevent this lock
         being placed, then **fcntl**() returns details about one of
         those locks in the _l\_type_, _l\_whence_, _l\_start_, and _l\_len_
         fields of _lock_.  If the conflicting lock is a traditional
         (process-associated) record lock, then the _l\_pid_ field is
         set to the PID of the process holding that lock.  If the
         conflicting lock is an open file description lock, then
         _l\_pid_ is set to -1.  **Note that the returned information
         may already be out of date by the time the caller inspects
         it.**

Note the very last sentence. We adhere to the same semantics.


libos/src/fs/libos_fs_lock.c line 20 at r6 (raw file):

Oh, so this is global, not per-file? Why?

I can do it per-file (per dentry). Ok, actually, let me just do it.

Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?

Yes. Recall that all actual file-locking operations are done in the master (leader) process. And these variables are checked in the function file_lock_set_or_add_request() which is only called in the master process. So even if process A (leader) creates an flock and process B (child) creates an fcntl lock, the fcntl lock request will be sent to process A and processed in process A anyway.

I will add a comment and an additional test case for this.


libos/src/fs/libos_fs_lock.c line 185 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

otherwise it may exhibit unexpected behavior

This is outdated, now we fail loudly.

Done.

@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from c23cfca to 520a019 Compare June 28, 2023 17:45
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


-- commits line 12 at r4:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

TODO: modify the commit message to mention:

  • flock locks and fcntl locks are distinguished via struct libos_file_lock::family
  • currently flock and fcntl locks cannot be mixed -- Gramine will loudly fail if it detects such attempts

...cannot be mixed on the same file


libos/src/fs/libos_fs_lock.c line 20 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Oh, so this is global, not per-file? Why?

I can do it per-file (per dentry). Ok, actually, let me just do it.

Also, does it detect the situation where process A creates an flock, and process B creates an fcntl lock?

Yes. Recall that all actual file-locking operations are done in the master (leader) process. And these variables are checked in the function file_lock_set_or_add_request() which is only called in the master process. So even if process A (leader) creates an flock and process B (child) creates an fcntl lock, the fcntl lock request will be sent to process A and processed in process A anyway.

I will add a comment and an additional test case for this.

Done 2x

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r5, 2 of 5 files at r6, 2 of 4 files at r7.
Reviewable status: 7 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/include/libos_fs_lock.h line 79 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done. POSIX and BSD locks cannot (currently) coexist in the same Gramine instance, so I thought it didn't need an explicit mention. But Ok.

Hmm, but in the discussion above you said "...cannot be mixed on the same file"?


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This method is inherently racy, yes.

Please read this snippet from man 2 fcntl:

  **F\_GETLK** (_struct flock \*_)
         On input to this call, _lock_ describes a lock we would like
         to place on the file.  If the lock could be placed,
         **fcntl**() does not actually place it, but returns **F\_UNLCK** in
         the _l\_type_ field of _lock_ and leaves the other fields of
         the structure unchanged.

         If one or more incompatible locks would prevent this lock
         being placed, then **fcntl**() returns details about one of
         those locks in the _l\_type_, _l\_whence_, _l\_start_, and _l\_len_
         fields of _lock_.  If the conflicting lock is a traditional
         (process-associated) record lock, then the _l\_pid_ field is
         set to the PID of the process holding that lock.  If the
         conflicting lock is an open file description lock, then
         _l\_pid_ is set to -1.  **Note that the returned information
         may already be out of date by the time the caller inspects
         it.**

Note the very last sentence. We adhere to the same semantics.

Ugh, weird but ok.
Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...


libos/include/libos_handle.h line 152 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

FYI: This problem is a bit similar to "re-parenting":

The thing is: we don't have a "master" process (like an init process) that would collect info from all processes and make decisions based on this system-wide knowledge.

Ouch. This sounds bad, it's not just a leak, but we can incorrectly unlock the lock while it still should be locked?
I think I would prefer to leak resources or deadlock in this case than incorrectly unlock the lock.


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

    static uint64_t local_counter = 0;
    new_handle->id = ((uint64_t)g_process.pid << 32)
                      | __atomic_add_fetch(&local_counter, 1, __ATOMIC_RELAXED);

Can we fail if the counter overflows?
Also, is it possible to overflow the PID in Gramine?

In any of these two cases it would lead to 2 different locks being aliased, which is potentially security-relevant.


libos/src/fs/libos_fs_lock.c line 56 at r7 (raw file):

    /* Used to disallow mixing of POSIX and BSD locks on the same file (dentry). Note that all file
     * locking requests are processed by the leader process, so even if POSIX and BSD locks are
     * created in different processes, they will end up in leader and will update these fields. */

Suggestion:

they will end up in the leader and it will update these fields

libos/src/fs/libos_fs_lock.c line 180 at r7 (raw file):
Migrating the discussion from the old PR:

mkow:

at least one of them is a write lock

There are no write locks in flock? At least not in the part implemented in this PR.

dimakuv:

Why do you say it? Flock locks implement writes, by piggy-backing on already-implemented F_WRLCK logic.

See handling of LOCK_EX (aka write lock aka exclusive lock) in libos_syscall_flock().

But the flock API doesn't have write-locks, and this sentence is talking about flocks (not how we mapped it into the common struct inside LibOS). Flock API calls this "exclusive locks".


libos/src/ipc/libos_ipc_fs_lock.c line 155 at r7 (raw file):

        .end = file_lock2.end,
        .pid = file_lock2.pid,
        .handle_id = file_lock2.handle_id,

TODO for myself after we have the final version of this PR: Check if all uses of these modified structures handle this new field.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 13 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


libos/include/libos_fs_lock.h line 79 at r6 (raw file):

Hmm, but in the discussion above you said "...cannot be mixed on the same file"?

Yes, true. I wrote my comment before I modified from "cannot be mixed in the same instance" to "cannot be mixed on the same file".


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...

No, there are several "families" of locks. BSD (flock) locks have nothing to do with this man-page description.

I will only mention three families, not to confuse you even further:

  1. POSIX (fcntl) locks aka "traditional (process-associated) record locks"
  2. BSD (flock) locks
  3. OFD (Open File Description) locks aka File-private POSIX locks (old name)

Gramine currently implements the first two and does not implement the third. Here are several good references on OFDs:

In the second (LWN) article, the author of this family of locks describes what OFDs are very well. (In the original patches, OFD locks were called "File-private POSIX locks".) Several snippets:

File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API.

The only real problem with BSD locks is that they are whole-file locks. POSIX locks, on the other hand can operate on arbitrary byte ranges within a file. While whole-file locks are useful (and indeed, many applications just lock entire files even with POSIX locks), they are not sufficient for many cases. Applications such as databases need granular locking in order to allow for better parallelism.

I will assert that what is needed is a new type of lock that is a hybrid of the two — a byte-range lock that has BSD-like semantics for inheritance across fork() and on close().


Maybe one day we'll implement OFD locks in Gramine. Then the If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1. will become relevant to us. For now, just ignore this.


libos/include/libos_handle.h line 152 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ouch. This sounds bad, it's not just a leak, but we can incorrectly unlock the lock while it still should be locked?
I think I would prefer to leak resources or deadlock in this case than incorrectly unlock the lock.

Done.


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

Can we fail if the counter overflows?

Done.

Also, is it possible to overflow the PID in Gramine?

I guess you mean "can PIDs be reused"? The answer is yes. The code is a bit complicated, but here it is:

In short, what happens with PIDs/TIDs is the following:

  1. Gramine's IPC leader maintains a tree of ID ranges. Initially this tree is empty, and each new request (from the IPC leader process itself or from its child processes) creates a new ID range and adds it to the table. Each ID range is capped at MAX_RANGE_SIZE (32) IDs.
  2. ID ranges start at ID=0 and grow in 32-ID chunks.
  3. ID ranges grow until PID_MAX which is 2^22 - 1 (~4 million). Then Gramine tries to reuse them.

To achieve this max limit, we need to:

  • either create ~4 million threads,
  • or (the worst case) create 4194304 / 32 = 131072 single-thread processes.

After this limit, Gramine starts reusing PIDs.

This was like this before. Do we care about fixing it in this PR?


libos/src/fs/libos_fs_lock.c line 56 at r7 (raw file):

    /* Used to disallow mixing of POSIX and BSD locks on the same file (dentry). Note that all file
     * locking requests are processed by the leader process, so even if POSIX and BSD locks are
     * created in different processes, they will end up in leader and will update these fields. */

Done.


libos/src/fs/libos_fs_lock.c line 180 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Migrating the discussion from the old PR:

mkow:

at least one of them is a write lock

There are no write locks in flock? At least not in the part implemented in this PR.

dimakuv:

Why do you say it? Flock locks implement writes, by piggy-backing on already-implemented F_WRLCK logic.

See handling of LOCK_EX (aka write lock aka exclusive lock) in libos_syscall_flock().

But the flock API doesn't have write-locks, and this sentence is talking about flocks (not how we mapped it into the common struct inside LibOS). Flock API calls this "exclusive locks".

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r7, 3 of 3 files at r8.
Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Btw. what's the difference between "a traditional (process-associated) record lock" and "an open file description lock"? Sounds like this sub-call should also look at BSD locks? I'm confused...

No, there are several "families" of locks. BSD (flock) locks have nothing to do with this man-page description.

I will only mention three families, not to confuse you even further:

  1. POSIX (fcntl) locks aka "traditional (process-associated) record locks"
  2. BSD (flock) locks
  3. OFD (Open File Description) locks aka File-private POSIX locks (old name)

Gramine currently implements the first two and does not implement the third. Here are several good references on OFDs:

In the second (LWN) article, the author of this family of locks describes what OFDs are very well. (In the original patches, OFD locks were called "File-private POSIX locks".) Several snippets:

File-private POSIX locks are an attempt to take elements of both BSD-style and POSIX locks and combine them into a more threading-friendly file locking API.

The only real problem with BSD locks is that they are whole-file locks. POSIX locks, on the other hand can operate on arbitrary byte ranges within a file. While whole-file locks are useful (and indeed, many applications just lock entire files even with POSIX locks), they are not sufficient for many cases. Applications such as databases need granular locking in order to allow for better parallelism.

I will assert that what is needed is a new type of lock that is a hybrid of the two — a byte-range lock that has BSD-like semantics for inheritance across fork() and on close().


Maybe one day we'll implement OFD locks in Gramine. Then the If the conflicting lock is an open file description lock, then _l\_pid_ is set to -1. will become relevant to us. For now, just ignore this.

Uhhh, third file locking mechanism? I'll just pretend I've never learned about this.


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we fail if the counter overflows?

Done.

Also, is it possible to overflow the PID in Gramine?

I guess you mean "can PIDs be reused"? The answer is yes. The code is a bit complicated, but here it is:

In short, what happens with PIDs/TIDs is the following:

  1. Gramine's IPC leader maintains a tree of ID ranges. Initially this tree is empty, and each new request (from the IPC leader process itself or from its child processes) creates a new ID range and adds it to the table. Each ID range is capped at MAX_RANGE_SIZE (32) IDs.
  2. ID ranges start at ID=0 and grow in 32-ID chunks.
  3. ID ranges grow until PID_MAX which is 2^22 - 1 (~4 million). Then Gramine tries to reuse them.

To achieve this max limit, we need to:

  • either create ~4 million threads,
  • or (the worst case) create 4194304 / 32 = 131072 single-thread processes.

After this limit, Gramine starts reusing PIDs.

This was like this before. Do we care about fixing it in this PR?

I think we should fix it, because this PR makes crossing this limit dangerous.


libos/src/bookkeep/libos_handle.c line 358 at r8 (raw file):

    new_handle->id = ((uint64_t)g_process.pid << 32)
                      | __atomic_add_fetch(&local_id_counter, 1, __ATOMIC_RELAXED);
    if ((new_handle->id & 0xFFFFFFFF) == 0) {

I think it would be simpler if you assigned the result of __atomic_add_fetch() above to a variable?


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

            /* Do not perform an unlock operation if this process has children. This may leak
             * resources or deadlock, but protects against unsafe behavior. See comments near
             * `created_by_process` in libos_handle.h header. */

Does this also works if we had A->B->C, but process B died?

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Uhhh, third file locking mechanism? I'll just pretend I've never learned about this.

Well, then I won't suggest to search for F_SETLEASE :)


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

I think we should fix it, because this PR makes crossing this limit dangerous.

Quick clarification: do you want us to put a hard limit on the number of created IDs? I.e., we must fail instead of wrapping around? This is a regression from previous Gramine behavior...


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Does this also works if we had A->B->C, but process B died?

No, it doesn't. I don't know how to handle this case without a central authority that manages handles/objects.

I don't know what to do in this case.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 13 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/include/libos_fs_lock.h line 109 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Well, then I won't suggest to search for F_SETLEASE :)

😱


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think we should fix it, because this PR makes crossing this limit dangerous.

Quick clarification: do you want us to put a hard limit on the number of created IDs? I.e., we must fail instead of wrapping around? This is a regression from previous Gramine behavior...

Yes, that seems to be the safest option. At least assuming that normal workloads can't really hit it accidentally.


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

No, it doesn't. I don't know how to handle this case without a central authority that manages handles/objects.

I don't know what to do in this case.

Uhh, sounds bad. Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 14 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)


libos/src/bookkeep/libos_handle.c line 357 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Yes, that seems to be the safest option. At least assuming that normal workloads can't really hit it accidentally.

Done.


libos/src/bookkeep/libos_handle.c line 358 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think it would be simpler if you assigned the result of __atomic_add_fetch() above to a variable?

Done.


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?

Which condition exactly? If we would implement in the central process, then we should implement the correct solution: which is to keep track of the number of references to each opened handle, and only fire some event (like LOCK_UN) when the refcount goes down to zero.

Basically, this will require system-wide refcounting. Which means a lot of IPC scaffolding: an IPC request when the ref to the handle is created in any process, an IPC request when the ref to the handle is closed in any process, an IPC request when the refs of handles are increased during fork (because it serves as a dup). This is a huge task.

Sorry, I don't think there is any simple fix to this.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r9.
Reviewable status: 8 of 14 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can't we use the central process which already manages the locks to check this condition? Or would it be too complex?

Which condition exactly? If we would implement in the central process, then we should implement the correct solution: which is to keep track of the number of references to each opened handle, and only fire some event (like LOCK_UN) when the refcount goes down to zero.

Basically, this will require system-wide refcounting. Which means a lot of IPC scaffolding: an IPC request when the ref to the handle is created in any process, an IPC request when the ref to the handle is closed in any process, an IPC request when the refs of handles are increased during fork (because it serves as a dup). This is a huge task.

Sorry, I don't think there is any simple fix to this.

Maybe @kailun-qin has any ideas?

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 13 files at r1, 2 of 6 files at r5, 2 of 4 files at r7, 1 of 3 files at r8, 1 of 2 files at r9, all commit messages.
Reviewable status: 13 of 14 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


-- commits line 12 at r9:
interruptible

Code quote:

interruptable

libos/include/libos_fs_lock.h line 86 at r9 (raw file):

 * - For POSIX (fcntl) locks, for the given PID and range, replace the existing POSIX locks held by
 *   the given PID for that range.
 * - for BSD (flock) locks, replace the existing BSD locks held by the given handle ID.

For

Code quote:

for

libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Maybe @kailun-qin has any ideas?

I agree that it sounds bad. But sorry that I don't see a simple fix that can be correct at the same time (as we've decided that we'd never want to incorrectly unlock the lock). :(

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)


-- commits line 12 at r9:

Previously, kailun-qin (Kailun Qin) wrote…

interruptible

Will do at final rebase


libos/include/libos_fs_lock.h line 86 at r9 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

For

Done.


libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

I agree that it sounds bad. But sorry that I don't see a simple fix that can be correct at the same time (as we've decided that we'd never want to incorrectly unlock the lock). :(

Done, kinda. I tried different ideas, but I always end up with "system-wide handle refcounting". Which is not easy to implement, and can't be crammed into this PR for sure.

So I introduced sys.experimental__enable_flock and this feature becomes experimental. I don't know what else we can with this PR.


libos/src/bookkeep/libos_handle.c line 502 at r10 (raw file):

            return -EBUSY;
        }
        unlock(&g_process.children_lock);

Btw, if we decide to go with this sys.experimental__enable_flock thing, then I can remove this whole hunk about g_process.children check. It works only in the simple cases anyway.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 8 files at r10.
Reviewable status: 13 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)


Documentation/manifest-syntax.rst line 607 at r10 (raw file):

.. warning::
   This syscall is still under development and may contain security
   vulnerabilities. This is temporary; the syscall will be enabled in future

Suggestion:

the syscall will be enabled by default in the future

libos/src/bookkeep/libos_handle.c line 499 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done, kinda. I tried different ideas, but I always end up with "system-wide handle refcounting". Which is not easy to implement, and can't be crammed into this PR for sure.

So I introduced sys.experimental__enable_flock and this feature becomes experimental. I don't know what else we can with this PR.

Sounds like the only reasonable solution for now, I completely forgot we have such an option.


libos/src/bookkeep/libos_handle.c line 502 at r10 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Btw, if we decide to go with this sys.experimental__enable_flock thing, then I can remove this whole hunk about g_process.children check. It works only in the simple cases anyway.

I'd remove it then.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 17 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)


Documentation/manifest-syntax.rst line 607 at r10 (raw file):

.. warning::
   This syscall is still under development and may contain security
   vulnerabilities. This is temporary; the syscall will be enabled in future

Done.


libos/src/bookkeep/libos_handle.c line 502 at r10 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd remove it then.

Done.

@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from 92f104b to 6d084a2 Compare June 29, 2023 22:56
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 17 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)

a discussion (no related file):
I did a tentative rebase (squashed all fixup commits and updated to master)



-- commits line 12 at r4:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

...cannot be mixed on the same file

Done


-- commits line 12 at r9:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Will do at final rebase

Done.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 17 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)

a discussion (no related file):
Dump of notes and TODOs for the future:

  • Allow mixing POSIX and BSD locks, like Linux does. The two families should basically ignore each other (as if they are acquired on different files). This should be simple to implement, but need to be careful with a "list of locks" -- it will have both, and at least POSIX locks are ordered in that list, so the ordering code should be complicated. Alternatively, we can have two "lists of locks", one for POSIX and one for BSD.
  • Fix the problem described in libos_handle.h. This will require some sort of system-wide handle refcounting. There's also an interesting asymmetry, which was probably a bad idea after all: handle_id is easy to make unique and to inherit, but it's very hard to undo (in the sense of releasing locks implicitly, when handles are closed and processes terminated).
  • clear_flock_locks() is called on every file-related handle close. This is very expensive, as it sends an IPC message to the leader. There should be optimizations to not send the IPC message. Just imagine a Python workload which opens and closes 1,000s of script files during its execution, each such close sending an IPC message to the leader (which will be most probably useless as there are no flocks on that file).
  • Both commands ("acquire shared lock", "acquire exclusive lock", "release lock") and states ("this is an exclusive lock", "this is a shared lock") are represented by the same libos_file_lock struct. This is wrong -- they should actually be two different structs. Consider for example F_UNLCK -- it is a command only and never a state; this is easily confusing for readers. Also, because we decided to reuse fcntl's flock struct and its macros, we end up with bad field names -- the type of the lock is called family, whereas the command/state of the lock is called type.
  • The currently enabled tests are trivial. We need 3-4 powerful tests that will cover complex flows: e.g. a test with 3 processes, where an intermediate process terminates. Or e.g. a test with a complex tree of processes, grabbing and releasing locks on different files. Maybe even a randomized version of such test.
  • We may want to revert changes to libos_ipc_pid.c which we made to guarantee uniqueness of handle_id. This is in case we will replace handle_id with something that doesn't require such uniqueness.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 13 files at r1, 1 of 4 files at r7, 4 of 8 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @kailun-qin)


libos/test/regression/flock_lock.c line 50 at r12 (raw file):

}

static void try_flock(int fd, int operation, int expect_ret) {

ditto in all places below

Suggestion:

expected_ret

libos/test/regression/flock_lock.c line 60 at r12 (raw file):

static void try_fcntl(int fd, int operation, int expect_ret, int expect_errno) {
    struct flock fl = {
        .l_type = F_RDLCK,

This should be a parameter of this function, it's a bit weird now.

Code quote:

.l_type = F_RDLCK,

libos/test/regression/flock_lock.c line 103 at r12 (raw file):

    do {
        ret = read(pipe[0], &c, sizeof(c));
    } while (ret == -1 && errno == EINTR);

What's actually causing these EINTRs in these tests? What are this function and the one above actually doing conceptually?

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r5, 1 of 3 files at r8, 1 of 2 files at r9, 6 of 8 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


libos/include/libos_handle.h line 151 at r12 (raw file):

     *        P2. Unfortunately, Gramine lacks system-wide tracking of handle FDs. We currently err
     *        on the side of caution and do *not* perform an operation if the process has children;
     *        this may leak resources or deadlock but at least we don't introduce unsafe behavior.

Why do we still keep this?

Code quote:

     *        on the side of caution and do *not* perform an operation if the process has children;
     *        this may leak resources or deadlock but at least we don't introduce unsafe behavior.

The `flock` implementation introduces flock (aka BSD) locks, in addition
to already-existing fcntl (aka POSIX) locks. The flock heavily reuses
the current logic of `libos_file_lock` and `dent_file_locks`. Because
flock is still under development, it is disabled by default and can be
switched on via `sys.experimental__enable_flock` manifest option.

Since flock locks are tied to the handle (aka open file description), we
introduce a new field `struct libos_file_lock::handle_id` that uniquely
identifies the handle across all Gramine processes from the same
instance. `handle_id` is composed of a PID and an ever-increasing local
counter; to guarantee uniqueness, re-use and integer wraparound of PIDs
and local counters are disallowed.

Currently flock and fcntl locks cannot be mixed on the same file.
Gramine will loudly fail if it detects such attempts. This is different
from Linux which allows such mixing.

Similarly to fcntl locks, interruptible flock locks are not supported.

Co-authored-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Signed-off-by: Liang Ma <liang3.ma@intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
@dimakuv dimakuv force-pushed the dimakuv/flock-syscall branch from 6d084a2 to 0665da5 Compare June 30, 2023 06:09
Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 15 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I did a tentative rebase (squashed all fixup commits and updated to master)

And rebased again with small fixes.



libos/include/libos_handle.h line 151 at r12 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Why do we still keep this?

Done. Thanks, this was outdated.


libos/test/regression/fcntl_lock.c line 24 at r13 (raw file):

#include <unistd.h>

#include "common.h"

FYI: modifications to this file are purely for uniformity with flock_lock.c


libos/test/regression/flock_lock.c line 50 at r12 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto in all places below

Done.


libos/test/regression/flock_lock.c line 60 at r12 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This should be a parameter of this function, it's a bit weird now.

Done.


libos/test/regression/flock_lock.c line 103 at r12 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What's actually causing these EINTRs in these tests? What are this function and the one above actually doing conceptually?

Done. You're right, there is no EINTR possible in this test (there are no signal handlers).

I simplified these helper functions. These functions are needed to synchronize several threads or several processes -- e.g. second thread waits on read_pipe() until the first thread finishes its stuff, then first thread performs write_pipe() to unblock the second thread. So a simple sync primitive.

This was taken from fcntl_lock.c test as-is. I changed that other test too, for uniformity.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r13, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r13, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Dump of notes and TODOs for the future:

  • Allow mixing POSIX and BSD locks, like Linux does. The two families should basically ignore each other (as if they are acquired on different files). This should be simple to implement, but need to be careful with a "list of locks" -- it will have both, and at least POSIX locks are ordered in that list, so the ordering code should be complicated. Alternatively, we can have two "lists of locks", one for POSIX and one for BSD.
  • Fix the problem described in libos_handle.h. This will require some sort of system-wide handle refcounting. There's also an interesting asymmetry, which was probably a bad idea after all: handle_id is easy to make unique and to inherit, but it's very hard to undo (in the sense of releasing locks implicitly, when handles are closed and processes terminated).
  • clear_flock_locks() is called on every file-related handle close. This is very expensive, as it sends an IPC message to the leader. There should be optimizations to not send the IPC message. Just imagine a Python workload which opens and closes 1,000s of script files during its execution, each such close sending an IPC message to the leader (which will be most probably useless as there are no flocks on that file).
  • Both commands ("acquire shared lock", "acquire exclusive lock", "release lock") and states ("this is an exclusive lock", "this is a shared lock") are represented by the same libos_file_lock struct. This is wrong -- they should actually be two different structs. Consider for example F_UNLCK -- it is a command only and never a state; this is easily confusing for readers. Also, because we decided to reuse fcntl's flock struct and its macros, we end up with bad field names -- the type of the lock is called family, whereas the command/state of the lock is called type.
  • The currently enabled tests are trivial. We need 3-4 powerful tests that will cover complex flows: e.g. a test with 3 processes, where an intermediate process terminates. Or e.g. a test with a complex tree of processes, grabbing and releasing locks on different files. Maybe even a randomized version of such test.
  • We may want to revert changes to libos_ipc_pid.c which we made to guarantee uniqueness of handle_id. This is in case we will replace handle_id with something that doesn't require such uniqueness.

Moved these items to a proper GitHub issue: #1433


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants