-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: interruptible POSIX locks #12
Comments
I prefer option 4. Option 3 sounds like an overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice By the way, what is the |
I agree; on the other hand, option 4 sounds over-specific. But since it seems cheap, maybe it doesn't hurt to try it out. I'll try finishing my branch with option 4 and we'll see how it looks.
It was supposed to be the original message ( |
How does 4. work with both messages? Will we wait for responses for both? (original lock + cancel msg) |
I would say you cannot get away from using something like: struct X x;
add_to_global_list(&x);
ipc_send_msg(get_lock);
thread_wait();
if (!check_if_condition_happend(&x, is_lock_taken, atomic_wrt_callbacks=True)) {
ipc_send_msg(cancel_get_lock);
}
remove_from_global_list(&x);
return check_if_condition_happend(&x, is_lock_taken, atomic_wrt_callbacks=False); and have some callbacks on lock_get response messages. This might sound a bit like reimplementing responses, but this seems to be a pretty special case. |
OK, looks like no easy win here. On the other hand, reimplementing responses doesn't seem so bad: an IPC abstraction for "long-term operations" might be exactly what I need. I'll think about it more. |
@pwmarcz It's quite possible that we need to add some more advanced IPC primitives than we have right now. |
I'm trying to fix recently added POSIX locks (gramineproject/graphene#2481) so that you can interrupt them, but I ran into a problem trying to do it with current IPC. I don't want to jump and change too much before I learn your opinion.
@boryspoplawski @mkow @dimakuv
The problem
Graphene's current implementation of POSIX locks (
fcntl
) do not support interrupting the request:The
fcntl(F_SETLKW)
in the child process should wait for 10 seconds and then fail with EINTR (after the process receives SIGALRM). Under Graphene, it hangs indefinitely.So far, I've seen this checked by LTP (test
fcntl16
), and I'm pretty sure it's used bystress-ng
(see gramineproject/graphene#2510). I'm not sure how much it's used in actual applications, but interrupting thefcntl
call seems to be the only sensible way of trying to take a lock with timeout.How to interrupt an operation over IPC?
Interrupting the lock operation is easy in the single-process version. However, if we are in a remote process, the operation is requested over IPC, and it's not that easy to achieve the same semantics.
The current IPC implementation does something like this:
As long as
ipc_send_message_and_get_response
keeps retrying, we have no way to return earlier. I've been thinking about some solutions:Stop waiting?: You could be tempted to modify the
ipc_send_message_and_get_response()
function to not wait (perhaps with someretry = false
parameter). However, this does not actually cancel the operation, so the lock will be taken eventually. This is wrong - we want theF_SETLKW
operation to fail without side effects.Stop waiting and issue another call? We could send another IPC message to cancel the pending operation. Keep in mind that the operation might get finished before the "cancel" message is delivered, so we need to learn what the actual result is.
The problem with that is that if the operation does get finished in the meantime, the remote side has to keep storing the result in case we receive
POSIX_LOCK_CANCEL
. So we would need some way of cleaning up that result.Also, if the response to
POSIX_LOCK_SET
is sent in the meantime, we are going to just drop it.Keep waiting for the original response? Ideally, we want to always get the response for the original message, but sometimes we want to trigger the response to come early. So we could register a wait, but on EINTR, send additional message in the meantime, then return to the previous wait:
I like that, but it makes the IPC API more complicated: instead of one function that does everything, there are separate stages (first send a message and register a wait, then wait for response). That means managing some state between calls (a "waiter" registered for the IPC worker) that currently is local to
ipc_send_message_and_get_response
.Add an
on_interrupt
callback? The above could be simplified by keeping theipc_send_message_and_get_response
as is, but making it perform some additional action when the wait is interrupted.This would certainly be easiest to use, but it's a pretty specific use-case. I think it's worth it only if we expect to use it again soon.
So I think either 3 or 4 are viable. But maybe I'm missing something and there is an easier way?
The text was updated successfully, but these errors were encountered: