-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spurious wakeups #66
Spurious wakeups #66
Conversation
39cf1a8
to
3f6f33c
Compare
9f9be40
to
c6dc9c3
Compare
c6dc9c3
to
88884a2
Compare
@Math2 I've implemented what I think is needed for The array implementation is split out so we can manage the pooling of element allocations later on if it proves to be a performance issue. Unfortunately using a direct array of structures causes too much pain when reallocation invalidates existing pointers to elements of the array. Another option is to use a multi-level array, e.g. an array of chunks of 1024 descriptors or something. All the implementation can be kept in Are you interested in trying to update the |
Note that the kernel version of GitHub actions is too old for |
axboe/liburing#918 - discussion around how pedantic we need to be w.r.t. |
@Math2 I have some time today, so I'm going to start working on the |
I ended up combining |
Hi. Sorry for late reply. I haven't had time to look into details yet, or do a lot of testing, but I'm going to. Fixing this was more complicated than I thought. It would have taken me forever if I had tried to do even just the KQueue part... This code is really doing something complex... with lots of details to get right. The logic in IO_Event_Selector_EPoll_handle()/IO_Event_Selector_KQueue_handle() and associated functions seem good to me. But I'm really not used to circular linked lists so I'm having a hard time following it and confirming that the iteration is safe. But it looks like it should be. If the "waiting" list nodes never pointed to the head/tail node in the "descriptor" in the array (by implementing a different kind of linked lists), the array could be allowed to move. But that would require to redo the linked list implementation (and callers would have to explicitly pass a pointer to the head/tail node in the descriptor for certain operations). Maybe the "descriptor" struct could be dropped entirely. There would be no special head/tail node for the list and the FD-indexed array would point directly to some "waiting" node. And when removing a node you'd check if that's the one the FD array points to and set it to NULL if so. And the union "events" mask would be recalculated every time it's needed (there shouldn't be a lot of fibers waiting on the same IO, and if there are you have to walk the whole list to wake them up anyway). And when dispatching events, you'd look up the list by FD by the array instead of using user data pointers. I think this would allow to keep the circular list implementation. What happens if a pid and a fd value collide in the array? It SEEMS like it might actually be safe, since it should be impossible for the same fiber to wait on both a fd and a pid... If collisions are safe, I was thinking that the pids could be masked to reduce their range to turn the array into some kind of hash table, but I think that the collisions in-between pids it could cause would not be safe. Separate arrays is probably the simplest way to deal with it. With this patch KQueue works on FreeBSD (it was just being a little bit finicky). Hopefully it doesn't break OS X though. Unfortunately OpenBSD/NetBSD don't have EVFILT_USER. I'm going to look at what could be done for them. diff --git c/ext/extconf.rb i/ext/extconf.rb
index a535127..8d5bfab 100755
--- c/ext/extconf.rb
+++ i/ext/extconf.rb
@@ -50,9 +50,9 @@ if have_header('sys/epoll.h')
$srcs << "io/event/selector/epoll.c"
end
-# The order matters, because we MUST have EV_UDATA_SPECIFIC.
+# The order matters, because we MUST have EVFILT_USER.
# The `have_header` call is just to add the -D to the compiler flags.
-if have_const('EV_UDATA_SPECIFIC', 'sys/event.h') and have_header('sys/event.h')
+if have_const('EVFILT_USER', 'sys/event.h') and have_header('sys/event.h')
$srcs << "io/event/selector/kqueue.c"
end
diff --git c/ext/io/event/selector/kqueue.c i/ext/io/event/selector/kqueue.c
index bdbe077..c266cff 100644
--- c/ext/io/event/selector/kqueue.c
+++ i/ext/io/event/selector/kqueue.c
@@ -823,8 +823,7 @@ VALUE IO_Event_Selector_KQueue_wakeup(VALUE self) {
struct kevent trigger = {0};
trigger.filter = EVFILT_USER;
- trigger.flags = EV_ADD | EV_CLEAR | EV_UDATA_SPECIFIC;
- trigger.fflags = NOTE_TRIGGER;
+ trigger.flags = EV_ADD | EV_CLEAR;
int result = kevent(selector->descriptor, &trigger, 1, NULL, 0, NULL);
@@ -832,6 +831,16 @@ VALUE IO_Event_Selector_KQueue_wakeup(VALUE self) {
rb_sys_fail("IO_Event_Selector_KQueue_wakeup:kevent");
}
+ // FreeBSD apparently only works if the NOTE_TRIGGER is done as a separate call.
+ trigger.flags = 0;
+ trigger.fflags = NOTE_TRIGGER;
+
+ result = kevent(selector->descriptor, &trigger, 1, NULL, 0, NULL);
+
+ if (result == -1) {
+ rb_sys_fail("IO_Event_Selector_KQueue_wakeup:kevent");
+ }
+
return Qtrue;
}
Also, I think that EV_ONESHOT should be used when polling IO. As it is, there's nothing that makes it stop polling if the fibers don't make the condition disappear (ex, they don't read the whole input, and then stop waiting on it) and it can end up busy-looping. diff --git i/ext/io/event/selector/kqueue.c w/ext/io/event/selector/kqueue.c
index c266cff..8941350 100644
--- i/ext/io/event/selector/kqueue.c
+++ w/ext/io/event/selector/kqueue.c
@@ -252,7 +252,7 @@ int IO_Event_Selector_KQueue_arm(struct IO_Event_Selector_KQueue *selector, uint
if (events & IO_EVENT_READABLE) {
kevents[count].ident = ident;
kevents[count].filter = EVFILT_READ;
- kevents[count].flags = EV_ADD | EV_ENABLE;
+ kevents[count].flags = EV_ADD | EV_ENABLE | EV_ONESHOT;
kevents[count].udata = (void *)kqueue_descriptor;
// #ifdef EV_OOBAND
@@ -267,7 +267,7 @@ int IO_Event_Selector_KQueue_arm(struct IO_Event_Selector_KQueue *selector, uint
if (events & IO_EVENT_WRITABLE) {
kevents[count].ident = ident;
kevents[count].filter = EVFILT_WRITE;
- kevents[count].flags = EV_ADD | EV_ENABLE;
+ kevents[count].flags = EV_ADD | EV_ENABLE | EV_ONESHOT;
kevents[count].udata = (void *)kqueue_descriptor;
count++;
} |
Test case triggers a busy-loop on KQueue and exposes some issues with EPoll.
Lookup the descriptor by FDs/PIDs to avoid dangling pointers if descriptors move.
I did some improvements and added a test case that triggered the busy-loop in this branch: The test case also exposes some issues that apply to EPoll. In With EPoll, it'll just stop polling for them on the NEXT iteration. And the test fails because of that, it dispatches events just one more time than necessary. It's not a big deal but it'd be better to fix it if only to keep things easier to understand. But also, I think the loop could miss some events that should go in With KQueue, there's no |
Thanks - feel free to make a PR targeting this branch if that's easy for you, it will be easy for me. |
This should never happen as the actual descriptor struct is |
Ah. No. I mistakenly thought that they were allocated directly in the array and that they could move when the array is reallocated... Yeah, 9adbd25 should be dropped. Should I make a pull request for that? I found some other problem that I really can't figure out... This script fails with both EPoll and KQueue. require 'async'
Sync do
10.times do
fail unless `echo test` == "test\n"
fail unless $?.success?
warn $?.inspect
end
end It works on main branch io-event. On this branch, it stalls after the first iteration with EPoll, and on KQueue most of the time some iteration fails because With strace on Linux, it seems that pidfd_open() isn't even called in the iteration before it fails. With truss/ktrace on FreeBSD, everything looks like it should work, the kevent() returns that the process has exited, but the wait4() sometimes returns 0 for the pid. I can't make sense of this... |
This reverts commit 9adbd25.
There are two more things I want to fix:
Regarding your failing case, can you please add it as a test case to this PR? Thanks! I've added you as a maintainer if that's helpful, otherwise just make a PR and set this branch as the target. |
Yeah, I may have missed that. The goal is to avoid using oneshot. A common pattern looks like this: while data = io.read
process(data)
end By checking the waiting AFTER returning from the fiber, we might avoid re-arming the event in the OS, i.e. it reduces the number of system calls. With one-shot, we need to re-arm the event (one system call) per iteration of the loop. But with lazy removal, we can mostly just avoid it, and if an event comes that we are not interested in any more, we can remove it at that point. |
This reverts commit 56d18c7.
|
I couldn't come up with a test case that uses io-event API. I'll try again by messing with fibers, but so far it only happens within async tasks. And not even 100% reliably. And I can't explain it at all... Maybe it's a CRuby bug somehow, no idea...
Oh OK yeah I think I get it... My change to EPoll might be wrong then. I still think
Nice. I'll plug that in and test on OpenBSD/NetBSD once things work as they should on Linux/OS X/FreeBSD. |
I'll take a look, but I don't think 2 passes is needed. Before enumerating, we can set |
Okay, I think I'm happy with the |
|
I believe your test script is working on this branch now - at least I tested it and it seemed okay to me. |
This is enough of an improvement to merge. There are still several issues to follow up on:
|
As discussed in #65, it's possible for the list of events in
IO_Event_Selector_EPoll_select
to contain cancelled operations. If during cancellation, another operation is scheduled, a spurious wakeup can be generated. Adding one layer of indirection, in the form of cancelled tokens, avoids this situation, since each operation has a unique token which can independently be cancelled.The overhead of this is extra memory allocations, but these can be pooled. We could compare the cost of
malloc
. There may be little point pooling these tokens.Fixes #65.
Types of Changes
Contribution