
Spurious wakeups #66

Merged
merged 18 commits into main from spurious-wakeups on Aug 15, 2023
Conversation

@ioquatix (Member) commented Jul 29, 2023

As discussed in #65, it's possible for the list of events in IO_Event_Selector_EPoll_select to contain cancelled operations. If, during cancellation, another operation is scheduled, a spurious wakeup can be generated. Adding one layer of indirection, in the form of cancellable tokens, avoids this situation, since each operation has a unique token which can be cancelled independently.

The overhead of this is extra memory allocations, but these could be pooled. We should compare against the cost of malloc; there may be little point in pooling these tokens.
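For illustration only, here is a minimal sketch of the token indirection, with hypothetical names (the actual structs and functions in this PR differ): each operation allocates its own token, cancellation marks only that token, and the dispatch path ignores events whose token was cancelled.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical per-operation token; not the actual type used in this PR.
struct wakeup_token {
	void *fiber;    // the fiber blocked on this particular operation
	bool cancelled; // set when this operation (and only this one) is cancelled
};

struct wakeup_token *token_create(void *fiber) {
	struct wakeup_token *token = malloc(sizeof(struct wakeup_token));
	token->fiber = fiber;
	token->cancelled = false;
	return token;
}

// Cancelling one operation cannot affect a different operation scheduled
// afterwards, because that operation has its own token.
void token_cancel(struct wakeup_token *token) {
	token->cancelled = true;
}

// Called by the selector when an event carrying this token fires.
void token_dispatch(struct wakeup_token *token) {
	if (!token->cancelled) {
		// resume token->fiber here
	}
	free(token);
}
```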

Fixes #65.

Types of Changes

  • Bug fix.

Contribution

@ioquatix (Member Author) commented Aug 9, 2023

@Math2 I've implemented what I think is needed for epoll. This also avoids the dup hack by maintaining a separate list of waiting fibers (and the event they are waiting on). The next step is to try this out for kqueue and possibly uring if needed.

The array implementation is split out so we can manage the pooling of element allocations later on if it proves to be a performance issue. Unfortunately, using a direct array of structures causes too much pain when reallocation invalidates existing pointers to elements of the array. Another option is to use a multi-level array, e.g. an array of chunks of 1024 descriptors. All of the implementation can be kept in array.h to keep things organised.
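As a rough sketch of why the split-out array helps (hypothetical names, not the actual array.h interface): the table of pointers may be realloc'ed as it grows, but each descriptor is allocated separately and never moves, so pointers held by waiting fibers stay valid.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical types for illustration only; error handling omitted.
struct descriptor {
	int waiting_events; // events fibers are currently waiting on for this fd
};

struct descriptor_array {
	struct descriptor **base; // realloc may move this block of pointers...
	size_t count;
};

// ...but the descriptors themselves are allocated separately and never move,
// so a pointer held by a waiting fiber remains valid after the array grows.
struct descriptor *descriptor_array_lookup(struct descriptor_array *array, size_t index) {
	if (index >= array->count) {
		size_t count = index + 1;
		array->base = realloc(array->base, count * sizeof(struct descriptor *));
		memset(array->base + array->count, 0, (count - array->count) * sizeof(struct descriptor *));
		array->count = count;
	}
	
	if (array->base[index] == NULL) {
		array->base[index] = calloc(1, sizeof(struct descriptor));
	}
	
	return array->base[index];
}
```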

Are you interested in trying to update the kqueue implementation? Otherwise I can have a go at it this weekend. By using a similar approach, we may not need to use the unique user data hack which is unavailable on BSD.

@ioquatix (Member Author) commented Aug 9, 2023

Note that the kernel version used by GitHub Actions is too old for io_uring, so it's still failing; I'll fix that separately. I still have some hope that we don't need such a complex implementation and can have a better fast path.

@ioquatix (Member Author) commented Aug 9, 2023

axboe/liburing#918 - discussion around how pedantic we need to be w.r.t. io_uring cancellations.

@ioquatix (Member Author)

@Math2 I have some time today, so I'm going to start working on the kqueue implementation.

@ioquatix (Member Author)

kqueue is now migrated to use the new system.

I ended up combining pid and fd (ident) in the descriptor array. That makes me think it might be a bad idea, but 100,000 entries in that array is at most about 1 MB, so it's not too bad.

@Math2 (Contributor) commented Aug 10, 2023

Hi. Sorry for the late reply. I haven't had time to look into the details yet, or to do a lot of testing, but I'm going to. Fixing this was more complicated than I thought. It would have taken me forever if I had tried to do even just the KQueue part... This code is really doing something complex, with lots of details to get right.

The logic in IO_Event_Selector_EPoll_handle()/IO_Event_Selector_KQueue_handle() and the associated functions seems good to me. But I'm really not used to circular linked lists, so I'm having a hard time following it and confirming that the iteration is safe. It looks like it should be, though.

If the "waiting" list nodes never pointed to the head/tail node in the "descriptor" in the array (by implementing a different kind of linked lists), the array could be allowed to move. But that would require to redo the linked list implementation (and callers would have to explicitly pass a pointer to the head/tail node in the descriptor for certain operations).

Maybe the "descriptor" struct could be dropped entirely. There would be no special head/tail node for the list and the FD-indexed array would point directly to some "waiting" node. And when removing a node you'd check if that's the one the FD array points to and set it to NULL if so. And the union "events" mask would be recalculated every time it's needed (there shouldn't be a lot of fibers waiting on the same IO, and if there are you have to walk the whole list to wake them up anyway). And when dispatching events, you'd look up the list by FD by the array instead of using user data pointers. I think this would allow to keep the circular list implementation.

What happens if a pid and an fd value collide in the array? It SEEMS like it might actually be safe, since it should be impossible for the same fiber to wait on both an fd and a pid... If collisions are safe, I was thinking that the pids could be masked to reduce their range, turning the array into some kind of hash table, but I think the collisions between pids that this could cause would not be safe. Separate arrays are probably the simplest way to deal with it.

With this patch KQueue works on FreeBSD (it was just being a little bit finicky). Hopefully it doesn't break OS X though. Unfortunately OpenBSD/NetBSD don't have EVFILT_USER. I'm going to look at what could be done for them.

diff --git c/ext/extconf.rb i/ext/extconf.rb
index a535127..8d5bfab 100755
--- c/ext/extconf.rb
+++ i/ext/extconf.rb
@@ -50,9 +50,9 @@ if have_header('sys/epoll.h')
 	$srcs << "io/event/selector/epoll.c"
 end
 
-# The order matters, because we MUST have EV_UDATA_SPECIFIC.
+# The order matters, because we MUST have EVFILT_USER.
 # The `have_header` call is just to add the -D to the compiler flags.
-if have_const('EV_UDATA_SPECIFIC', 'sys/event.h') and have_header('sys/event.h')
+if have_const('EVFILT_USER', 'sys/event.h') and have_header('sys/event.h')
 	$srcs << "io/event/selector/kqueue.c"
 end
 
diff --git c/ext/io/event/selector/kqueue.c i/ext/io/event/selector/kqueue.c
index bdbe077..c266cff 100644
--- c/ext/io/event/selector/kqueue.c
+++ i/ext/io/event/selector/kqueue.c
@@ -823,8 +823,7 @@ VALUE IO_Event_Selector_KQueue_wakeup(VALUE self) {
 		struct kevent trigger = {0};
 		
 		trigger.filter = EVFILT_USER;
-		trigger.flags = EV_ADD | EV_CLEAR | EV_UDATA_SPECIFIC;
-		trigger.fflags = NOTE_TRIGGER;
+		trigger.flags = EV_ADD | EV_CLEAR;
 		
 		int result = kevent(selector->descriptor, &trigger, 1, NULL, 0, NULL);
 		
@@ -832,6 +831,16 @@ VALUE IO_Event_Selector_KQueue_wakeup(VALUE self) {
 			rb_sys_fail("IO_Event_Selector_KQueue_wakeup:kevent");
 		}
 		
+		// FreeBSD apparently only works if the NOTE_TRIGGER is done as a separate call.
+		trigger.flags = 0;
+		trigger.fflags = NOTE_TRIGGER;
+		
+		result = kevent(selector->descriptor, &trigger, 1, NULL, 0, NULL);
+		
+		if (result == -1) {
+			rb_sys_fail("IO_Event_Selector_KQueue_wakeup:kevent");
+		}
+		
 		return Qtrue;
 	}
 	

Also, I think that EV_ONESHOT should be used when polling IO. As it is, nothing makes it stop polling if the fibers don't make the condition disappear (e.g. they don't read the whole input and then stop waiting on it), and it can end up busy-looping.

diff --git i/ext/io/event/selector/kqueue.c w/ext/io/event/selector/kqueue.c
index c266cff..8941350 100644
--- i/ext/io/event/selector/kqueue.c
+++ w/ext/io/event/selector/kqueue.c
@@ -252,7 +252,7 @@ int IO_Event_Selector_KQueue_arm(struct IO_Event_Selector_KQueue *selector, uint
 	if (events & IO_EVENT_READABLE) {
 		kevents[count].ident = ident;
 		kevents[count].filter = EVFILT_READ;
-		kevents[count].flags = EV_ADD | EV_ENABLE;
+		kevents[count].flags = EV_ADD | EV_ENABLE | EV_ONESHOT;
 		kevents[count].udata = (void *)kqueue_descriptor;
 		
 // #ifdef EV_OOBAND
@@ -267,7 +267,7 @@ int IO_Event_Selector_KQueue_arm(struct IO_Event_Selector_KQueue *selector, uint
 	if (events & IO_EVENT_WRITABLE) {
 		kevents[count].ident = ident;
 		kevents[count].filter = EVFILT_WRITE;
-		kevents[count].flags = EV_ADD | EV_ENABLE;
+		kevents[count].flags = EV_ADD | EV_ENABLE | EV_ONESHOT;
 		kevents[count].udata = (void *)kqueue_descriptor;
 		count++;
 	}

Math2 added 3 commits August 11, 2023 01:11
Test case triggers a busy-loop on KQueue and exposes some issues with EPoll.
Lookup the descriptor by FDs/PIDs to avoid dangling pointers if descriptors move.
@Math2 (Contributor) commented Aug 11, 2023

I made some improvements and added a test case that triggers the busy-loop in this branch:
https://github.com/Math2/io-event/commits/spurious-wakeups

The test case also exposes some issues that apply to EPoll.

In IO_Event_Selector_*_handle(), after a fiber is resumed, the events it was waiting on are added to waiting_events. But it might no longer be waiting on those events, so the selector will keep polling for events that no fiber is still waiting on.

With EPoll, it'll just stop polling for them on the NEXT iteration. The test fails because of that: it dispatches events one more time than necessary. It's not a big deal, but it'd be better to fix it, if only to keep things easier to understand. But also, I think the loop could miss some events that should go in waiting_events for fibers that add themselves to the list during the iteration. The loop starts from the tail, and fibers add themselves at the tail, so it won't see the newly added fibers (and that's necessary to avoid endless loops, since some fibers will always re-add themselves after being resumed). Better to gather the waiting_events in a second pass after the fibers have stopped messing around.

With KQueue, there's no EV_DELETE call anywhere, so it just kept polling forever. I think EV_ONESHOT does the trick here because read/write events are handled separately. Once we get a read event, we'll necessarily have resumed all of the fibers that were waiting on reads by the end of the loop, since they're all linked together, so we can stop polling on reads altogether, and EV_ONESHOT does that for us automatically (and if fibers want to read again, they'll request a new one-shot event before blocking). So KQueue can be simplified. Since we never explicitly cancel events, it's still possible to get a stray one-shot event, but this should be harmless since it'll be checked against the waiting list, which is always up to date. With EPoll, EPOLLONESHOT would affect events for both reads and writes and could make us miss events, so it has to keep track of things more carefully.
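For what it's worth, a minimal sketch of the two-pass idea, with hypothetical names and a simplified, non-circular list (the real iteration-safety concerns around fibers unlinking themselves still apply):

```c
struct waiting {
	struct waiting *next;
	void *fiber;
	int events; // events this fiber is waiting on
};

struct descriptor {
	struct waiting *head;
	int waiting_events; // what the selector should keep polling for
};

// Provided elsewhere; resumes the fiber with the events that became ready.
extern void resume_fiber(void *fiber, int ready_events);

static void handle_ready(struct descriptor *descriptor, int ready_events) {
	// Pass 1: resume matching fibers. Resumed fibers may unlink themselves
	// and/or re-add themselves at the tail, so a real implementation must be
	// careful that `next` is still valid after each resume.
	for (struct waiting *node = descriptor->head; node;) {
		struct waiting *next = node->next;
		if (node->events & ready_events) {
			resume_fiber(node->fiber, ready_events);
		}
		node = next;
	}
	
	// Pass 2: recompute waiting_events from whoever is still (or newly)
	// waiting, so newly added fibers aren't missed and stale interest is dropped.
	descriptor->waiting_events = 0;
	for (struct waiting *node = descriptor->head; node; node = node->next) {
		descriptor->waiting_events |= node->events;
	}
}
```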

@ioquatix (Member Author)

Thanks - feel free to make a PR targeting this branch if that's easy for you; it will be easy for me.

@ioquatix (Member Author) commented Aug 13, 2023

Lookup the descriptor by FDs/PIDs to avoid dangling pointers if descriptors move.

This should never happen, as the actual descriptor struct is malloc'ed separately from the array, which can grow in size and be realloc'ed. Did you observe some problems?

@Math2 (Contributor) commented Aug 14, 2023

Did you observe some problems?

Ah. No. I mistakenly thought that they were allocated directly in the array and that they could move when the array is reallocated... Yeah, 9adbd25 should be dropped. Should I make a pull request for that?

I found some other problem that I really can't figure out... This script fails with both EPoll and KQueue.

require 'async'
Sync do
	10.times do
		fail unless `echo test` == "test\n"
		fail unless $?.success?
		warn $?.inspect
	end
end

It works with io-event on the main branch. On this branch, it stalls after the first iteration with EPoll, and on KQueue, most of the time some iteration fails because $? is nil.

With strace on Linux, it seems that pidfd_open() isn't even called in the iteration before it fails. With truss/ktrace on FreeBSD, everything looks like it should work: kevent() reports that the process has exited, but wait4() sometimes returns 0 for the pid. I can't make sense of this...

@ioquatix (Member Author)

There are two more things I want to fix:

  1. Explicitly track #select calls and prevent re-entrancy. This prevents invalid usage.
  2. Extend the waiting struct to carry the ready events, rather than sending them to the fiber as an argument. This lets a fiber explicitly detect a wakeup that is unrelated to the event it was waiting on (see the sketch after this list).
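A minimal sketch of point 2, with hypothetical field names (not necessarily what the final code uses):

```c
// Hypothetical waiting struct for illustration.
struct waiting {
	void *fiber;
	int events; // the events the fiber asked to wait for
	int ready;  // filled in by the selector just before resuming; 0 otherwise
};

// Selector side, when dispatching an event:
//	waiting->ready = revents & waiting->events;
//	// ...resume the fiber...
//
// Fiber side, after control returns from the scheduler:
//	if (waiting->ready == 0) {
//		// The wakeup was unrelated to the event we were waiting on
//		// (e.g. cancellation or a spurious resume): handle it explicitly.
//	}
```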

Regarding your failing case, can you please add it as a test case to this PR? Thanks! I've added you as a maintainer if that's helpful; otherwise, just make a PR and set this branch as the target.

@ioquatix (Member Author)

With KQueue, there's no EV_DELETE call anywhere so it just kept polling forever.

Yeah, I may have missed that.

The goal is to avoid using one-shot.

A common pattern looks like this:

while data = io.read
  process(data)
end

By checking the waiting list AFTER returning from the fiber, we might avoid re-arming the event in the OS, i.e. it reduces the number of system calls. With one-shot, we need to re-arm the event (one system call) per iteration of the loop. But with lazy removal, we can mostly avoid that, and if an event arrives that we are no longer interested in, we can remove the registration at that point.
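A hedged, epoll-flavoured sketch of the lazy approach, with hypothetical names and the event-flag translation omitted: the registration only changes when the remaining interest actually differs from what the kernel already has.

```c
#include <stdint.h>
#include <sys/epoll.h>

// Hypothetical per-fd state; not the actual struct in epoll.c.
struct descriptor_state {
	int fd;
	int waiting_events;    // what fibers still want, after dispatching
	int registered_events; // what the kernel currently has registered
};

// Called after resuming the fibers interested in one descriptor.
static void update_registration(int epfd, struct descriptor_state *descriptor) {
	if (descriptor->waiting_events == 0) {
		if (descriptor->registered_events) {
			// Nobody is waiting any more: remove the registration now, rather
			// than paying one epoll_ctl() per loop iteration as one-shot would.
			epoll_ctl(epfd, EPOLL_CTL_DEL, descriptor->fd, NULL);
			descriptor->registered_events = 0;
		}
	} else if (descriptor->waiting_events != descriptor->registered_events) {
		struct epoll_event event = {
			.events = (uint32_t)descriptor->waiting_events, // flag translation omitted
			.data = {.ptr = descriptor},
		};
		// ADD for a fresh registration, MOD when changing an existing one.
		int operation = descriptor->registered_events ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
		epoll_ctl(epfd, operation, descriptor->fd, &event);
		descriptor->registered_events = descriptor->waiting_events;
	}
	// Otherwise the kernel already matches our interest: no system call at all.
}
```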

@ioquatix (Member Author)

If EVFILT_USER is not present, then kqueue can use the IO_Event_Interrupt helper. See interrupt.h and how it's used in epoll.c.
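For context, this is roughly how such an interrupt helper can work on kqueue without EVFILT_USER, using a non-blocking pipe (a sketch of the general technique with illustrative names; the actual interrupt.h helper may differ, and error handling is omitted):

```c
#include <sys/types.h>
#include <sys/event.h>
#include <unistd.h>
#include <fcntl.h>

struct interrupt {
	int descriptor[2]; // descriptor[0] is the read end, descriptor[1] the write end
};

static void interrupt_open(struct interrupt *interrupt, int kq) {
	pipe(interrupt->descriptor);
	fcntl(interrupt->descriptor[0], F_SETFL, O_NONBLOCK);
	fcntl(interrupt->descriptor[1], F_SETFL, O_NONBLOCK);
	
	// Register the read end so a write from another thread wakes up kevent().
	struct kevent event;
	EV_SET(&event, interrupt->descriptor[0], EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, NULL);
	kevent(kq, &event, 1, NULL, 0, NULL);
}

static void interrupt_signal(struct interrupt *interrupt) {
	write(interrupt->descriptor[1], ".", 1); // wakes the selector
}

static void interrupt_clear(struct interrupt *interrupt) {
	char buffer[16];
	while (read(interrupt->descriptor[0], buffer, sizeof(buffer)) > 0); // drain
}
```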

@Math2 (Contributor) commented Aug 14, 2023

Regarding your failing case, can you please add it as a test case to this PR?

I couldn't come up with a test case that uses the io-event API. I'll try again by messing with fibers, but so far it only happens within async tasks, and not even 100% reliably. And I can't explain it at all... Maybe it's a CRuby bug somehow, no idea...

By checking the waiting AFTER returning from the fiber, we might avoid re-arming the event in the OS, i.e. it reduces the number of system calls. With one-shot, we need to re-arm the event (one system call) per iteration of the loop. But with lazy removal, we can mostly just avoid it, and if an event comes that we are not interested in any more, we can remove it at that point.

Oh OK yeah I think I get it...

My change to EPoll might be wrong then. I still think waiting_events should be gathered in a second pass, but it's not wrong to keep the events for the resumed fibers armed (like they were before). I'll look into it.

EVFILT_USER if not present, then kqueue can use IO_Event_Interrupt helper. See interrupt.h and how it's used in epoll.c.

Nice. I'll plug that in and test on OpenBSD/NetBSD once things work as they should on Linux/OS X/FreeBSD.

@ioquatix (Member Author)

My change to EPoll might be wrong then. I still think waiting_events should be gathered in a second pass, but it's not wrong to keep the events for the resumed fibers armed (like they were before). I'll look into it.

I'll take a look, but I don't think two passes are needed. Before enumerating, we can set x_descriptor->events to 0, and then when a fiber wants to re-arm, it just enables it. However, I also need to think about it some more.

@ioquatix (Member Author)

Okay, I think I'm happy with the epoll.c implementation now. I'm keeping track of the waiting_events and registered_events so we can do the minimal update.

@ioquatix (Member Author)

I found some other problem that I really can't figure out... This script fails with both EPoll and KQueue.

Async is part of the external test suite of io-event, so if you want to add the test to async, it will be run as part of this PR.

@ioquatix (Member Author)

I believe your test script is working on this branch now - at least I tested it and it seemed okay to me.

@ioquatix ioquatix merged commit 3fc744c into main Aug 15, 2023
@ioquatix ioquatix deleted the spurious-wakeups branch August 15, 2023 13:33
@ioquatix (Member Author)

This is enough of an improvement to merge. There are still several issues to follow up on:

  • Consider preventing re-entrancy into select operations.
  • More extensive testing.
  • If EVFILT_USER is not available, use IO_Event_Interrupt helper.
  • io_uring needs additional work as it can race cancellation and CQE processing.

Successfully merging this pull request may close these issues: Spurious resumption of fibers unwinding past their blocking operation (#65).