Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't deliver events for unregistered fds #341

Merged
merged 1 commit into from
Apr 20, 2018

Conversation

weissi
Copy link
Member

@weissi weissi commented Apr 19, 2018

Motivation:

Since forever we had a major bug in the Selector: In this condition:

  • kqueue/epoll had many events
  • in one of the earlier events we unregister a Channel whose fd is on of
    the later events
  • we subsequently (still in the same event loop tick) register a new
    channel which gets the same fd as the previously closed one

then we would deliver an event that was meant for a previous channel to
a newly opened one.

Thanks to @mcdappdev for hitting this bug, helping us debug it and also
providing a repeatedly working repro.

Modifications:

if during event delivery any fd gets unregistered, we stop delivering
the remaining events and rely on the selector to redeliver them
again next time.

Result:

we don't deliver events for previously closed channels to new ones.

@@ -540,7 +543,8 @@ final class Selector<R: Registration> {
Int(try KQueue.kevent(kq: self.fd, changelist: nil, nchanges: 0, eventlist: events, nevents: Int32(eventsCapacity), timeout: ts))
}

for i in 0..<ready {
self.deregistrationsHappened = false
for i in 0..<ready where !self.deregistrationsHappened {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted offline I think while this should work we should treat this as workaround as this depends highly on the "level-triggered" semantics of kqueue / epoll.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This should be thought of as a temporary fix to avoid the nasty behaviour: we should come up with a better long-term solution for dealing with this in a future release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed. Let’s get this in (I actually found another bug (and a missing assertion) whilst fixing it) and then fix properly. I agree we shouldn’t rely on it being level triggered here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

@weissi weissi added this to the 1.5.1 milestone Apr 19, 2018
@weissi weissi force-pushed the jw-fix-delivery-of-unregistered-fds branch from 27b4ff0 to 9d951f1 Compare April 19, 2018 20:32
@@ -259,6 +259,7 @@ final class Selector<R: Registration> {
private var eventsCapacity = 64
private var events: UnsafeMutablePointer<EventType>
private var registrations = [Int: R]()
private var deregistrationsHappened: Bool = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a doc comment on this guy indicating that it's a temporary fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

@@ -500,7 +502,8 @@ final class Selector<R: Registration> {
ready = Int(try Epoll.epoll_wait(epfd: self.fd, events: events, maxevents: Int32(eventsCapacity), timeout: -1))
}

for i in 0..<ready {
self.deregistrationsHappened = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't think we need to do this for epoll. I realised this this morning, but epoll generally only returns each FD once, with a mask of all event flags for that FD.

This means that our epoll code is immune to this bug: the channel is concretely held to dispatch all the events to it. This also suggests a sensible long-term fix: we should coalesce the kqueue notifications so that each channel has events dispatched to it only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s actually a very good point 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not true because (just like the test does), if you close lots of fds and then open lots of fds, there might very well be events for file descriptors that are no longer valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh... true :(

@weissi weissi force-pushed the jw-fix-delivery-of-unregistered-fds branch from 9d951f1 to 755bc7a Compare April 20, 2018 11:48
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, mostly looks really good. Some notes inline.

@@ -776,6 +781,9 @@ class BaseSocketChannel<T: BaseSocket>: SelectableChannel, ChannelCore {
}

final func readEOF() {
print("readEOF \(self)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print statement left behind.

@@ -259,6 +259,7 @@ final class Selector<R: Registration> {
private var eventsCapacity = 64
private var events: UnsafeMutablePointer<EventType>
private var registrations = [Int: R]()
private var deregistrationsHappened: Bool = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

// impossible to get a read notification in the very same event loop tick that you got registered
XCTAssertTrue(self.hasReConnectEventLoopTickFinished.value)

// TODO: check the bytebuffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

}

func channelRead(ctx: ChannelHandlerContext, data: NIOAny) {
// we expect these channels to be get data only a while after the re-connect event loop tick as it's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/to be get/to get/

func userInboundEventTriggered(ctx: ChannelHandlerContext, event: Any) {
// this is the `.readEOF` that is triggered by the `ServerHandler`'s `close` calls because our channel
// supports half-closure
if self.allChannels.value.count == SelectorTest.testWeDoNotDeliverEventsForPreviouslyClosedChannels_numberOfChannelsToUse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be cleaner as guard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

Note: if you changed anything about the pipeline's handler adding/removal
you might also have a bug there.
""")
return ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need the return statement.

@@ -776,6 +781,9 @@ class BaseSocketChannel<T: BaseSocket>: SelectableChannel, ChannelCore {
}

final func readEOF() {
print("readEOF \(self)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi remove

private static let testWeDoNotDeliverEventsForPreviouslyClosedChannels_numberOfChannelsToUse = 10
func testWeDoNotDeliverEventsForPreviouslyClosedChannels() {
/// We use this class to box mutable values, generally in this test anything boxed should only be read/written
/// on the event loop `el`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi should we use a computed property and check MultiThreadedEventLoopThread.currentEventLoop returns what we expect ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure why not

case didNotReadGotReadComplete
}

/// This handler is inserted in the channels that are re-connected. So we're closing a bunch of channels and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... is inserted in the ChannelPipeline of the channels that are re-connected


private let didReadPromise: EventLoopPromise<Void>
private var didRead: Bool = false
private let hasReConnectEventLoopTickFinished: Box<Bool>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we group the let together ?

}
let el = MultiThreadedEventLoopGroup(numThreads: 1).next()
defer {
try! el.syncShutdownGracefully()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XCTAssertNoThrow(...) ?

}
}
templateBytes.removeLast()
let udsTempDir = String(decoding: templateBytes, as: UTF8.self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just return.. not net to create the let first.

@weissi weissi force-pushed the jw-fix-delivery-of-unregistered-fds branch from 755bc7a to f57010d Compare April 20, 2018 13:05
@weissi
Copy link
Member Author

weissi commented Apr 20, 2018

@normanmaurer / @Lukasa all addressed

@weissi weissi force-pushed the jw-fix-delivery-of-unregistered-fds branch 2 times, most recently from 7f1cadf to c7c8b6f Compare April 20, 2018 14:05
Lukasa
Lukasa previously requested changes Apr 20, 2018
assert(eventLoop.inEventLoop)
assert(self.lifecycleManager.isRegistered)

guard !self.lifecycleManager.hasSeenEOFNotification else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a guard or an assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa this will be run if the user does ctx.read()

channel.pipeline.add(handler: countingHandler)
}
.bind(host: "127.0.0.1", port: 0)
.thenIfError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely bind isn't going to error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

Copy link
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Motivation:

Since forever we had a major bug in the Selector: In this condition:

- kqueue/epoll had many events
- in one of the earlier events we unregister a Channel whose fd is on of
  the later events
- we subsequently (still in the same event loop tick) register a new
  channel which gets the same fd as the previously closed one

then we would deliver an event that was meant for a previous channel to
a newly opened one.

Thanks to @mcdappdev for hitting this bug, helping us debug it and also
providing a repeatedly working repro.

Modifications:

if during event delivery any fd gets unregistered, we stop delivering
the remaining events and rely on the selector to redeliver them
again next time.

Result:

we don't deliver events for previously closed channels to new ones.
@weissi weissi force-pushed the jw-fix-delivery-of-unregistered-fds branch from c7c8b6f to 01b5e71 Compare April 20, 2018 14:33
@weissi
Copy link
Member Author

weissi commented Apr 20, 2018

did over the shoulder review with @Lukasa who is happy, so pressing the 'dismiss review' button :)

@weissi weissi dismissed Lukasa’s stale review April 20, 2018 15:19

because he's happy

@weissi weissi merged commit 902b18d into apple:master Apr 20, 2018
@weissi weissi deleted the jw-fix-delivery-of-unregistered-fds branch April 20, 2018 15:19
@Lukasa Lukasa added the semver/patch No public API change. label Apr 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants