Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% CPU spin loop forever when writeAndFlush gets run from channelActive #467

Closed
weissi opened this issue Jul 9, 2024 · 2 comments · Fixed by #470
Closed

100% CPU spin loop forever when writeAndFlush gets run from channelActive #467

weissi opened this issue Jul 9, 2024 · 2 comments · Fixed by #470

Comments

@weissi
Copy link
Member

weissi commented Jul 9, 2024

NIOSSL has the following loop

            bufferedActionsLoop: while bufferedActions.hasMark {         // JW: <<<----- loop
                let element = bufferedActions.first!
                switch element {
                case .write(let bufferedWrite):
                    var data = bufferedWrite.data
                    let writeSuccessful = try self._encodeSingleWrite(buf: &data)
                    if writeSuccessful {      // JW: <<<----- `writeSuccessful` will be `false`
                        didWrite = true
                        if let promise = bufferedWrite.promise { promises.append(promise) }
                        _ = bufferedActions.removeFirst()
                    }
                case .closeOutput:
                    [...]
            }  // JW: <<<----- this will spin around again and again and again

Precisely what happens is this:

  • We're in NIOSSLHandler.channelActive
  • WebsocketClientHandshakeHander.channelActive issues a writeAndFlush
  • so we enter (in the same frame) into NIOSSLHandler.flush
  • that calls SSLConnection.writeDataToNetwork
  • that tries SSL_write (namespaced to CNIOBoringSSL_SSL_write)
  • which asks ssl_can_write which says false
  • which triggers the implicit (BoringSSL comment // If necessary, complete the handshake implicitly.) handshake through SSL_do_handshake which returns -1 (and SSL_ERROR_WANT_READ)
  • this makes writeDataToNetwork return .incomplete which makes _encodeSingleWrite return false
  • that makes the above loop loop again, forever
@weissi
Copy link
Member Author

weissi commented Jul 9, 2024

Thanks @glbrntt , this could be a reentrancy bug

    public func channelActive(context: ChannelHandlerContext) {
        // We fire this a bit early, entirely on purpose. This is because
        // in doHandshakeStep we may end up closing the channel again, and
        // if we do we want to make sure that the channelInactive message received
        // by later channel handlers makes sense.
        context.fireChannelActive()
        doHandshakeStep(context: context)
    }

So we're calling out before calling doHandshakeStep. This intern then triggers the implicit handshake in BoringSSL. Maybe we think that we'd never trigger the implicit handshake.

Possible that reversing those two lines will fix it because we never do the implicit handshake in BoringSSL.

@weissi
Copy link
Member Author

weissi commented Jul 9, 2024

NIOSSLHandler sees the following channel events:

  • handlerAdded
  • close (triggered from a Channel.close())
  • channelInactive (triggered from BaseSocketChannel.close0) [BUG HERE (channelInactive without channelActive)]
  • channelActive (triggered from BaseSocketChannel.writable -> BaseSocketChannel.finishConnect -> BaseSocketChannel.becomeActive()) [BUG HERE (channelActive after channelInactive)]

Obviously those channel events are wrong. That's a bug in NIO too.

glbrntt added a commit to glbrntt/swift-nio-ssl that referenced this issue Jul 19, 2024
Motivation:

The NIOSSLHandler can enter a spin loop in `doUnbufferActions` if
writing data into BoringSSL fails with SSL_ERROR_WANT_READ or
SSL_ERROR_WANT_WRITE.

One way these errors can happen is if a write into BoringSSL happens
prior to the handshake completing. The NIOSSLHandler is explicit about
starting the handshake: it's done in channel active and in handler added
if the channel is already active.

However, the handshaking step is currently done without any state
checking, so if the state is 'closed' (as it would be after channel
inactive) then the handshake will still start. This can happen if
channel inactive happens before channel active.

To reach the write loop there must be a buffered write and flush prior
to the handshake step starting and the state isn't 'idle' or 'handshaking'.
This can happen if a write and flush hapens while 'NIOSSLHandler' is in
'channelActive' (it forwards the 'channelActive' event _before_ starting
the handshake) and 'channelInactive' came first.

Modifications:

- Early exit from 'doHandshakeStep' if the state isn't applicable for
  starting a handshake.
- Don't buffer writes when the state is 'closed' as they'll never
  succeed and can be failed immediately.
- If the write isn't succesful in 'doUnbufferActions' then either write
  to the network or try reading.
- Only allow a limited number of spins through 'doUnbufferActions'

Result:

- Resolves apple#467
@Lukasa Lukasa closed this as completed in 7b84abb Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant