Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(websocket-websys): change close code in drop implementation to 1000 #5229

Merged
merged 3 commits into from
Mar 25, 2024
Merged

fix(websocket-websys): change close code in drop implementation to 1000 #5229

merged 3 commits into from
Mar 25, 2024

Conversation

sisou
Copy link
Contributor

@sisou sisou commented Mar 14, 2024

Description

In browsers, only the code 1000 and codes between 3000-4999 are allowed to be set by userland code: https://websockets.spec.whatwg.org/#dom-websocket-close

I found this while debugging use-after-free errors in our WASM application. Turns out, when connections are timing out, libp2p in rust drops them, but does not seem to close them explicitly (we are using libp2p-swarm). This led to these WebSockets still emitting events, the handlers of which were already dropped on the rust side, though.

A long investigation led me to have a look into the Result that gets returned from close_with_code_and_reason, and it turns out it's an error! Specifically:

InvalidAccessError: Failed to execute 'close' on 'WebSocket': The code must be either 1000, or between 3000 and 4999. 1001 is neither.

This PR only fixes the failing closing of the WebSocket, not the remaining use-after-free problem.

Notes & open questions

Not ignoring error results

Can Err results not be ignored, but instead logged? That would have shown this problem a long time ago.

Use-after-free event handlers

Calling .close() on the WebSocket leads to the WebSocket still emitting its "close" event (and also an "error" event, if the connection is still pending and ran into the time-out), for which the websocket-websys transport registers a handler. This handler is dropped though together with the connection, so when the event comes into WASM, the handler is already free'd and this error is thrown (by wasm-bindgen?):

Error: closure invoked recursively or after being dropped

This error is not critical in browsers (only annoying in the console) but exits NodeJS programs due to an uncaught exception.

A quick-fix would be to unregister the relevant event handlers in the Drop implementation, like so:

self.inner.socket.set_onclose(None);
self.inner.socket.set_onerror(None);

According to @nibhar, there must be a better way, though. He suggests storing the socket's event handlers outside the connection, having them clean-up themselves when the onclose handler is called.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@sisou
Copy link
Contributor Author

sisou commented Mar 15, 2024

Here is a basic reproduction which I used to create this PR: https://github.com/sisou/libp2p-websocket-reproduction

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for the thorough report! Looks good to me, can you also add an entry in the CHANGELOG.md?
cc @thomaseizinger you were the one adding the code, you may want to review it, and probably have an opinion on the use after free situation.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that is surprising!

Great find. Yeah needs a version bump and changelog entry.

@jsdanielh
Copy link
Contributor

Wow, that is surprising!

Great find. Yeah needs a version bump and changelog entry.

Thanks for reviewing it. We just added the version bump and changelog entry.

transports/websocket-websys/CHANGELOG.md Outdated Show resolved Hide resolved
In browsers, only the code 1000 and codes between 3000-4999 are allowed to be set by userland code: https://websockets.spec.whatwg.org/#dom-websocket-close
@jxs
Copy link
Member

jxs commented Mar 23, 2024

Thanks! Just waiting on #5253 so we can get CI green again

@jxs jxs added the send-it label Mar 25, 2024
@mergify mergify bot merged commit 0687385 into libp2p:master Mar 25, 2024
71 of 72 checks passed
@jxs
Copy link
Member

jxs commented Mar 25, 2024

@jsdanielh do you require a release?

@jsdanielh
Copy link
Contributor

@jsdanielh do you require a release?

Yeah, that would be great

guillaumemichel pushed a commit that referenced this pull request Mar 28, 2024
In browsers, only the code 1000 and codes between 3000-4999 are allowed to be set by userland code: https://websockets.spec.whatwg.org/#dom-websocket-close

I found this while debugging use-after-free errors in our WASM application. Turns out, when connections are timing out, libp2p in rust drops them, but does not seem to close them explicitly (we are using libp2p-swarm). This led to these WebSockets still emitting events, the handlers of which were already dropped on the rust side, though.

A long investigation led me to have a look into the `Result` that gets returned from `close_with_code_and_reason`, and it turns out it's an error! Specifically:
```
InvalidAccessError: Failed to execute 'close' on 'WebSocket': The code must be either 1000, or between 3000 and 4999. 1001 is neither.
```

This PR only fixes the failing closing of the WebSocket, not the remaining use-after-free problem.

Pull-Request: #5229.
@jxs
Copy link
Member

jxs commented May 1, 2024

sorry for the delay @jsdanielh and thanks for your patience.
A libp2p-websocket-websys version has now been released, see #5351 for more information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants