Skip to content
This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Websocket blocks after terminating without unsubscribing #821

Closed
thomas-nguy opened this issue Dec 10, 2021 · 4 comments · Fixed by #933
Closed

Websocket blocks after terminating without unsubscribing #821

thomas-nguy opened this issue Dec 10, 2021 · 4 comments · Fixed by #933

Comments

@thomas-nguy
Copy link
Contributor

thomas-nguy commented Dec 10, 2021

Describe the bug
As discussed on Discord https://discord.com/channels/783264383978569728/783264902613565450/907267013028749322

It was discovered that subscribing to WS works.

However, if a subscription is not unsubscribed and the session is terminated, then the WS blocks up and no further requests will be entertained. It doesn't help to reconnect and unsubscribe.

Also, in the instance where 2 clients are connected and if Client A terminates the session without unsubscribing, Client B blocks as well, and the whole WS blocks.

To Reproduce
Steps to reproduce the behavior:

  1. Have a local node running
  2. On CLI wscat -c ws://localhost:8546 (response connected)
  3. {"jsonrpc":"2.0", "id": 1, "method": "eth_subscribe", "params": ["newPendingTransactions"]} (respond with suscription id)
  4. Ctrl+C to force exit
  5. Repeat 2-3, no response in step 3
  6. Restart node
  7. Repeat steps 2-3
  8. {"id": 1, "method": "eth_unsubscribe", "params": ["(subscriptionid)"]}
  9. Ctrl+C to exit.
  10. Repeat steps 2-3, subscription works

Expected behavior
WS should handle disconnection without blocking.

Screenshots
NA

Desktop (please complete the following information):
Node is Linux

Original issue
crypto-org-chain/cronos#207

@github-actions
Copy link

This issue is stale because it has been open 45 days with no activity. Remove Status: Stale label or comment or this will be closed in 7 days.

@acryptosx
Copy link

I am getting this issue as well. eth_subscribe just does not work.

@odeke-em odeke-em reopened this Feb 9, 2022
@odeke-em
Copy link
Contributor

odeke-em commented Feb 9, 2022

I am getting this issue as well. eth_subscribe just does not work.

@acryptosx could you please provide more detail of how to reproduce your problem? The steps provided in the opening issue statement do not reproduce the bug. @fedekunze is there a reliably producer on y'all end? Offline via chat, I had kindly asked for some repro as this seems to be on the production mainnet. Please help with some context to reproduce the problem then I can dig in.

@odeke-em
Copy link
Contributor

odeke-em commented Feb 9, 2022

I sat down down just now and tried to read through the bug reports and I think I have found the credible problem after auditing the code, and it exactly matches the scenarios mentioned. The problem comes about because there is a Read lock that's invoked, then on an error we try to invoke a .Lock() and the docs for sync.RWMutex explicitly call out this issue https://pkg.go.dev/sync#RWMutex
Screen Shot 2022-02-09 at 4 27 33 PM

and here is the sequence of problems

1. Subscribe and api.filtersMu.RLock()

https://github.com/tharsis/ethermint/blob/7b22a535569f558dc9083e0833915c745c68be43/rpc/websockets.go#L692

2. Client exits without unsubscribing

Without unsubscribing, when the node tries to write back to the client it encounters an error https://github.com/tharsis/ethermint/blob/7b22a535569f558dc9083e0833915c745c68be43/rpc/websockets.go#L710

3. On error, api.filtersMu.Lock()

Invoking that Lock without the read lock being undone unfortunately locks up forever https://github.com/tharsis/ethermint/blob/7b22a535569f558dc9083e0833915c745c68be43/rpc/websockets.go#L714-L715
and as specified by Go per https://pkg.go.dev/sync#RWMutex
Screen Shot 2022-02-09 at 4 27 33 PM

and the whole set falls apart per this repro https://go.dev/play/p/3Y2ajlNTZ2b which is going to show a deadlock
and this is the whole problematic coupling
https://github.com/tharsis/ethermint/blob/7b22a535569f558dc9083e0833915c745c68be43/rpc/websockets.go#L692-L716

Remedy

Invoke that .Runlock() before the .Lock()

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants