Websocket RPC client should automatically reconnect if becomes broken #2640

Closed
msgmaxim opened this issue Dec 16, 2022 · 2 comments · Fixed by #2791


msgmaxim commented Dec 16, 2022

It seems that there are currently two issues:

  1. WS connections (silently?) become inactive: we stop receiving messages until the process is restarted. Instead, we should be able to detect inactivity and manually reconnect. (This is quite common for network connections that are meant to be persistent, and it is what we already do for p2p.)

  2. When we request a new subscription and get an error instead, the program crashes. This seems to be by far the most common reason our nodes crash and fail ceremonies. We should handle this more gracefully and attempt to recover. (See, for example: link to logs)
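A minimal sketch of the inactivity detection described in (1). It assumes the client can observe every incoming message, including keep-alives, and is free to tear down and re-open the WS connection. `InactivityWatchdog`, `Action` and `check` are hypothetical names, not part of web3, subxt or ethers-rs.

```rust
use std::time::{Duration, Instant};

/// Hypothetical watchdog (not from any library): remembers when the last
/// message arrived on a persistent connection and reports when the
/// connection has been silent for longer than `timeout`.
pub struct InactivityWatchdog {
    timeout: Duration,
    last_message: Instant,
}

impl InactivityWatchdog {
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_message: now }
    }

    /// Call whenever any message (including a keep-alive) is received.
    pub fn record_message(&mut self, now: Instant) {
        self.last_message = now;
    }

    /// True once no message has been seen for longer than `timeout`.
    pub fn is_stale(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_message) > self.timeout
    }
}

/// What a (hypothetical) connection-driving loop should do next.
#[derive(Debug, PartialEq)]
pub enum Action {
    Keep,
    Reconnect,
}

/// Decide whether the connection should be re-established.
pub fn check(watchdog: &InactivityWatchdog, now: Instant) -> Action {
    if watchdog.is_stale(now) {
        Action::Reconnect
    } else {
        Action::Keep
    }
}
```

Passing `now` in explicitly keeps the logic testable. A real loop would call `check(&watchdog, Instant::now())` on a timer and reconnect on `Action::Reconnect`; the timeout value is a tuning choice, not something this issue specifies.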

(2) appears to be related to (1): subscription failures seem to happen when the underlying WS connection is broken.
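If subscription errors are mostly a symptom of a broken connection, one way to handle them more gracefully is to treat an error as a reason to reconnect and resubscribe instead of crashing. A sketch under that assumption; `RpcClient`, `subscribe_with_recovery` and the `FlakyClient` test double are hypothetical and do not mirror any real library's API.

```rust
/// Hypothetical abstraction over a WS RPC client (not the web3/subxt API).
pub trait RpcClient {
    fn subscribe(&mut self, topic: &str) -> Result<u64, String>;
    fn reconnect(&mut self) -> Result<(), String>;
}

/// Try to subscribe; on error, assume the connection is broken, reconnect
/// and try again, making at most `max_attempts` subscribe calls in total.
/// Returns the last error instead of panicking, so the caller can decide
/// how to recover.
pub fn subscribe_with_recovery<C: RpcClient>(
    client: &mut C,
    topic: &str,
    max_attempts: u32,
) -> Result<u64, String> {
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match client.subscribe(topic) {
            Ok(id) => return Ok(id),
            Err(e) => {
                last_err = e;
                if attempt < max_attempts {
                    // A failed reconnect is recorded, but we still retry.
                    if let Err(e) = client.reconnect() {
                        last_err = e;
                    }
                }
            }
        }
    }
    Err(last_err)
}

/// Hypothetical test double: subscriptions fail while `broken` is true,
/// and a reconnect repairs the connection.
pub struct FlakyClient {
    pub broken: bool,
    pub reconnects: u32,
}

impl RpcClient for FlakyClient {
    fn subscribe(&mut self, _topic: &str) -> Result<u64, String> {
        if self.broken {
            Err("connection closed".into())
        } else {
            Ok(1)
        }
    }

    fn reconnect(&mut self) -> Result<(), String> {
        self.reconnects += 1;
        self.broken = false;
        Ok(())
    }
}
```

Capping the attempts keeps a persistently unreachable node from looping forever; whether the final error should then end the ceremony or trigger a longer back-off is a separate policy decision.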

msgmaxim added the CFE label Dec 16, 2022
msgmaxim self-assigned this Dec 16, 2022

kylezs commented Dec 16, 2022

It seems this is also an issue for Polkadot. However, before fixing it there, I want to address the other issues with Polkadot witnessing, because we may end up using a different client entirely (i.e. our own instead of subxt).


kylezs commented Jan 23, 2023

Having looked at this, it might be a good time to move over to ethers-rs, which is better maintained and now generally regarded as the better Rust library. For example, last week they added gakonst/ethers-rs#1915, which should make it fairly straightforward to implement retry logic.

ethers-rs also provides a lot of other useful features out of the box, such as retrying requests with exponential backoff.
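For reference, exponential backoff means waiting roughly base × 2^attempt between retries, usually with an upper bound. A generic, std-only sketch follows; `backoff_delay` and `retry_with_backoff` are hypothetical helpers, not the ethers-rs implementation, and the sleep function is injected so the logic can be exercised without actually waiting.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based),
/// computed as base * 2^attempt and capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Hypothetical helper: call `op` until it succeeds or `max_attempts` calls
/// have been made (always at least one), sleeping with exponential backoff
/// between attempts. `sleep` is injected so callers can pass
/// `std::thread::sleep` (or an async equivalent in a real client).
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 1 s cap, the waits grow as 100 ms, 200 ms, 400 ms, 800 ms and then stay at 1 s. Production implementations often add random jitter as well, which this sketch omits.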

This is something @AlastairHolmes and I have been thinking about doing for a while, and this now seems like a good reason (and time) to do it. If we were to implement this with web3, we'd effectively end up copying a page of their source code and then modifying it to reconnect. With ethers-rs, the pieces needed to reconnect gracefully are already provided. Pinging @dandanlen
