It seems that there are currently two issues:
1. WS connections (silently?) become inactive: we stop receiving messages until the process is restarted. Instead, we should be able to detect inactivity and manually reconnect. (This is quite common for network connections that are meant to be persistent, and it is what we already do for p2p.)
2. When we request a new subscription and get an error instead, the program crashes. This seems to be by far the most common reason our nodes crash and fail ceremonies. We should handle this more gracefully and attempt to recover. (See, for example: link to logs)
It seems that (2) is related to (1): subscription failure seems to happen when the underlying WS connection is broken.
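A minimal sketch of the two behaviours asked for above, using only the Rust standard library: treat a quiet stream as stale after an idle timeout, and treat a failed subscription request as something to retry rather than a reason to crash. All names here (`Stale`, `pump_until_stale`, `supervise`, the `subscribe` closure) are hypothetical, and an `mpsc` channel stands in for the WS subscription stream; this is not the web3 or ethers-rs API.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why a subscription stream was abandoned (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Stale {
    /// No message arrived within the idle timeout; the connection is presumed dead.
    Idle,
    /// The sending side went away, i.e. the connection was closed.
    Closed,
}

/// Forward messages to `on_msg` until the stream goes quiet or closes.
pub fn pump_until_stale<T>(
    rx: &Receiver<T>,
    idle_timeout: Duration,
    mut on_msg: impl FnMut(T),
) -> Stale {
    loop {
        match rx.recv_timeout(idle_timeout) {
            Ok(msg) => on_msg(msg),
            Err(RecvTimeoutError::Timeout) => return Stale::Idle,
            Err(RecvTimeoutError::Disconnected) => return Stale::Closed,
        }
    }
}

/// Subscribe, pump messages, and resubscribe whenever the stream goes stale.
/// `subscribe` stands in for "open a WS connection and request a subscription";
/// an `Err` from it is treated as recoverable instead of crashing the process.
/// Gives up after `max_attempts` subscribe calls and returns how many were made.
pub fn supervise<T, E>(
    mut subscribe: impl FnMut() -> Result<Receiver<T>, E>,
    idle_timeout: Duration,
    max_attempts: usize,
    mut on_msg: impl FnMut(T),
) -> usize {
    let mut attempts = 0;
    while attempts < max_attempts {
        attempts += 1;
        match subscribe() {
            // Connected: pump until the stream is idle or closed, then loop to reconnect.
            Ok(rx) => {
                let _reason = pump_until_stale(&rx, idle_timeout, &mut on_msg);
            }
            // Subscription request failed: fall through and try again.
            Err(_) => {}
        }
    }
    attempts
}
```

In a long-running node the attempt limit would presumably be dropped or made very large, and each reconnect would be logged and delayed rather than retried immediately.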
It seems this is also an issue for Polkadot, but before fixing it there, I want to fix the other issues with the Polkadot witnessing, because we may end up using a different client entirely (i.e. our own instead of subxt).
Having a look at this, it seems like this might be a good time to move over to ethers-rs, which is better maintained and is now generally regarded as the better Rust library. For example, last week they added this: gakonst/ethers-rs#1915, which would seem to make it pretty trivial to implement retry logic.
There are also a lot of other nice things you get out of the box with ethers-rs, like retrying requests with exponential backoff.
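For reference, this is roughly what retrying with exponential backoff means, sketched with only the standard library. It is not the ethers-rs implementation; `backoff_delay`, `retry_with_backoff`, and their parameters are hypothetical names for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    // Saturate instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Run `op` until it succeeds, sleeping with exponential backoff between failures.
/// After `max_retries` retries the last error is returned to the caller.
pub fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    base: Duration,
    max: Duration,
    max_retries: u32,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the error instead of panicking.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 1 s cap, the delays would be 100 ms, 200 ms, 400 ms, 800 ms, and then 1 s for every later retry.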
This has been something @AlastairHolmes and I have been thinking about doing for a while, and this now seems like a good reason (and time) to do so. If we were to implement this with web3, we'd effectively end up copying a page of their source code and then adding modifications to reconnect. With ethers-rs, the details required to gracefully reconnect are provided. Pinging @dandanlen