One particular block cannot be read over websocket: "CONNECTION ERROR: Provider started to reconnect" #4016

Closed
joeytwiddle opened this issue May 1, 2021 · 9 comments · Fixed by #5820
Labels
1.x 1.0 related issues 3.x 3.0 related issues Bug Addressing a bug Investigate P2 Medium severity bugs Stale Has not received enough activity

Comments

@joeytwiddle

joeytwiddle commented May 1, 2021

Could we document the various reasons why this error may occur, to help developers who run into it debug their issue?

We saw this error once in the past, when our light node had crashed/frozen (due to running out of disk space). The solution that time was to bring back the node, and restart our app. (Perhaps discarding the old web3 provider and creating a new one would have been sufficient, instead of a full restart, as suggested in another issue.)

However, today we hit it again, and persistently. We could create a new provider, connect, and successfully request the client version and block height. But then, sometimes within 1 second of connecting, we would get this error.

This was happening on both light and fast nodes. Restarting them and upgrading geth from 1.10.1 to 1.10.2 didn't help.

We have been using web3 v1.2.11 for 6 months, but this is the first time we have seen this happen. Edit: It also happens with web3 v1.3.5.

We can actually make the following requests without any error:

      const clientVersion = await web3.eth.getNodeInfo();
      const blockNumber = await web3.eth.getBlockNumber();

But the error was triggered when we tried to fetch the block data:

      const blockNumber = 12344370;
      const blockData = await web3.eth.getBlock(blockNumber, true);

This was the error we received:

Error: CONNECTION ERROR: Provider started to reconnect before the response got received!
    at Object.PendingRequestsOnReconnectingError (/app/node_modules/web3-core-helpers/src/errors.js:69:16)
    at /app/node_modules/web3-providers-ws/src/index.js:395:37
    at Map.forEach (<anonymous>)
    at WebsocketProvider.reconnect (/app/node_modules/web3-providers-ws/src/index.js:394:28)
    at WebsocketProvider._onClose (/app/node_modules/web3-providers-ws/src/index.js:174:14)
    at W3CWebSocket._dispatchEvent [as dispatchEvent] (/app/node_modules/yaeti/lib/EventTarget.js:115:12)
    at W3CWebSocket.onClose (/app/node_modules/websocket/lib/W3CWebSocket.js:228:10)
    at WebSocketConnection.<anonymous> (/app/node_modules/websocket/lib/W3CWebSocket.js:201:17)
    at WebSocketConnection.emit (events.js:314:20)
    at WebSocketConnection.drop (/app/node_modules/websocket/lib/WebSocketConnection.js:475:14)
    at /app/node_modules/websocket/lib/WebSocketConnection.js:303:18
    at processTicksAndRejections (internal/process/task_queues.js:79:11)

What could be the cause this time?

To get the service working again today, we switched from ws:// to http:// and that worked without issue.
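The workaround above could also be applied per request rather than globally. The following is a minimal sketch, assuming two already-constructed web3 instances (one backed by a WebSocket provider, one by an HTTP provider). The helper name `getBlockWithHttpFallback` is hypothetical and not part of web3.js, and matching on the message text is an assumption about how stable that message is across versions.

```javascript
// Hypothetical helper (not part of web3.js): try the WebSocket-backed
// instance first, and retry over HTTP only for the reconnect error above.
async function getBlockWithHttpFallback(wsWeb3, httpWeb3, blockNumber) {
  try {
    return await wsWeb3.eth.getBlock(blockNumber, true);
  } catch (err) {
    // Assumption: the message prefix matches the one in the stack trace.
    const isReconnectError =
      err instanceof Error &&
      err.message.startsWith('CONNECTION ERROR: Provider started to reconnect');
    if (!isReconnectError) throw err; // unrelated errors are not masked
    return httpWeb3.eth.getBlock(blockNumber, true);
  }
}
```

Any other error is rethrown unchanged, so the fallback only covers the one failure discussed in this issue.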

In general, what are the various reasons this error might occur, and how can we test or solve each one? Thanks!

@joeytwiddle
Author

joeytwiddle commented May 3, 2021

Update: After some testing I have found that this only occurs on that particular block!

      const blockNumber = 12344370;
      const blockData = await web3.eth.getBlock(blockNumber, true);
      // Error thrown

I have not found any other block with this same issue.
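For anyone who wants to check a range of blocks for the same behaviour, here is a rough diagnostic sketch. It assumes a connected web3 instance is passed in; `findFailingBlocks` is a hypothetical name, not a web3.js function.

```javascript
// Hypothetical diagnostic helper: fetch each block in [from, to] with full
// transaction objects and collect the numbers whose request rejects.
async function findFailingBlocks(web3, from, to) {
  const failing = [];
  for (let n = from; n <= to; n++) {
    try {
      await web3.eth.getBlock(n, true);
    } catch (err) {
      failing.push({ blockNumber: n, message: String(err && err.message) });
    }
  }
  return failing;
}
```

The requests run sequentially on purpose. Note that after one failure the WebSocket provider may still be reconnecting, so a failure immediately following another one could be a false positive and is worth re-checking on its own.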

As well as happening in web3@1.2.11 it also happens in the latest version web3@1.3.5

Of course it only happens with websocket connections, not http connections.

This seems like a bug. Does it happen for you too?

@joeytwiddle joeytwiddle changed the title from 'Document "CONNECTION ERROR: Provider started to reconnect before the response got received"' to 'One particular block cannot be read over websocket: "CONNECTION ERROR: Provider started to reconnect"' May 3, 2021
@spacesailor24 spacesailor24 added 1.x 1.0 related issues 3.x 3.0 related issues Bug Addressing a bug Investigate P1 High severity bugs P2 Medium severity bugs and removed P1 High severity bugs labels May 11, 2021
@spacesailor24
Contributor

Thank you for bringing this to our attention. If the troublesome block isn't a necessity for you, I would classify this as a medium severity bug, and we'll work on it when we have the availability.

That said, I will keep this issue in mind during the rewrite (v4.x) and verify that it does not persist.

@GregTheGreek
Contributor

@joeytwiddle Are you able to show the logs on your node when this call is made?

@joeytwiddle
Author

joeytwiddle commented May 21, 2021

For what it's worth, we seem to be getting the same behaviour with block 12467716.

I don't see anything special in the geth node logs. Just the usual "Imported new block headers". We have light and fast nodes.

It takes about 1.3 seconds to receive this error from these "unhealthy" blocks. That is the same time it usually takes to get a response from the healthy blocks too.

Yes, there is a workaround (use http instead of ws) so this bug isn't too severe. I'll be happy to check if the behaviour is fixed or recurring in future releases of web3.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. If you believe this was a mistake, please comment.

@github-actions github-actions bot added the Stale Has not received enough activity label Jul 21, 2021
@github-actions github-actions bot closed this as completed Aug 4, 2021
@gabuladze

Hi everyone!
I've encountered the same error on web3 v1.5.0 when calling:
web3.eth.getTransactionReceipt('0x6ff102ba20556ff333763733d7ceaf3055eaa690c2e806f9102cad2f2a779792');

Increasing the maxReceivedFrameSize and maxReceivedMessageSize params in the websocket provider options seems to have fixed the issue. That makes sense, because this particular transaction (https://etherscan.io/tx/0x6ff102ba20556ff333763733d7ceaf3055eaa690c2e806f9102cad2f2a779792) has >1.9k log objects in its receipt, which makes it larger than most transactions my code processes every day.

@joeytwiddle since you are passing true as second param to getBlock, to get block data with transaction objects, this could fix it for you too.

The maxReceivedFrameSize and maxReceivedMessageSize can be passed to provider constructor like so:

const Web3 = require('web3');

// Raise the websocket client's receive limits (values are in bytes).
const options = {
  clientConfig: {
    maxReceivedFrameSize: 2000000, // ~1.9 MiB (default: 1 MiB)
    maxReceivedMessageSize: 10000000, // ~9.5 MiB (default: 8 MiB)
  },
};
const provider = new Web3.providers.WebsocketProvider('wss://host:port', options);
const web3 = new Web3(provider);

More on config options here: https://web3js.readthedocs.io/en/v1.5.0-rc.0/web3.html#configuration
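To judge whether a given response is anywhere near these limits, one option is to fetch it over HTTP (where the request succeeds) and measure its serialized size. This is only a rough sketch: the helper names are hypothetical, the defaults are the 1 MiB and 8 MiB figures mentioned above, and the result is an approximation because web3 reformats some fields and the JSON-RPC envelope adds a few bytes.

```javascript
// Defaults mentioned above (1 MiB frame, 8 MiB message).
const DEFAULT_MAX_FRAME_BYTES = 1024 * 1024;
const DEFAULT_MAX_MESSAGE_BYTES = 8 * 1024 * 1024;

// Hypothetical helper: rough byte size of a result once serialized as JSON.
function approxPayloadBytes(result) {
  return Buffer.byteLength(JSON.stringify(result), 'utf8');
}

// Hypothetical helper: compare the approximate size against both defaults.
function exceedsDefaultLimits(result) {
  const bytes = approxPayloadBytes(result);
  return {
    bytes,
    overFrameLimit: bytes > DEFAULT_MAX_FRAME_BYTES,
    overMessageLimit: bytes > DEFAULT_MAX_MESSAGE_BYTES,
  };
}
```

For example, passing the receipt or block object returned by an HTTP-backed web3 instance to `exceedsDefaultLimits` gives a quick indication of whether the frame or message limit is a plausible culprit.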

@joeytwiddle
Author

joeytwiddle commented Aug 19, 2021

Thank you so much for tracking down the cause! It's good to know there was a logical explanation.

Ideally web3 could improve this error message.

Instead of "CONNECTION ERROR", a message like "Response exceeds maxReceivedMessageSize" would make it easier for the consumer to understand and address the problem.

Of course, I've no idea if that's easy to code or not! It looks like this error comes from the reconnect() function, rather than the code that processes requests and responses.
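Until the library reports this more clearly, a caller could attach the hint itself. Below is a minimal sketch of that idea; `withSizeHint` is a hypothetical wrapper, not a web3.js API, and it only guesses at the cause based on the message text, so the hint may be wrong when the connection dropped for another reason.

```javascript
// Hypothetical wrapper (not a web3.js API): rethrow the reconnect error with
// a hint about the size limits, leaving every other error untouched.
async function withSizeHint(request) {
  try {
    return await request();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (!message.includes('Provider started to reconnect')) throw err;
    const hinted = new Error(
      message +
        ' (possible cause: response larger than maxReceivedFrameSize or ' +
        'maxReceivedMessageSize)'
    );
    hinted.cause = err; // keep the original error for debugging
    throw hinted;
  }
}
```

Usage would look like `await withSizeHint(() => web3.eth.getBlock(12344370, true));`.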

@joeytwiddle
Author

Note: I tried with gabuladze's suggestion above (used some even larger values) but the error still pops up sometimes.

So we just gave up on using ws:// and only use http:// to communicate with our geth node.

@rj03hou

rj03hou commented Feb 1, 2023

same problem

5 participants