A larger data payload in the response of a block request can't be sent back due to KeepAliveTimeout and then peer disconnected #12105
Comments
CC @arkpar |
@tomaka Could you please clarify when exactly Also, does @liuchengxu how fast is the network speed? Is it at least 100 Mbps? |
The keep-alive timeout only starts after there is no activity on a connection anymore. It's the mechanism that we use to kill inactive connections. Each protocol decides what "no activity" means. In our situation, it means no notifications protocols open and no request (in the request/response system) in progress. Looking at the issue, the logs are cropped, and I don't see any hint of the fact that the problem could be caused by the |
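To make the "no activity" rule described above concrete, here is a minimal sketch of how such an idle check could be modelled. The `ConnectionActivity` type, its fields, and its methods are hypothetical names chosen for illustration; they are not the actual libp2p or Substrate types, and the real implementation may track activity differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-connection bookkeeping (illustration only, not a real
/// libp2p/Substrate type).
struct ConnectionActivity {
    /// Number of notification protocol substreams currently open.
    open_notification_substreams: usize,
    /// Number of request/response exchanges still in progress.
    requests_in_progress: usize,
    /// When the connection last became idle; `None` while it is active.
    idle_since: Option<Instant>,
}

impl ConnectionActivity {
    /// "Activity" here means an open notifications substream or a request
    /// in progress, as described in the comment above.
    fn is_active(&self) -> bool {
        self.open_notification_substreams > 0 || self.requests_in_progress > 0
    }

    /// Call after either counter changes: clears the idle clock while the
    /// connection is active, and starts it when the connection becomes idle.
    fn update(&mut self, now: Instant) {
        if self.is_active() {
            self.idle_since = None;
        } else if self.idle_since.is_none() {
            self.idle_since = Some(now);
        }
    }

    /// The keep-alive timer only counts time spent idle, so an active
    /// connection is never closed by this check.
    fn should_close(&self, now: Instant, keep_alive: Duration) -> bool {
        match self.idle_since {
            Some(since) => now.duration_since(since) >= keep_alive,
            None => false,
        }
    }
}
```

Under this model, a response that is still being written counts as a request in progress, so the idle clock would not start until the transfer finishes or is abandoned.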
@liuchengxu Could you please provide full logs for both nodes for the same period of time? |
Here are the logs of the node that Liu-Cheng's node tried to sync from (we had them added as reserved nodes of each other with |
Thanks Nazar. @arkpar For convenient review, here is the server node log: https://files.slack.com/files-pri/T03LJ85UR5G-F03UZ31SHC3/untitled.txt, and the client node log: https://files.slack.com/files-pri/T03LJ85UR5G-F03V6JZ7HQC/client_node.txt Let me know if they are not sufficient. Both nodes are running from home, so I doubt the network speed can reach 100 Mbps. |
yeah, duplicated block requests lead to us disconnecting from that node
|
To me this looks like a duplicate of paritytech/polkadot-sdk#531 |
I'd say this one is caused by paritytech/polkadot-sdk#531, but they are not quite the same, as other slow network data transfers can potentially cause this timeout too. |
Description of bug
When a node requests a chunk of blocks from another node and these blocks happen to be huge, the block request is handled successfully on the server node, but the response fails to be sent back to the requester due to the error
KeepAliveTimeout
In the server node log, we can see that the first request result is 299402 bytes and is sent successfully, while the second request result is 10534306 bytes and fails to be sent back. The reason is that there are some data store transactions after block 64, causing the block size to increase significantly.
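As a rough back-of-the-envelope check of why the second response is so much more exposed than the first, the sketch below estimates how long each payload would take on a given link. The helper names and any link speed or timeout passed to them are illustrative assumptions; neither the actual link throughput nor the configured timeout appears in the logs, and the estimate ignores latency and protocol overhead.

```rust
use std::time::Duration;

/// Estimated time to transfer `payload_bytes` over a link sustaining
/// `link_mbps` megabits per second. Ignores latency and protocol overhead.
/// `link_mbps` must be positive. (Hypothetical helper for illustration.)
fn estimated_transfer_time(payload_bytes: u64, link_mbps: f64) -> Duration {
    let bits = payload_bytes as f64 * 8.0;
    Duration::from_secs_f64(bits / (link_mbps * 1_000_000.0))
}

/// True if the estimated transfer would take longer than `timeout`.
/// The timeout is whatever value the caller assumes, not a known default.
fn exceeds_timeout(payload_bytes: u64, link_mbps: f64, timeout: Duration) -> bool {
    estimated_transfer_time(payload_bytes, link_mbps) > timeout
}
```

For example, with an assumed 5 Mbps uplink and an assumed 10-second limit, `exceeds_timeout(10_534_306, 5.0, Duration::from_secs(10))` is true (about 16.9 s needed), while the 299402-byte response needs under half a second. At 100 Mbps the larger response would take less than a second.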
With inferior networking and large blocks, which are normal for a storage chain, this issue can occur quite often. I see a few ways to help this:
I think paritytech/polkadot-sdk#531 is related: after the timeout, we see a flood of duplicated block requests. Maybe we can also do something to improve that.
Steps to reproduce
No response