Make sure nodes don't hammer each other even when reserved #8901
Is there some kind of dynamic backoff with randomness? If all nodes get disconnected at once, they will all retry at the same moment, exactly 10 seconds later, and spam each other with requests.
Indeed, I changed it to a random interval of 5 to 10 seconds. We actually use randomness everywhere else already.
bot merge
Trying merge.
* Make sure nodes don't hammer each other even when reserved * Make the ban random
Why a fixed or random back off instead of a progressively larger one?
I foresee this potentially causing some issues with collator networking. If a validator initially refuses a connection from a collator due to a race in when they see the next relay chain block, then the collator won't attempt to connect for 10 seconds. That may lead to unnecessary gaps in the parachain.
Validators never refuse substream from collators unless they're full.
Ok, that is true. The way that @Lldenaurois is designing the collator protocol is to maintain a peer-set which is larger than the actual reservoir of peers we accept. As we add PreVF handshake logic we'd need to maintain that guarantee, so we don't reject connections outright.
…h#8901) * Make sure nodes don't hammer each other even when reserved * Make the ban random
Right now, when node A has node B as reserved, node A will try really hard to open a substream to node B.
Even if node B refuses the substream, node A will almost immediately try again. This results in a flood of substream open attempts, which wastes bandwidth and CPU.
If we have a lot of reserved nodes, for example because of the collation and validation substreams in Polkadot, and most of them refuse the substream, this spam can be considerable. They're not supposed to refuse the substream in normal situations, but with our recent revert to 0.8.30, they do in practice.
This PR adds a 10-second "ban" during which we don't try to re-open a substream after it has failed to open, even if the node is reserved.
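
For illustration, the ban can be sketched roughly as below, using the randomized 5 to 10 second interval mentioned in the review discussion. This is a minimal sketch and not the code from this PR: `PeerId` as a plain `u64`, `OpenBans`, `ban_duration`, `on_open_failure` and `may_open` are all hypothetical names, and the caller is assumed to supply a uniform random sample in [0, 1] from whatever RNG it already uses.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical peer identifier; real code would use the networking layer's peer ID type.
type PeerId = u64;

/// Maps a uniform sample in [0, 1] to a ban duration between 5 and 10 seconds.
/// The sample is assumed to come from the caller's RNG; out-of-range values are clamped.
fn ban_duration(sample: f64) -> Duration {
    let clamped = sample.clamp(0.0, 1.0);
    Duration::from_secs_f64(5.0 + 5.0 * clamped)
}

/// Tracks, per peer, the instant before which no substream open is attempted.
#[derive(Default)]
struct OpenBans {
    banned_until: HashMap<PeerId, Instant>,
}

impl OpenBans {
    /// Records a failed open attempt and bans the peer for a randomized duration.
    /// This applies to reserved peers as well.
    fn on_open_failure(&mut self, peer: PeerId, now: Instant, sample: f64) {
        self.banned_until.insert(peer, now + ban_duration(sample));
    }

    /// Returns true if a new open attempt is allowed at `now`.
    /// Expired bans are dropped from the map as a side effect.
    fn may_open(&mut self, peer: PeerId, now: Instant) -> bool {
        let banned = self
            .banned_until
            .get(&peer)
            .map_or(false, |until| now < *until);
        if !banned {
            self.banned_until.remove(&peer);
        }
        !banned
    }
}
```

In this sketch, the component that opens substreams would call `may_open` before each attempt and `on_open_failure` whenever the remote refuses. Because each node draws its own sample, nodes that were disconnected at the same moment do not all retry at the same instant.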