Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Make sure nodes don't hammer each other even when reserved #8901

Merged
2 commits merged into from
May 25, 2021

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented May 25, 2021

Right now, when node A has node B as reserved, node A will try really hard to open a substream to node B.
Even if node B refuses the substream, node A will almost immediately try again. This results in a spam of substream open attempts, which unnecessarily uses bandwidth and CPU.

If we have a lot of reserved nodes, for example because of the collation and validation substreams in Polkadot, and most of them refuse the substream, this spam can be considerable. They're not supposed to refuse the substream in normal situations, but with our recent revert to 0.8.30, they do in practice.

This PR adds a 10 seconds "ban" during which we don't try to re-open a substream after it has failed to open, even if the node is reserved.

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels May 25, 2021
@tomaka tomaka requested review from ordian and kpp May 25, 2021 09:56
andresilva added a commit to paritytech/polkadot that referenced this pull request May 25, 2021
Copy link
Contributor

@kpp kpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some kind of dynamic backoff with randomness? If all nodes got disconnected, we will spam with requests exactly in 10 seconds.

@tomaka
Copy link
Contributor Author

tomaka commented May 25, 2021

Indeed, I changed it to a random 5 to 10 seconds interval. We actually use randomness everywhere else already.

@andresilva
Copy link
Contributor

bot merge

@ghost
Copy link

ghost commented May 25, 2021

Trying merge.

@ghost ghost merged commit 3c2bd9a into paritytech:master May 25, 2021
@tomaka tomaka deleted the try-fix-cpu branch May 25, 2021 11:13
s3krit pushed a commit that referenced this pull request May 25, 2021
* Make sure nodes don't hammer each other even when reserved

* Make the ban random
@burdges
Copy link

burdges commented May 25, 2021

Why a fixed or random back off instead of a progressively larger one?

@rphmeier
Copy link
Contributor

I foresee this potentially causing some issues with collator networking. If a validator initially refuses a connection from a collator due to a race in when they see the next relay chain block, then the collator won't attempt to connect for 10 seconds. That may lead to unnecessary gaps in the parachain.

@kpp
Copy link
Contributor

kpp commented May 25, 2021

@burdges, @rphmeier, as I see both or your suggestions can be fixed with an exponential backoff starting from, e.g. a rand 1-2s, insreasing by a multiplier of 2^N?

@tomaka
Copy link
Contributor Author

tomaka commented May 25, 2021

If a validator initially refuses a connection from a collator due to a race in when they see the next relay chain block

Validators never refuse substream from collators unless they're full.
They could accept the substream and then immediately close it, but that's a different code path.

@rphmeier
Copy link
Contributor

rphmeier commented May 25, 2021

Ok, that is true. The way that @Lldenaurois is designing the collator protocol is to maintain a peer-set which is larger than the actual reservoir of peers we accept. As we add PreVF handshake logic we'd need to maintain that guarantee, so we don't reject connections outright.

nazar-pc pushed a commit to autonomys/substrate that referenced this pull request Aug 8, 2021
…h#8901)

* Make sure nodes don't hammer each other even when reserved

* Make the ban random
jordy25519 pushed a commit to cennznet/substrate that referenced this pull request Sep 17, 2021
…h#8901)

* Make sure nodes don't hammer each other even when reserved

* Make the ban random
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants