Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Rework priority groups #7374

Closed
wants to merge 64 commits into from
Closed

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Oct 22, 2020

Close #6087
Relates to #7074

This PR isn't finished. Before finishing it, I'd like to get feedback about the general approach.

This PR implements what I commented about here.

When initializing the network, one will now be able to pass an additional list of "peer sets".
Rather than modifying a priority group, the authority-discovery module, and Polkadot, would now modify the appropriate "peer set". Priority groups are no longer a thing after this PR.

So what is the difference between a priority group and a peer set?

  • Each peer set has its own inbound and outbound slots, independent from the "main" inbound/outbound slots used for syncing. In practice, this guarantees that validators would always have slots reserved for collators, which is the main motivation behind this change.
  • Each peer set is tied to a specific list of notifications protocol. When we allocate a slot for a certain peer in a certain set, we open the corresponding notifications substream. When the notifications substreams are closed, we deallocate a slot. See also Change sc_network::protocol::generic_proto::behaviour to occupy inbound peerset slots for substreams and not connections #7074. This means that we can be "connected to" (i.e. have a substream) a peer through a certain set, while remaining "disconnected" (i.e. no substream) through a different set.

That second point isn't implemented yet, as it requires deeper changes in the behaviour/handler layer, as explained here.

A consequence of this change is that the authority discovery module should no longer pass a random sample of the list of validators, but should pass the full list instead. The peerset would now be responsible for choosing that random sample.

This PR is also a step towards #3310, as we would now be able to maintain a peer set for each chain.

@tomaka tomaka added A3-in_progress Pull request is in progress. No review needed at this stage. B3-apinoteworthy C1-low PR touches the given topic and has a low impact on builders. labels Oct 22, 2020
@tomaka tomaka requested review from ordian and mxinden October 22, 2020 12:08
@ordian
Copy link
Member

ordian commented Oct 22, 2020

Each peer set has its own inbound and outbound slots, independent from the "main" inbound/outbound slots used for syncing. In practice, this guarantees that validators would always have slots reserved for collators, which is the main motivation behind this change.

So how are priority groups different? Do they occupy only inbound or outbound slots?

IIUC from the polkadot's (validator discovery) pespective, the change would be as simple as renaming *_priority_group to *_peer_set.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 22, 2020

So how are priority groups different? Do they occupy only inbound or outbound slots?

At the moment, the node as a whole has 25 inbound slots and 25 outbound slots (by default).
When you add a node in a priority group, the node tries to connect to it "more" than it tries to connect to the other nodes, but it still occupies of these 25+25 slots.

The problem with priority groups is that, when it comes to collation, a validator doesn't know who the collators are. It can't set the priority group to the list of collators. Therefore, there's a possibility that a collator tries to connect to a validator, but gets denied because all the slots of the validator are occupied by regular relay chain nodes.

After this PR, and after #7074, 25+25 slots would be attributed to relay chain nodes, and, for example, 10+10 slots would be attributed specifically to collators.

IIUC from the polkadot's (validator discovery) pespective, the change would be as simple as renaming *_priority_group to *_peer_set.

That's what I understand as well.

Copy link
Contributor

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get feedback about the general approach.

I am in favor of this more granular approach.

IIUC from the polkadot's (validator discovery) pespective, the change would be as simple as renaming *_priority_group to *_peer_set.

Do I understand correctly that each peer set would have (a) a set of variable nodes where a subset is chosen based on the min and max boundaries of the set and (b) a set of reserved nodes that would always be connected? If that is true and in case you were only passing a subset of nodes from Polkadot down, you could now pass all of them down, having the PeerSetManager decide who to connect to.

reserved_nodes.insert(validator.peer_id.clone());
known_addresses.push((validator.peer_id.clone(), validator.multiaddr.clone()));
let peerset_config = {
let mut sets = Vec::with_capacity(1 + params.network_config.extra_sets.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore as pr still in progress: I doubt pre-allocating will bring us a performance gain, given that this is only called once in the lifetime of a node and relatively small. Am I missing something?

@@ -967,7 +984,7 @@ impl<B: BlockT + 'static, H: ExHashT> NetworkService<B, H> {
.unbounded_send(ServiceToWorkerMsg::SyncFork(peers, hash, number));
}

/// Modify a peerset priority group.
/// Modify a set of peers from the peerset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would one still be allowed to create new peer sets via set_peer_set?

client/peerset/src/lib.rs Outdated Show resolved Hide resolved
client/peerset/src/lib.rs Outdated Show resolved Hide resolved
#[derive(Debug)]
pub struct SetConfig {
/// Name of the set. Used to later refer to that set.
pub name: &'static str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that set names have to be known at compile time?

Does Polkadot know all set names at compile time (//CC @ordian)? E.g. when a validator is assigned to a specific parachain, would it use a prefixed priority group or reuse the same from the previous assignment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed one of the aspects I'm hesitating about. To me it feels more correct to indicate ahead of time the list of overlay networks, and Polkadot would indeed use a set called something like "current-validators-rotation".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it feels more correct to indicate ahead of time the list of overlay networks

Otherwise, there isn't really a clean way to configure the number of slots of each set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that set names have to be known at compile time?

Does Polkadot know all set names at compile time (//CC @ordian)? E.g. when a validator is assigned to a specific parachain, would it use a prefixed priority group or reuse the same from the previous assignment?

That's a good question, I don't think we have a design for it now. It depends on how peer-sets are working, but for now we have one large group that is changing over time.

Copy link
Contributor Author

@tomaka tomaka Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention behind this refactor is indeed to use one "peers set" and change its content.

@ordian
Copy link
Member

ordian commented Nov 23, 2020

Do I understand correctly that each peer set would have (a) a set of variable nodes where a subset is chosen based on the min and max boundaries of the set and (b) a set of reserved nodes that would always be connected? If that is true and in case you were only passing a subset of nodes from Polkadot down, you could now pass all of them down, having the PeerSetManager decide who to connect to.

Sorry I forgot to reply. Currently, we have only one priority group https://github.com/paritytech/polkadot/blob/7ac0353728611643d1e807eb7563e6e00cd11260/node/network/bridge/src/validator_discovery.rs#L31, this is used by collators to connect to validators. This priority group changes over time.

It will also be used by validators to connect to other validators: paritytech/polkadot#1990. My understanding is that priority groups unlike topics are local, meaning each node can have its own priority groups independent of others. Is that correct? Or maybe I misunderstanding how it works

In practice, this guarantees that validators would always have slots reserved for collators, which is the main motivation behind this change.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 23, 2020

It will also be used by validators to connect to other validators: paritytech/polkadot#1990. My understanding is that priority groups unlike topics are local, meaning each node can have its own priority groups independent of others. Is that correct? Or maybe I misunderstanding how it works

Indeed. Contrary to the notification protocol names, the names and content of the priority groups are never communicated over the network.

I'm however considering changing the API to use indices rather than names.

wheresaddie and others added 2 commits November 23, 2020 16:23
Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Max Inden <mail@max-inden.de>
@tomaka tomaka marked this pull request as ready for review December 8, 2020 16:44
@tomaka
Copy link
Contributor Author

tomaka commented Dec 8, 2020

Marked as "ready for review" in order for CI to run. Sorry for the noise, please ignore. This isn't ready for review yet.

@tomaka
Copy link
Contributor Author

tomaka commented Dec 9, 2020

Closing in favour of #7700

@tomaka tomaka closed this Dec 9, 2020
@tomaka tomaka deleted the rework-priority-groups branch December 9, 2020 11:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-in_progress Pull request is in progress. No review needed at this stage. C1-low PR touches the given topic and has a low impact on builders.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Improve priority groups
4 participants