-
Notifications
You must be signed in to change notification settings - Fork 1.6k
guide: collator networking & subsystems #1452
Conversation
roadmap/implementers-guide/src/node/collators/collator-protocol.md
Outdated
Show resolved
Hide resolved
- Determine the DHT keys to use for each validator based on the relay-chain state and Runtime API. | ||
- Recover the Peer IDs of the validators from the DHT. There may be more than one peer ID per validator. | ||
- Accumulate all `(ValidatorId, PeerId)` pairs and send on the response channel. | ||
- Feed all Peer IDs to the discovery utility the underlying network provides. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is meant with discovery utility
here? Would the line below work as well?
- Feed all Peer IDs to the discovery utility the underlying network provides. | |
- Add one `PeerId` per validator as a priority group to the `PeerSet`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so, but I talked with Pierre and we wanted to do away with this priority groups thing. The previous line seems more general and can be adapted based on what is actually done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guide in general leans away from implementation details of Substrate. I believe priority groups are such a detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Feed all Peer IDs to the discovery utility the underlying network provides. | |
- Feed all Peer IDs to the peer set manager the underlying network provides. |
In case we want to keep it generic I would suggest the following. I don't think discovery utility
is the right term here as the peer has already been discovered at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in #1535
/// Request the advertised collation at that relay-parent. | ||
RequestCollation(RequestId, Hash, ParaId), | ||
/// A requested collation. | ||
Collation(RequestId, CandidateReceipt, PoV), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should instead be a request-response-style protocol, but I suppose this can be changed later?
As long as collators are honest, it's ok to send collations through notifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it can be changed later. As I understand, request/response protocols are not implemented yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, main thing missing is talking to other parachain validators.
/// Peer-sets handled by the network bridge. | ||
enum PeerSet { | ||
/// The collation peer-set is used to distribute collations from collators to validators. | ||
Collation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I am interpreting "PeerSet" incorrectly, but with collator networking for validators there are actually two types of neighbours:
- collators, a one-to-one protocol
- other parachain validators, as described in "Validator-validator communication" in Parachain / Collator networking design proposal #1348, a gossip protocol
Then there is "Passing to the relay chain" which is done on the main relay chain gossip protocol that already exists, with its own set of neighbours that (as you note below) can include non-validators, and also validators on another parachain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what's specified, we only need collator<>validator communication. The validator<>validator aspects of distributing whitelists etc. are not handled in this version.
Other aspects of parachain networking (distributing PoVs among parachain group, gossiping statements) are beyond the scope of this subsystem and do indeed use the Validation
peerset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context, PoV blocks are not supposed to be distributed to the Validation
peerset; that is the purpose of the A&V protocol. Here they are only supposed to be distributed to other parachain validators, so that they can sign attestations for them, and is less traffic overall.
If we do not have this component, then parachain collators will need to send the same PoV block to multiple parachain validators, in order to achieve the minimum number of attestations needed for the block production protocol. By having parachain validators also pass this between each other, we alleviate this bottleneck. I think this is fairly important to have even in an early version of the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you are using both the terms "collation" and "PoV block" separately. Are you saying they are different things? Because I understood them to be the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here they are only supposed to be distributed to other parachain validators, so that they can sign attestations for them, and is less traffic overall.
Yes, I understood this. And it is covered by another part of the code, as I mentioned. The flow is Collator->Validator. Validator chooses collation to second (Candidate Selection). Signs an attestation (Candidate Backing) and circulates the PoV to other members of the same group (PoV Distribution). Other members of the group validate and sign attestations (Candidate Backing). Signing an attestation also implies keeping the data available. If the candidate is backed, then Availability Distribution is used to distribute the erasure-coded pieces.
Also, you are using both the terms "collation" and "PoV block" separately. Are you saying they are different things? Because I understood them to be the same thing.
Collation is (CandidateReceipt, PoV)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understood this. And it is covered by another part of the code, as I mentioned.
OK, what you said makes sense although I am still confused by the docstring on "Validation", it says:
This may include nodes which are not validators, as some protocols on this peer-set are expected to be gossip.
This makes sense for the main gossip network on the relay chain, for GRANDPA/BABE etc. However it does not make sense for PoV distribution - here you are only distributing to other parachain validators, not non-validators nor validators on other parachains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@infinity0 We can't guarantee at this point that we have transitive connection among the validator set. I guarantee that, in practice, if we excluded full nodes we would probably not achieve parachain liveness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there plans to guarantee these connections? PoV blocks can get quite large, so gossiping these via non-validators will add a lot of latency - and also non-validator full nodes are untrusted, so this allows them to spam validators that are hoping to receive these objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the data are authenticated, because they must be presented alongside a Seconded
candidate by a validator. Barring validator equivocations, the amount of data is bounded.
## Functionality | ||
|
||
## Jobs, if any | ||
The process of generating a collation for a parachain is very parachain-specific. As such, the details of how to do so are left beyond the scope of this description. The subsystem should be implemented as an abstract wrapper, which is aware of this configuration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are validation/pre-validation functions mentioned anywhere else? If not they could go here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, they would go here. However I've deferred that to a later point.
key: CollatorPair, | ||
collation_producer: Fn(params) -> async (HeadData, Vec<UpwardMessage>, PoV), | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intending that collator nodes extend the polkadot client & recompile? It might be easier to have a local parachain-specific process talk to a polkadot node acting as a collator via some interprocess API, but that's a discussion for much later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intending that collator nodes extend the polkadot client & recompile
Basically, yes. That's how the current Cumulus architecture works, at least. Some IPC-based split would also be amenable to this approach, with the collation_producer
yielding an IPC future.
When acting on an advertisement, we issue a `WireMessage::RequestCollation`. If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators. And then make another request - repeat until we get a response or the chain has moved on. | ||
|
||
As a validator, once the collation has been fetched some other subsystem will inspect and do deeper validation of the collation. The subsystem will report to this subsystem with a [`CollatorProtocolMessage`][CPM]`::ReportCollator` or `NoteGoodCollation` message. In that case, if we are connected directly to the collator, we apply a cost to the `PeerId` associated with the collator and potentially disconnect or blacklist it. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in my other comment on PeerSet, validators should also be connected to a few other validators on the same parachain, and forward received collations onto them. (The details of neighbour selection are mentioned in #1348)
Other validators are more trusted than collators, so the protocol wire message here only needs to consist of the actual Collation
, but if it's simpler to re-use the collator-validator wire message for now, it wouldn't hurt. However in the future once we get onto passing around whitelists/blacklists of collators, the message types would have to diverge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in my other comment on PeerSet, validators should also be connected to a few other validators on the same parachain, and forward received collations onto them
Agreed, but this is handled by the PoV distribution and Statement distribution subsystems, not collation distribution.
Messages received by the [Collator Protocol subsystem](../node/collators/collator-protocol.md) | ||
|
||
```rust | ||
enum CollatorProtocolMessage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Protocol" suggests that this message is passed to other nodes, and is somewhat stable, but as I understand this is just an internal message between subsystems and could change whenever. How about CollatorInfoMessage
or just CollatorInfo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subsystem is called the Collator Protocol subsystem. Our naming convention for these message types is format!("{}Message", subsystem_name)
Closes #1355
Keeping #1348 in mind, although not yet specifying the features of incremental validation, whitelist/blacklisting, heatmaps, etc. which I think bear further discussion & roll-out plans as well as lower-level design work and interfaces. So this PR is primarily setting up the basic logic and structure the code will have.
TODO: