guide: collator networking & subsystems #1452

rphmeier · 2020-07-22T05:36:56Z

This keeps #1348 in mind but does not yet specify incremental validation, whitelisting/blacklisting, heatmaps, etc., which I think need further discussion and roll-out plans as well as lower-level design work and interfaces. This PR primarily sets up the basic logic and structure the code will have.

TODO:

- Note validator look-up and discovery that collators use
- Describe pull protocol for collations
- Network bridge: expand to introduce a split between underlying network protocols for validators & collators, utilities for peer discovery.

roadmap/implementers-guide/src/node/collators/collator-protocol.md

mxinden · 2020-07-31T07:14:50Z

roadmap/implementers-guide/src/node/utility/network-bridge.md

+- Determine the DHT keys to use for each validator based on the relay-chain state and Runtime API.
+- Recover the Peer IDs of the validators from the DHT. There may be more than one peer ID per validator.
+- Accumulate all `(ValidatorId, PeerId)` pairs and send on the response channel.
+- Feed all Peer IDs to the discovery utility the underlying network provides.


What is meant by "discovery utility" here? Would the line below work as well?

Suggested change

- Feed all Peer IDs to the discovery utility the underlying network provides.

- Add one `PeerId` per validator as a priority group to the `PeerSet`.

I guess so, but I talked with Pierre and we wanted to do away with this priority groups thing. The previous line seems more general and can be adapted based on what is actually done.

The guide in general leans away from implementation details of Substrate. I believe priority groups are such a detail.

In case we want to keep it generic I would suggest the following. I don't think "discovery utility" is the right term here, as the peer has already been discovered at this point.

Suggested change

- Feed all Peer IDs to the discovery utility the underlying network provides.

- Feed all Peer IDs to the peer set manager the underlying network provides.

addressed in #1535
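For readers following this thread, the four validator-discovery steps in the quoted diff can be sketched roughly as below. This is only an illustration under stated assumptions: `ValidatorId`, `PeerId` and `DhtKey` are simplified stand-ins, `dht_key_for` is a placeholder derivation, and the in-memory map replaces a real DHT query. None of these names are taken from the Substrate or Polkadot APIs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real identifier types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ValidatorId(pub u32);
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(pub String);
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct DhtKey(pub Vec<u8>);

/// Step 1: determine the DHT key for a validator.
/// Placeholder derivation; the real one depends on relay-chain state.
pub fn dht_key_for(validator: &ValidatorId) -> DhtKey {
    DhtKey(validator.0.to_be_bytes().to_vec())
}

/// Steps 2 and 3: recover the peer IDs for each validator and accumulate
/// all `(ValidatorId, PeerId)` pairs. A validator may map to several peers.
pub fn resolve_validators(
    validators: &[ValidatorId],
    dht: &HashMap<DhtKey, Vec<PeerId>>,
) -> Vec<(ValidatorId, PeerId)> {
    let mut pairs = Vec::new();
    for validator in validators {
        if let Some(peers) = dht.get(&dht_key_for(validator)) {
            for peer in peers {
                pairs.push((validator.clone(), peer.clone()));
            }
        }
    }
    pairs
}

/// Step 4: the de-duplicated peer IDs that would be handed to whatever
/// peer-set facility the underlying network provides.
pub fn peer_ids_to_feed(pairs: &[(ValidatorId, PeerId)]) -> Vec<PeerId> {
    let mut out: Vec<PeerId> = Vec::new();
    for (_, peer) in pairs {
        if !out.contains(peer) {
            out.push(peer.clone());
        }
    }
    out
}
```

Validators with no DHT record are simply absent from the result, so the caller can tell which look-ups have not resolved yet.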

tomaka · 2020-07-31T08:21:34Z

roadmap/implementers-guide/src/node/collators/collator-protocol.md

+	/// Request the advertised collation at that relay-parent.
+	RequestCollation(RequestId, Hash, ParaId),
+	/// A requested collation.
+	Collation(RequestId, CandidateReceipt, PoV),


This should instead be a request-response-style protocol, but I suppose this can be changed later?
As long as collators are honest, it's ok to send collations through notifications.

Yeah, it can be changed later. As I understand, request/response protocols are not implemented yet.
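Until a dedicated request/response protocol exists, the `RequestId` in the two wire messages is what lets a validator match a notification-style `Collation` to its own `RequestCollation`. A minimal sketch of that bookkeeping follows. The `PendingRequests` helper and the simplified `Hash`, `ParaId`, `CandidateReceipt` and `PoV` types are assumptions made for illustration, not the guide's definitions.

```rust
use std::collections::HashMap;

// Simplified placeholder types.
pub type RequestId = u64;
pub type Hash = [u8; 32];
pub type ParaId = u32;

#[derive(Clone, Debug, PartialEq)]
pub struct CandidateReceipt(pub Vec<u8>);
#[derive(Clone, Debug, PartialEq)]
pub struct PoV(pub Vec<u8>);

/// The two variants quoted in the diff above.
#[derive(Clone, Debug, PartialEq)]
pub enum WireMessage {
    /// Request the advertised collation at that relay-parent.
    RequestCollation(RequestId, Hash, ParaId),
    /// A requested collation.
    Collation(RequestId, CandidateReceipt, PoV),
}

/// Hypothetical tracker for requests that are still awaiting a reply.
#[derive(Default)]
pub struct PendingRequests {
    next_id: RequestId,
    pending: HashMap<RequestId, (Hash, ParaId)>,
}

impl PendingRequests {
    /// Allocate a fresh request ID and build the outgoing message.
    pub fn request(&mut self, relay_parent: Hash, para: ParaId) -> WireMessage {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, (relay_parent, para));
        WireMessage::RequestCollation(id, relay_parent, para)
    }

    /// Accept a collation only if it answers a request we actually made.
    pub fn on_message(
        &mut self,
        msg: WireMessage,
    ) -> Option<(Hash, ParaId, CandidateReceipt, PoV)> {
        match msg {
            WireMessage::Collation(id, receipt, pov) => {
                let (relay_parent, para) = self.pending.remove(&id)?;
                Some((relay_parent, para, receipt, pov))
            }
            WireMessage::RequestCollation(..) => None,
        }
    }
}
```

Dropping unsolicited `Collation` messages is one way to limit what a dishonest collator can push through notifications, which is the concern raised above.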

* master: guide: collator networking & subsystems (#1452) Guide: add a diagram for Inclusion Pipeline & Approval Subsystem (#1457) [CI] Build wasm blob with srtool and include prop hashes and blobs in release notes (#1506)

infinity0

Looks reasonable; the main thing missing is talking to other parachain validators.

infinity0 · 2020-07-31T15:04:56Z

roadmap/implementers-guide/src/types/overseer-protocol.md

+/// Peer-sets handled by the network bridge.
+enum PeerSet {
+	/// The collation peer-set is used to distribute collations from collators to validators.
+	Collation,


Perhaps I am interpreting "PeerSet" incorrectly, but with collator networking for validators there are actually two types of neighbours:

- collators, a one-to-one protocol
- other parachain validators, as described in "Validator-validator communication" in Parachain / Collator networking design proposal #1348, a gossip protocol

Then there is "Passing to the relay chain" which is done on the main relay chain gossip protocol that already exists, with its own set of neighbours that (as you note below) can include non-validators, and also validators on another parachain.

For what's specified, we only need collator<>validator communication. The validator<>validator aspects of distributing whitelists etc. are not handled in this version.

Other aspects of parachain networking (distributing PoVs among parachain group, gossiping statements) are beyond the scope of this subsystem and do indeed use the Validation peerset.
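The split described in this reply can be summarised as a mapping from subsystem to peer-set. The sketch below assumes a `Validation` variant alongside the quoted `Collation` one, and the `Subsystem` enum and `peer_set_for` function are hypothetical names used only to make the mapping explicit.

```rust
/// Peer-sets handled by the network bridge (the `Validation` variant is
/// assumed from the discussion; only `Collation` is quoted in the diff).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PeerSet {
    Validation,
    Collation,
}

/// Hypothetical labels for the subsystems named in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Subsystem {
    CollatorProtocol,
    PoVDistribution,
    StatementDistribution,
}

/// Which peer-set each subsystem talks over, per the replies above.
pub fn peer_set_for(subsystem: Subsystem) -> PeerSet {
    match subsystem {
        // Collator <> validator traffic only.
        Subsystem::CollatorProtocol => PeerSet::Collation,
        // Validator-side distribution and gossip.
        Subsystem::PoVDistribution | Subsystem::StatementDistribution => PeerSet::Validation,
    }
}
```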

In this context, PoV blocks are not supposed to be distributed to the Validation peerset; that is the purpose of the A&V protocol. Here they are only supposed to be distributed to other parachain validators, so that they can sign attestations for them, which also means less traffic overall.

If we do not have this component, then parachain collators will need to send the same PoV block to multiple parachain validators, in order to achieve the minimum number of attestations needed for the block production protocol. By having parachain validators also pass this between each other, we alleviate this bottleneck. I think this is fairly important to have even in an early version of the protocol.

Also, you are using both the terms "collation" and "PoV block" separately. Are you saying they are different things? Because I understood them to be the same thing.

Here they are only supposed to be distributed to other parachain validators, so that they can sign attestations for them, which also means less traffic overall.

Yes, I understood this. And it is covered by another part of the code, as I mentioned. The flow is Collator->Validator. Validator chooses collation to second (Candidate Selection). Signs an attestation (Candidate Backing) and circulates the PoV to other members of the same group (PoV Distribution). Other members of the group validate and sign attestations (Candidate Backing). Signing an attestation also implies keeping the data available. If the candidate is backed, then Availability Distribution is used to distribute the erasure-coded pieces.

Also, you are using both the terms "collation" and "PoV block" separately. Are you saying they are different things? Because I understood them to be the same thing.

Collation is (CandidateReceipt, PoV)

Yes, I understood this. And it is covered by another part of the code, as I mentioned.

OK, what you said makes sense although I am still confused by the docstring on "Validation", it says:

This may include nodes which are not validators, as some protocols on this peer-set are expected to be gossip.

This makes sense for the main gossip network on the relay chain, for GRANDPA/BABE etc. However, it does not make sense for PoV distribution: here you are only distributing to other parachain validators, not to non-validators or to validators on other parachains.

@infinity0 We can't guarantee at this point that we have transitive connection among the validator set. I guarantee that, in practice, if we excluded full nodes we would probably not achieve parachain liveness.

Are there plans to guarantee these connections? PoV blocks can get quite large, so gossiping these via non-validators will add a lot of latency - and also non-validator full nodes are untrusted, so this allows them to spam validators that are hoping to receive these objects.

In this case, the data are authenticated, because they must be presented alongside a Seconded candidate by a validator. Barring validator equivocations, the amount of data is bounded.
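The bounding argument can be made concrete with a small sketch: a PoV is accepted only if some validator has already announced it in a Seconded statement, and each validator contributes at most one such announcement unless it equivocates. Everything here is assumed for illustration (`SecondedTracker`, `Noted`, a `u64` standing in for a PoV hash, validators identified by index), and signature checking of the statements is left out entirely.

```rust
use std::collections::HashMap;

pub type ValidatorIndex = u32;
pub type PoVHash = u64; // placeholder for a real hash type

/// Result of recording a Seconded statement.
#[derive(Debug, PartialEq, Eq)]
pub enum Noted {
    Fresh,
    Duplicate,
    Equivocation,
    NotAValidator,
}

/// Hypothetical per-relay-parent tracker of seconded PoVs.
pub struct SecondedTracker {
    validator_count: u32,
    seconded: HashMap<ValidatorIndex, PoVHash>,
}

impl SecondedTracker {
    pub fn new(validator_count: u32) -> Self {
        SecondedTracker { validator_count, seconded: HashMap::new() }
    }

    /// Record that `validator` seconded the candidate with this PoV hash.
    pub fn note_seconded(&mut self, validator: ValidatorIndex, pov: PoVHash) -> Noted {
        if validator >= self.validator_count {
            return Noted::NotAValidator;
        }
        match self.seconded.get(&validator).copied() {
            None => {
                self.seconded.insert(validator, pov);
                Noted::Fresh
            }
            Some(prev) if prev == pov => Noted::Duplicate,
            // A second, different candidate from the same validator.
            Some(_) => Noted::Equivocation,
        }
    }

    /// A PoV from any peer is accepted only if a validator vouched for it.
    pub fn is_expected(&self, pov: PoVHash) -> bool {
        self.seconded.values().any(|hash| *hash == pov)
    }

    /// Never exceeds `validator_count`, which is the bound on data volume.
    pub fn expected_count(&self) -> usize {
        self.seconded.len()
    }
}
```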

infinity0 · 2020-07-31T15:07:09Z

roadmap/implementers-guide/src/node/collators/collation-generation.md

 ## Functionality

-## Jobs, if any
+The process of generating a collation for a parachain is very parachain-specific. As such, the details of how to do so are left beyond the scope of this description. The subsystem should be implemented as an abstract wrapper, which is aware of this configuration:


Are validation/pre-validation functions mentioned anywhere else? If not they could go here.

Yup, they would go here. However I've deferred that to a later point.

infinity0 · 2020-07-31T15:12:46Z

roadmap/implementers-guide/src/node/collators/collation-generation.md

+	key: CollatorPair,
+	collation_producer: Fn(params) -> async (HeadData, Vec<UpwardMessage>, PoV),
+}
+```


Are you intending that collator nodes extend the polkadot client & recompile? It might be easier to have a local parachain-specific process talk to a polkadot node acting as a collator via some interprocess API, but that's a discussion for much later.

Are you intending that collator nodes extend the polkadot client & recompile

Basically, yes. That's how the current Cumulus architecture works, at least. Some IPC-based split would also be amenable to this approach, with the collation_producer yielding an IPC future.
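The pseudocode signature `Fn(params) -> async (HeadData, Vec<UpwardMessage>, PoV)` from the diff could be expressed in plain Rust along the following lines. This is a sketch, not the guide's definition: `Params`, `CollationOutput`, `CollationFuture`, `CollationGenerationConfig` and `example_config` are assumed names, the data types are byte-vector placeholders, and the example producer returns an already-resolved future where a real one would run parachain logic or await an IPC reply.

```rust
use std::future::Future;
use std::pin::Pin;

// Placeholder types; the real ones carry parachain-specific data.
pub struct HeadData(pub Vec<u8>);
pub struct UpwardMessage(pub Vec<u8>);
pub struct PoV(pub Vec<u8>);
pub struct CollatorPair; // stands in for the collator's key pair

/// Hypothetical input handed to the producer.
pub struct Params {
    pub relay_parent: [u8; 32],
}

pub type CollationOutput = (HeadData, Vec<UpwardMessage>, PoV);
/// Boxed future, so any async source (in-process or IPC) fits the slot.
pub type CollationFuture = Pin<Box<dyn Future<Output = CollationOutput> + Send>>;

pub struct CollationGenerationConfig {
    pub key: CollatorPair,
    pub collation_producer: Box<dyn Fn(Params) -> CollationFuture + Send + Sync>,
}

/// A toy producer that echoes the relay parent as head data.
pub fn example_config() -> CollationGenerationConfig {
    CollationGenerationConfig {
        key: CollatorPair,
        collation_producer: Box::new(|params: Params| -> CollationFuture {
            let output: CollationOutput = (
                HeadData(params.relay_parent.to_vec()),
                Vec::new(),
                PoV(Vec::new()),
            );
            Box::pin(std::future::ready(output))
        }),
    }
}
```

Boxing the future keeps the configuration type independent of how the collation is produced, which is what would make an IPC-based split possible later without changing the subsystem.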

infinity0 · 2020-07-31T15:20:42Z

roadmap/implementers-guide/src/node/collators/collator-protocol.md

+When acting on an advertisement, we issue a `WireMessage::RequestCollation`. If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators. And then make another request - repeat until we get a response or the chain has moved on.
+
+As a validator, once the collation has been fetched some other subsystem will inspect and do deeper validation of the collation. The subsystem will report to this subsystem with a [`CollatorProtocolMessage`][CPM]`::ReportCollator` or `NoteGoodCollation` message. In that case, if we are connected directly to the collator, we apply a cost to the `PeerId` associated with the collator and potentially disconnect or blacklist it.
+


As mentioned in my other comment on PeerSet, validators should also be connected to a few other validators on the same parachain, and forward received collations onto them. (The details of neighbour selection are mentioned in #1348)

Other validators are more trusted than collators, so the protocol wire message here only needs to consist of the actual Collation, but if it's simpler to re-use the collator-validator wire message for now, it wouldn't hurt. However in the future once we get onto passing around whitelists/blacklists of collators, the message types would have to diverge.

As mentioned in my other comment on PeerSet, validators should also be connected to a few other validators on the same parachain, and forward received collations onto them

Agreed, but this is handled by the PoV distribution and Statement distribution subsystems, not collation distribution.
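The retry behaviour in the quoted diff (on timeout, mark the collator unreliable, lower its priority, and request again until a response arrives or the chain moves on) might look roughly like this. The names `CollatorRanking`, `FetchOutcome` and `fetch_with_retries` are hypothetical, the request is modelled as a synchronous closure, and a fixed attempt budget stands in for "the chain has moved on".

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CollatorId(pub u32);

/// Outcome of a single collation request.
pub enum FetchOutcome<T> {
    Response(T),
    TimedOut,
}

/// Hypothetical ranking: higher score means the collator is tried first.
pub struct CollatorRanking {
    entries: Vec<(CollatorId, i32)>,
}

impl CollatorRanking {
    pub fn new(collators: Vec<CollatorId>) -> Self {
        CollatorRanking { entries: collators.into_iter().map(|c| (c, 0)).collect() }
    }

    /// Highest-priority collator; ties go to the earliest entry.
    pub fn best(&self) -> Option<CollatorId> {
        let mut best: Option<&(CollatorId, i32)> = None;
        for entry in &self.entries {
            if best.map_or(true, |b| entry.1 > b.1) {
                best = Some(entry);
            }
        }
        best.map(|entry| entry.0.clone())
    }

    /// Reduce a collator's priority relative to the others.
    pub fn note_unreliable(&mut self, collator: &CollatorId) {
        for entry in self.entries.iter_mut() {
            if &entry.0 == collator {
                entry.1 -= 1;
            }
        }
    }

    pub fn priority(&self, collator: &CollatorId) -> Option<i32> {
        self.entries.iter().find(|e| &e.0 == collator).map(|e| e.1)
    }
}

/// Keep requesting from the best-ranked collator until one responds
/// or the attempt budget runs out.
pub fn fetch_with_retries<T>(
    ranking: &mut CollatorRanking,
    max_attempts: usize,
    mut request: impl FnMut(&CollatorId) -> FetchOutcome<T>,
) -> Option<(CollatorId, T)> {
    for _ in 0..max_attempts {
        let collator = ranking.best()?;
        match request(&collator) {
            FetchOutcome::Response(r) => return Some((collator, r)),
            FetchOutcome::TimedOut => ranking.note_unreliable(&collator),
        }
    }
    None
}
```

Because a timed-out collator only loses priority rather than being dropped, it can still be retried once every alternative has also timed out.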

infinity0 · 2020-07-31T15:25:15Z

roadmap/implementers-guide/src/types/overseer-protocol.md

+Messages received by the [Collator Protocol subsystem](../node/collators/collator-protocol.md)
+
+```rust
+enum CollatorProtocolMessage {


"Protocol" suggests that this message is passed to other nodes, and is somewhat stable, but as I understand this is just an internal message between subsystems and could change whenever. How about CollatorInfoMessage or just CollatorInfo?

The subsystem is called the Collator Protocol subsystem. Our naming convention for these message types is format!("{}Message", subsystem_name)
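Spelled out as a runnable snippet, the convention reads as follows. The helper name `message_type_name` is hypothetical, and stripping the spaces from a display name such as "Collator Protocol" is an assumption about how the subsystem name is written in code.

```rust
/// Derive a message type name from a subsystem name, following
/// `format!("{}Message", subsystem_name)`.
pub fn message_type_name(subsystem_name: &str) -> String {
    // Assumption: the in-code name is the display name without spaces.
    let compact: String = subsystem_name.split_whitespace().collect();
    format!("{}Message", compact)
}
```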

rphmeier added 5 commits July 22, 2020 00:20

Do a small write-up on collation-generation

075c675

preamble to collator protocol

0e9dec1

notes on protocol

0017dbe

collation-generation: point to collator protocol

54cc765

fix missing bracket

2b079ab

rphmeier added the A3-in_progress, B0-silent, and C1-low labels Jul 22, 2020

rphmeier added 3 commits July 22, 2020 17:50

Merge branch 'master' into rh-guide-collator-networking

d091813

expand on collator protocol wire protocol

31092a1

add a couple more sentences

4bbc358

This was referenced Jul 23, 2020

Implement: Collation Generation Subsystem #1464

Closed

Update network bridge to support collation peer-set #1463

Closed

Implement past-session validator discovery APIs for Network Bridge #1461

Closed

rphmeier added 6 commits July 23, 2020 20:05

expand on requests some more

17d999c

go higher level

1d529d5

network bridge: note peerset

7700b25

note peer-set = validation for protocols

6eb751c

add ConnectToValidators message

a4c85e7

use ConnectToValidators in collator protocol

8535f73

rphmeier added the A0-please_review label and removed the A3-in_progress label Jul 24, 2020

rphmeier marked this pull request as ready for review July 24, 2020 01:09

rphmeier requested a review from mxinden July 25, 2020 16:26

mxinden reviewed Jul 28, 2020


roadmap/implementers-guide/src/node/collators/collator-protocol.md (outdated)

rphmeier added 3 commits July 28, 2020 11:18

typo

f08fe27

remove references to sentry nodes

9880f43

Merge branch 'master' into rh-guide-collator-networking

2c5f0f5

mxinden approved these changes Jul 31, 2020


tomaka reviewed Jul 31, 2020


rphmeier merged commit 0bcb6f9 into master Jul 31, 2020

rphmeier deleted the rh-guide-collator-networking branch July 31, 2020 15:07

infinity0 reviewed Jul 31, 2020


rphmeier mentioned this pull request Aug 4, 2020

Network Bridge Refactoring #1535

Merged
