This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Prepare block authoring for asynchronous backing #2267

Closed
bkchr opened this issue Mar 2, 2023 · 12 comments
Labels
I7-refactor Code needs refactoring.

Comments

@bkchr
Member

bkchr commented Mar 2, 2023

Block authoring for Parachains currently works in the following way:

  1. A new relay chain block is imported.
  2. The collation generation subsystem checks if the core associated with the parachain is free and, if so, continues.
  3. Collation generation calls our collator callback to generate a PoV.
  4. Authoring logic determines if the current node should build a PoV.
  5. Build new PoV and give it back to collation generation.
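
A minimal sketch of the flow above, assuming heavily simplified types. `RelayBlock`, `Pov` and `on_relay_block_imported` are hypothetical names for illustration, not Cumulus or Polkadot APIs:

```rust
/// Hypothetical, simplified view of an imported relay chain block.
pub struct RelayBlock {
    pub number: u32,
    /// Whether the core assigned to our parachain is free at this block.
    pub core_free: bool,
}

/// Hypothetical stand-in for a proof-of-validity block.
pub struct Pov {
    pub relay_parent: u32,
}

/// Steps 2 to 5; called once for every imported relay chain block (step 1).
pub fn on_relay_block_imported(
    block: &RelayBlock,
    is_our_turn: impl Fn(&RelayBlock) -> bool,
) -> Option<Pov> {
    // Step 2: collation generation only continues if the core is free.
    if !block.core_free {
        return None;
    }
    // Steps 3 and 4: the collator callback decides whether this node authors.
    if !is_our_turn(block) {
        return None;
    }
    // Step 5: build the PoV and hand it back to collation generation.
    Some(Pov { relay_parent: block.number })
}
```

The point of the sketch is that authoring is driven entirely by relay chain block import; there is no independent timer.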

With asynchronous backing this will be more complicated, as block production isn't bound to importing a relay chain block anymore. Parachains will build new blocks in fixed time frames, as standalone chains do, e.g. every 6 seconds. To support this we will need to separate the logic that determines when to build a block from the logic that determines on which relay chain block to build.

To determine when to build a new block we can reuse the slots logic from Substrate. We will let it run with the requested slot duration of the Parachain. Then we will implement a custom SlotWorker. Every time this slot worker is triggered we will need to trigger some logic to determine the next relay chain block to build on top of. This logic should be generic and should support both asynchronous and synchronous backing. It will return the relay chain block in whose context the block should be built and the parachain block to build on top of.
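
One way to picture this separation is a small trait that the slot worker calls on every parachain slot. This is only a sketch: `BuildTarget`, `BuildTargetSelector`, `SyncBackingSelector` and `on_slot` are hypothetical names, and hashes are reduced to `u64` for brevity.

```rust
/// Hypothetical block identifier; real code would use block hashes.
pub type Hash = u64;

/// What the selection logic returns: the relay chain block providing the
/// context, and the parachain block to build on top of.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BuildTarget {
    pub relay_parent: Hash,
    pub para_parent: Hash,
}

/// Decides *on what* to build. Deciding *when* is left to the slot worker.
pub trait BuildTargetSelector {
    fn select(&self, para_slot: u64) -> Option<BuildTarget>;
}

/// Synchronous backing: only the best relay chain block is considered,
/// and only if our core is free there.
pub struct SyncBackingSelector {
    pub best_relay_block: Hash,
    pub best_para_block: Hash,
    pub core_free: bool,
}

impl BuildTargetSelector for SyncBackingSelector {
    fn select(&self, _para_slot: u64) -> Option<BuildTarget> {
        if self.core_free {
            Some(BuildTarget {
                relay_parent: self.best_relay_block,
                para_parent: self.best_para_block,
            })
        } else {
            None
        }
    }
}

/// Called by the (hypothetical) custom slot worker on every parachain slot.
pub fn on_slot<S: BuildTargetSelector>(selector: &S, para_slot: u64) -> Option<BuildTarget> {
    selector.select(para_slot)
}
```

An asynchronous backing selector would implement the same trait but could also return older relay chain blocks.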

For synchronous backing we will check the best relay chain block to see whether the core of our parachain is free. This is relatively simple. The advantage of this logic is that we will not have Parachain forks anymore, as we are building on only one relay chain block, and very likely on the block that the network sees as its best block (assuming all blocks of the same height have already propagated through the network). However, since we currently start block production directly after importing the relay chain block, we may start block production later, which could make it harder to include big PoVs in the relay chain as there is less time to send them to the validators. The parachain slot should be calculated based on the timestamp, and the timestamp should be calculated using relay_chain_slot * slot_duration.
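
As a worked sketch of that calculation: the function names are hypothetical, durations are in milliseconds, and dividing the timestamp by the parachain slot duration is an assumption based on how Aura derives a slot from a timestamp.

```rust
/// Timestamp implied by the relay chain slot (synchronous backing):
/// relay_chain_slot * slot_duration.
pub fn sync_timestamp_ms(relay_chain_slot: u64, relay_slot_duration_ms: u64) -> u64 {
    relay_chain_slot * relay_slot_duration_ms
}

/// Parachain slot derived from a timestamp (assumed: integer division,
/// as Aura does for standalone chains).
pub fn para_slot_from_timestamp(timestamp_ms: u64, para_slot_duration_ms: u64) -> u64 {
    timestamp_ms / para_slot_duration_ms
}
```

For example, relay chain slot 100 with a 6000 ms relay slot gives a timestamp of 600000 ms, which is parachain slot 50 for a 12000 ms parachain slot.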

For asynchronous backing we will be freer to choose the block to build on, as we can also build on older relay chain blocks. We will probably need some kind of runtime API for the Parachain to check whether we want to build on a given relay chain block, for example to reject building too many parachain blocks on the same relay chain block. The parachain slot should be calculated based on the timestamp, and the timestamp should be calculated using relay_chain_slot * slot_duration + parachain_slot_duration * unincluded_segment_len.
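
A rough sketch of the asynchronous case, under assumptions: both function names are hypothetical, durations are in milliseconds, and `can_build_upon` merely stands in for whatever runtime API ends up deciding whether a relay chain block is acceptable.

```rust
/// Timestamp for asynchronous backing:
/// relay_chain_slot * slot_duration
///     + parachain_slot_duration * unincluded_segment_len.
pub fn async_timestamp_ms(
    relay_chain_slot: u64,
    relay_slot_duration_ms: u64,
    para_slot_duration_ms: u64,
    unincluded_segment_len: u32,
) -> u64 {
    relay_chain_slot * relay_slot_duration_ms
        + para_slot_duration_ms * u64::from(unincluded_segment_len)
}

/// Hypothetical runtime-side check that rejects building too many
/// parachain blocks on the same relay chain block.
pub fn can_build_upon(blocks_on_relay_parent: u32, max_per_relay_parent: u32) -> bool {
    blocks_on_relay_parent < max_per_relay_parent
}
```

With relay chain slot 100, a 6000 ms relay slot and a 2000 ms parachain slot, the first block on that relay parent gets timestamp 600000 ms, and a block with two unincluded ancestors gets 604000 ms.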

@bkchr bkchr added the I7-refactor Code needs refactoring. label Mar 2, 2023
@bkchr
Member Author

bkchr commented Mar 2, 2023

We will also need to forward unincluded_segment_len > 0 to the inherent generation to skip including messages, as they would already be included in the first block built on the associated relay chain block.
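
A minimal sketch of that rule, assuming a hypothetical `InboundMessage` type and a hypothetical `messages_for_inherent` helper (neither is an existing Cumulus API):

```rust
/// Hypothetical stand-in for a message delivered via the relay chain.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InboundMessage(pub Vec<u8>);

/// Messages to place into the inherent data of the block being built.
/// Only the first block built on a relay chain block
/// (unincluded_segment_len == 0) carries the messages; later blocks skip them.
pub fn messages_for_inherent(
    pending: &[InboundMessage],
    unincluded_segment_len: u32,
) -> Vec<InboundMessage> {
    if unincluded_segment_len > 0 {
        Vec::new()
    } else {
        pending.to_vec()
    }
}
```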

@rphmeier
Contributor

rphmeier commented Mar 3, 2023

> To determine when to build a new block we can reuse the slots logic from Substrate. We will let it run with the requested slot duration of the Parachain. Then we will implement a custom SlotWorker. Every time this slot worker is triggered we will need to trigger some logic to determine the next relay chain block to build on top of. This logic should be generic and should support both asynchronous and synchronous backing. It will return the relay chain block in whose context the block should be built and the parachain block to build on top of.

IMO slots are not necessarily the right primitive, but fine for a first attempt.

The "correct" thing would be to start authoring immediately after the last parablock in some way, as long as authoring is allowed, and if not, to wait until the next moment that authoring is allowed (relay-chain inclusion).

@bkchr
Member Author

bkchr commented Mar 3, 2023

How do we determine the next author?

@rphmeier
Contributor

rphmeier commented Mar 3, 2023

If we can set slot_number = relay_parent_number + unincluded_segment_len or something like that, we'd have a "2-dimensional" Aura where each relay-parent has up to N sequential authors. Authors would not be guaranteed unique as new relay-parents come in, so we'd have more parachain forks and orphan blocks due to race conditions. Not ideal and I'm sure better solutions exist, but this seems fine for an initial implementation.
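
To make the "2-dimensional" idea concrete, here is a small sketch. The function names are hypothetical, and picking the author as `slot % authorities.len()` is assumed to follow Aura's usual round-robin rule.

```rust
/// slot_number = relay_parent_number + unincluded_segment_len
pub fn slot_number(relay_parent_number: u32, unincluded_segment_len: u32) -> u64 {
    u64::from(relay_parent_number) + u64::from(unincluded_segment_len)
}

/// Round-robin author selection as in Aura: slot modulo the authority count.
pub fn expected_author<'a>(authorities: &[&'a str], slot: u64) -> Option<&'a str> {
    if authorities.is_empty() {
        return None;
    }
    let index = (slot % authorities.len() as u64) as usize;
    Some(authorities[index])
}
```

With authorities `["alice", "bob", "charlie"]`, relay parent 10 yields bob at depth 0 and charlie at depth 1. Relay parent 11 at depth 0 also yields charlie, which illustrates why authors are not unique once a new relay parent arrives and why forks can result.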

@burdges

burdges commented Mar 6, 2023

We do still have sequencing among parachain blocks that touch the same parachain state root though, which heavily influences actually building the blocks.

We could break this model by sharding the state among parachains, but afaik you should think of these as a family of parachains with common code and deeper shared state, not individual parachains. In particular, you might have UTXOs which exist anywhere in the family, but designate one particular spending chain for deduplication, so a spend proves existence in the family, correctness of the spending chain, etc.

You could mix these two models of course, but this requires more relay chain logic.

@rphmeier
Contributor

rphmeier commented Mar 8, 2023

https://forum.eigenlayer.xyz/t/multiplicity-a-gadget-for-multiple-concurrent-block-proposers/316 <- some EigenLayer research which might be helpful as we go forward. This idea of having a pre-collation quorum with multiple batches is interesting.

@rphmeier
Contributor

rphmeier commented Mar 8, 2023

It seems like we are reaching the limits of pallet_aura and sc_consensus_aura, which weren't really designed for this use-case.

@rphmeier
Contributor

rphmeier commented Mar 8, 2023

Here's a very minimal proposal for modifications to make asynchronous backing work for parachains.

Although it'd be nice to dive deeper and come up with something highly specialized and suitable for elastic scaling as well.

New approach:

  • Allow multiple blocks with the same slot from the same author.
  • Collators will not create blocks beyond the maximum depth set by the parachain runtime.
  • Collators will attempt to author whenever they either receive a new relay-chain block or import a new parachain block.
  • We expect parachains to set max-depth to 1 (all that’s necessary until elastic scaling: 1 block pending availability + 1 block prospective beyond that)
  • This can be highly forkful, but prioritizes liveness/continuity over minimizing orphan blocks. We may be able to reduce forks with some heuristics:
    • heuristic 1: waiting some time before building a depth-1 block where the parent is not a block authored locally. This helps ensure smooth ‘handovers’
    • heuristic 2: not building depth-1 blocks when very few or no transactions are pending.
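
The authoring rule and the two heuristics could be combined roughly as follows. This is a sketch under assumptions: all type and field names are hypothetical, the delay and transaction threshold are free parameters, and applying the heuristics to every block at depth 1 or deeper is a generalization of the "depth-1" wording above.

```rust
use std::time::Duration;

/// Hypothetical snapshot of what the collator knows when deciding to author.
#[derive(Debug, Clone, Copy)]
pub struct AuthoringInput {
    /// Depth of the block that would be built (0 = directly on the included block).
    pub depth: u32,
    /// Maximum depth set by the parachain runtime.
    pub max_depth: u32,
    pub parent_authored_locally: bool,
    pub since_parent_import: Duration,
    pub pending_transactions: usize,
}

/// Tunable heuristic parameters (assumed, not from the proposal).
#[derive(Debug, Clone, Copy)]
pub struct Heuristics {
    pub handover_delay: Duration,
    pub min_pending_transactions: usize,
}

pub fn should_author(input: &AuthoringInput, heuristics: &Heuristics) -> bool {
    // Never create blocks beyond the maximum depth.
    if input.depth > input.max_depth {
        return false;
    }
    if input.depth >= 1 {
        // Heuristic 1: give a foreign parent some time before building on it.
        if !input.parent_authored_locally
            && input.since_parent_import < heuristics.handover_delay
        {
            return false;
        }
        // Heuristic 2: skip the block when too few transactions are pending.
        if input.pending_transactions < heuristics.min_pending_transactions {
            return false;
        }
    }
    true
}
```

Such a function would be evaluated each time a new relay-chain block arrives or a new parachain block is imported.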

Minimal Changeset:

  • Remove pallet_aura and pallet_aura_ext from runtimes.
  • Introduce an alternative pallet_aura_cumulus which emulates pallet_aura but allows multiple slots per author.
  • Create a long-running worker which can check a runtime API version on the parachain to determine which inherent-data creation logic to use on the node-side.
  • The runtime upgrade replacing Aura should bump this runtime API version, as well as the authoring version.
  • The worker should wrap Aura and an Aura import queue, as currently done, for backwards compatibility. This can be removed after the upgrade goes through.
  • The worker should, prior to the new runtime API version, just wait for produce_candidate requests from Polkadot’s collation-generation.
  • The worker should, after the new runtime API, trigger some pluggable heuristics for authoring on various input signals, like new parachain blocks, relay chain blocks, and times.
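
The version switch in the worker might look something like this. It is only a sketch: the enum names, the `ASYNC_AUTHORING_API_VERSION` constant and its value are hypothetical, and treating `produce_candidate` requests as ignored after the upgrade is an assumption.

```rust
/// Hypothetical runtime API version from which the new authoring logic applies.
pub const ASYNC_AUTHORING_API_VERSION: u32 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthoringMode {
    /// Wrap Aura and wait for produce_candidate requests.
    Legacy,
    /// Run pluggable heuristics on various input signals.
    Heuristic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    ProduceCandidateRequest,
    NewRelayBlock,
    NewParaBlock,
    Timer,
}

/// Choose the authoring mode from the runtime API version found on-chain.
pub fn authoring_mode(runtime_api_version: u32) -> AuthoringMode {
    if runtime_api_version >= ASYNC_AUTHORING_API_VERSION {
        AuthoringMode::Heuristic
    } else {
        AuthoringMode::Legacy
    }
}

/// Whether the worker should consider authoring in response to a signal.
pub fn should_react(mode: AuthoringMode, signal: Signal) -> bool {
    match mode {
        AuthoringMode::Legacy => signal == Signal::ProduceCandidateRequest,
        AuthoringMode::Heuristic => signal != Signal::ProduceCandidateRequest,
    }
}
```

Because the mode is derived from the runtime on every check, the same long-running worker keeps working across the runtime upgrade without a node restart.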

@bkchr
Member Author

bkchr commented Mar 9, 2023

> • Introduce an alternative pallet_aura_cumulus which emulates pallet_aura but allows multiple slots per author.

Sounds like nimbus? Rewriting aura to be generic on the author selection should be fairly simple.

> • Remove pallet_aura and pallet_aura_ext from runtimes.

We need a better migration. We should not forget that we are not the only users and we should not just remove this.

@rphmeier
Contributor

rphmeier commented Mar 9, 2023

I mean that this would be a migration strategy, not that we should delete the files altogether!

@burdges

burdges commented Mar 10, 2023

What does "multiple slots per author" mean?

> https://forum.eigenlayer.xyz/t/multiplicity-a-gadget-for-multiple-concurrent-block-proposers/316 <- some EigenLayer research which might be helpful as we go forward. This idea of having a pre-collation quorum with multiple batches is interesting.

You'd have collators collect signatures from other collators attesting that they distributed their parablock? It incurs considerable latency when a parachain has real internal finality, but you're seemingly asking if collators could express promises relative to some relay parent? I'm unsure how much this buys you. cc @AlistairStewart

At present, we tolerate a colluding majority of collators, so long as they do not freeze out other collators, but in principle we could tolerate a malicious majority if honest ones run reconstruction. It'll keep our lives simpler if we preserve this.

@bkchr
Member Author

bkchr commented Mar 10, 2023

Closing for the much better issue: #2301


3 participants