Fix provisioner issue for on-demand/coretime #3141

eskimor · 2024-01-30T17:01:29Z

Provisioner code does not seem to be fully on-demand ready.

polkadot-sdk/polkadot/node/core/provisioner/src/lib.rs

Line 662 in a02b534

// TODO: doesn't work for on-demand parachains. We lean hard on the

antonva · 2024-01-31T11:09:56Z

This should be fine. It is generally the same issue as #664, namely that next_up_on_available() used to use the CoreIndex to index directly into the paras::parachains() vec to map to a ParaId.

Ever since we refactored this out of the scheduler and into the AssignmentProvider interface, we no longer make this assumption.
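To make the difference concrete, here is a minimal sketch contrasting the two mappings. The types are simplified stand-ins, and the trait method peek_next, the MapProvider type and para_for_core_legacy are hypothetical illustrations, not the real polkadot-sdk AssignmentProvider API.

```rust
use std::collections::HashMap;

// Hypothetical simplified types for illustration only; not the real polkadot-sdk definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CoreIndex(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ParaId(pub u32);

/// Old assumption: core i always serves the i-th registered parachain.
/// This breaks for on-demand/coretime, where a core may serve different paras over time.
pub fn para_for_core_legacy(parachains: &[ParaId], core: CoreIndex) -> Option<ParaId> {
    parachains.get(core.0 as usize).copied()
}

/// Hypothetical stand-in for an assignment provider: the core-to-para mapping is
/// whatever the provider reports, not a fixed position in a list.
pub trait AssignmentProvider {
    fn peek_next(&self, core: CoreIndex) -> Option<ParaId>;
}

/// Toy provider backed by a map (e.g. filled from on-demand orders or coretime assignments).
pub struct MapProvider(pub HashMap<CoreIndex, ParaId>);

impl AssignmentProvider for MapProvider {
    fn peek_next(&self, core: CoreIndex) -> Option<ParaId> {
        // No positional assumption: a core with no assignment simply yields None.
        self.0.get(&core).copied()
    }
}
```

With the provider-based lookup, a core can be assigned to any para (or to none), which is what on-demand and coretime require.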

alindima · 2024-02-01T12:23:12Z

After taking a deeper look at the code and speaking with @antonva about this, we came to the following conclusion:

Assume core A is occupied by Para1, the next_up_on_available is Para2, and the bitfields making the candidate of Para1 available have been recorded on-chain.
The next block producer will call the provisioner, which will request from prospective-parachains the next backable candidate of Para2 that has the Para1 candidate as its parent. There's obviously no such candidate, so the core will transition to scheduled.
Para2 will be able to get a candidate in the next iteration.

So it's a slight inefficiency, made less likely by core affinities.
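The scenario above can be modelled with a toy version of the provisioner's request. This is a sketch under simplifying assumptions: ProspectiveParachains, get_backable and request_naive are hypothetical names, and a candidate's parent of None stands for "builds on the on-chain para head". Even though Para2 has a backable candidate, the request that names Para1's candidate as the parent finds nothing.

```rust
use std::collections::HashMap;

// Hypothetical simplified types; names are illustrative, not the real provisioner API.
type ParaId = u32;
type CandidateHash = u64;

#[derive(Clone, Debug)]
pub struct Candidate {
    pub hash: CandidateHash,
    pub para: ParaId,
    pub parent: Option<CandidateHash>, // None = builds on the on-chain para head
}

/// Toy model of prospective-parachains: all backable candidates known per para.
pub struct ProspectiveParachains {
    pub candidates: HashMap<ParaId, Vec<Candidate>>,
}

impl ProspectiveParachains {
    /// Return a backable candidate of `para` whose parent is `required_parent`.
    pub fn get_backable(&self, para: ParaId, required_parent: Option<CandidateHash>) -> Option<&Candidate> {
        self.candidates.get(&para)?.iter().find(|c| c.parent == required_parent)
    }
}

/// The problematic request: the core occupied by `occupied` (Para1) is freed, the next
/// assignment is `next_up` (Para2), but the request still uses Para1's candidate as parent.
pub fn request_naive(pp: &ProspectiveParachains, occupied: &Candidate, next_up: ParaId) -> Option<CandidateHash> {
    pp.get_backable(next_up, Some(occupied.hash)).map(|c| c.hash)
}
```

Because the request returns nothing, the core is left scheduled for a block and Para2 only gets its candidate backed in the next iteration.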

I suggest we fix it by modifying the provisioner to check whether the newly scheduled para is different from the previous one and, if it is, to request a backable candidate with the required_path of the new para instead (taking into account a possible parent candidate for Para2 present on another core).
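A minimal sketch of the proposed check, assuming one core per para and a hypothetical pending_tip map from each para to its latest candidate still pending availability on another core. OccupiedCore and required_parent are illustrative names, not provisioner code.

```rust
use std::collections::HashMap;

// Hypothetical simplified types for illustration only.
type ParaId = u32;
type CandidateHash = u64;

pub struct OccupiedCore {
    pub para: ParaId,
    pub candidate: CandidateHash,
}

/// Choose the parent to build on for the para scheduled next on a freed core.
pub fn required_parent(
    occupied: &OccupiedCore,
    next_up: ParaId,
    pending_tip: &HashMap<ParaId, CandidateHash>,
) -> Option<CandidateHash> {
    if next_up == occupied.para {
        // Same para: build on the candidate that is just becoming available.
        Some(occupied.candidate)
    } else {
        // Different para: build on that para's own chain, i.e. its candidate pending on
        // another core, or the on-chain head (None) if there is none.
        pending_tip.get(&next_up).copied()
    }
}
```

The key point is that the parent is derived from the new para's own chain rather than from whatever candidate previously occupied the core.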

alindima · 2024-02-14T15:37:58Z

I found a very similar problem in collation-generation: #3327

#3130 builds on top of #3160.

Processes the availability cores and builds a record of how many candidates it should request from prospective-parachains and their predecessors. Tries to supply as many candidates as the runtime can back. Note that the runtime changes to back multiple candidates per para are not yet done, but this paves the way for them.

The following backing/inclusion policy is assumed:

1. The runtime will never back candidates of the same para which don't form a chain with the already backed candidates, even if the others are still pending availability. We're optimistic that they won't time out, and we don't want to back parachain forks (as the complexity would be huge).
2. If a candidate is timed out of the core before being included, all of its successors occupying a core will be evicted.
3. Only the candidates which are made available and form a chain starting from the on-chain para head may be included/enacted and cleared from the cores. In other words, if the para head is at A, the cores are occupied by B->C->D, and B and D are made available, only B will be included and its core cleared. C and D will remain on the cores, waiting for C to be made available or timed out. As point (2) above already says, if C is timed out, D will also be dropped.
4. The runtime will deduplicate candidates which form a cycle. For example, if the provisioner supplies candidates A->B->A, the runtime will only back A (as the state output will be the same).

Note that if a candidate is timed out, we don't guarantee that in the next relay chain block the block author will be able to fill all of the timed-out cores of the para, since that would increase complexity by a lot. Instead, the provisioner will supply N candidates, where N is the number of candidates timed out, but won't include their successors, which will also be deleted by the runtime. Those will be backfilled in the next relay chain block.

Adjacent changes:
- Also fixes: #3141
- When prospective parachains are not enabled, don't supply multiple candidates per para (we can't have elastic scaling without prospective parachains enabled). paras_inherent should already sanitise this input, but it's more efficient this way.

Note: all of these changes are backwards-compatible with the non-elastic-scaling scenario (one core per para).
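Rules 2 and 3 of the assumed policy can be sketched on a single para's chain of pending candidates, ordered from the one building on the on-chain head onwards. This is a toy model under stated assumptions: Pending, include_available_prefix and time_out are hypothetical names, and the real runtime tracks cores and candidates very differently.

```rust
// Hypothetical model of the assumed inclusion/timeout policy; not runtime code.

/// A candidate occupying a core; `available` = enough availability bitfields recorded.
#[derive(Clone, Debug, PartialEq)]
pub struct Pending {
    pub name: char,
    pub available: bool,
}

/// Rule 3: only the longest available prefix of the chain (starting right after the
/// on-chain head) is included and removed from the cores; the rest stay pending.
pub fn include_available_prefix(chain: &mut Vec<Pending>) -> Vec<char> {
    let n = chain.iter().take_while(|c| c.available).count();
    chain.drain(..n).map(|c| c.name).collect()
}

/// Rule 2: if the candidate at `idx` times out, it and all of its successors are evicted.
pub fn time_out(chain: &mut Vec<Pending>, idx: usize) -> Vec<char> {
    let idx = idx.min(chain.len());
    chain.drain(idx..).map(|c| c.name).collect()
}
```

Applied to the example above (head at A, cores occupied by B->C->D, B and D available), include_available_prefix returns only B, and a subsequent timeout of C evicts both C and D.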


eskimor added this to parachains team board Jan 30, 2024

github-project-automation bot moved this to Backlog in parachains team board Jan 30, 2024

antonva self-assigned this Jan 31, 2024

alindima mentioned this issue Feb 1, 2024

Provisioner: Elastic Scaling #3130

Closed

alindima mentioned this issue Feb 9, 2024

provisioner: allow multiple cores assigned to the same para #3233

Merged

antonva removed their assignment Feb 9, 2024

alindima self-assigned this Feb 12, 2024

alindima moved this from Backlog to In Progress in parachains team board Feb 12, 2024

alindima mentioned this issue Feb 14, 2024

Fix collation-generation for on-demand/coretime #3327

Closed

alindima mentioned this issue Feb 19, 2024

Elastic scaling: use an assumed CoreIndex in candidate-backing #3229

Merged


alindima closed this as completed in #3233 Mar 1, 2024

github-project-automation bot moved this from In Progress to Completed in parachains team board Mar 1, 2024
