Conversation
cc @arkpar Any idea why the wasm-executor test seems to be failing? I haven't changed this logic but did shuffle some crates around.
Not sure how the test failure could have happened. Unless the CI machine was paused for a few seconds in the middle of the test execution, it should time out external validation function execution. That timeout was apparently triggered too late. It worked after a restart. Please let me know if this happens again.
I don't feel competent to review this PR as a whole; there's a lot of changes in here about which I just don't yet know enough to predict the resulting changes in behavior. However, from the narrow perspective of enabling me to handle parachain validation function changes within cumulus, this definitely appears to work.
Just ensuring that I understand the intention behind this design, here's the happy path as I understand it:
- a superuser (or, todo, referendum) approves a new validation function
- a support pallet (in progress) looks at `validation_params.code_upgrade_allowed` and keeps track of the relay-chain block number at which an upgrade would be legal
- the first time that `validation_params.relay_chain_height >= upgrade_block`, it returns the new function in `ValidationResult`, which notifies Polkadot that the new function is ready to adopt. At all other times, `ValidationResult.new_validation_code == None`.
Is that correct?
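For my own sanity-checking, here is a dependency-free model of that happy path. The `ValidationFunctionParams` field names and the scheduling rule mirror this discussion; the `UpgradeTracker` type and its methods are hypothetical stand-ins for the in-progress support pallet, not actual Cumulus code:

```rust
type RelayChainBlockNumber = u32;

#[derive(Clone, Copy)]
struct ValidationFunctionParams {
    relay_chain_height: RelayChainBlockNumber,
    code_upgrade_allowed: Option<RelayChainBlockNumber>,
}

/// Hypothetical stand-in for the support pallet's state.
#[derive(Default)]
struct UpgradeTracker {
    /// Relay-chain height at/after which a scheduled upgrade may be applied.
    scheduled_at: Option<RelayChainBlockNumber>,
    pending_code: Option<Vec<u8>>,
}

impl UpgradeTracker {
    /// Called when governance approves new code; only legal while
    /// `code_upgrade_allowed` is `Some`.
    fn schedule(
        &mut self,
        vfp: ValidationFunctionParams,
        code: Vec<u8>,
    ) -> Result<(), &'static str> {
        let apply_at = vfp.code_upgrade_allowed.ok_or("upgrades not currently allowed")?;
        self.scheduled_at = Some(apply_at);
        self.pending_code = Some(code);
        Ok(())
    }

    /// Called once per block; returns `Some(code)` exactly once, the first
    /// time the relay chain reaches the scheduled height (this is the code
    /// that would go out in `ValidationResult.new_validation_code`).
    fn on_block(&mut self, vfp: ValidationFunctionParams) -> Option<Vec<u8>> {
        if self.scheduled_at.map_or(false, |at| vfp.relay_chain_height >= at) {
            self.scheduled_at = None;
            return self.pending_code.take();
        }
        None
    }
}
```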
```rust
#[derive(PartialEq, Eq, Decode)]
#[cfg_attr(feature = "std", derive(Debug, Encode))]
pub struct ValidationParams {
```
Suggested change:

```diff
-#[derive(PartialEq, Eq, Decode)]
-#[cfg_attr(feature = "std", derive(Debug, Encode))]
+#[derive(PartialEq, Eq, Encode, Decode)]
+#[cfg_attr(feature = "std", derive(Debug))]
 pub struct ValidationParams {
```
It would be helpful if the codec on this struct were always bidirectional: that would let me encode it into storage (under a constant key) so that pallets could publish things like the max code size and whether or not code upgrades are currently allowed.
Alternate suggestion, in case `BlockData` or `HeadData` are large: factor out something like
```rust
#[derive(PartialEq, Eq, Encode, Decode, Clone, Copy)]
#[cfg_attr(feature = "std", derive(Debug))]
pub struct ValidationFunctionParams {
    /// The maximum code size permitted, in bytes.
    pub max_code_size: u32,
    /// The current relay-chain block number.
    pub relay_chain_height: RelayChainBlockNumber,
    /// Whether a code upgrade is allowed or not, and at which height the
    /// upgrade would be applied after, if so. The parachain logic should
    /// apply any upgrade issued in this block after the first block with
    /// `relay_chain_height` at least this value, if `Some`. If `None`,
    /// issue no upgrade.
    pub code_upgrade_allowed: Option<NonZero<RelayChainBlockNumber>>,
}
```
That's all the data I'm really interested in forwarding for pallet use; this struct could then become a public member of `ValidationParams`.
> It would be helpful if coding on this struct were always bidirectional: that would let me encode it into storage (under a constant key) so that pallets could publish things like the max code size and whether or not code upgrades are currently allowed.
Yes, but at the cost of larger runtime code size. You can build it with the `std` feature to get `Encode` functionality.
Would you be open to the alternate suggestion, breaking out `ValidationFunctionParams`? I'm already doing that in the Cumulus branch; it's not hard, but it would be easier to just copy out the struct directly.
I think this stuff should be kept local to Cumulus.
Can you elaborate on why a Cumulus chain would want to put this in storage? These parameters will change for the execution of every block and cannot be relied on beyond a single block's execution.
The only thing a Cumulus chain should need to store to process a pending upgrade is what the inner value of `code_upgrade_allowed: Some(_)` was when the upgrade was initiated. Then the next time you get a `ValidationFunctionParams` with `relay_chain_height >= scheduled_at`, you apply the code change at the end of the block by writing to `:code` storage.
Is it a question of putting it in the ephemeral storage, which is wiped at the end of the block, so other modules can query this value? I think you only need to put `code_upgrade_allowed` in such ephemeral storage.
And I'll go a little off topic here, but I wrote this code with the understanding that Cumulus would have an inherent type with the `ValidationFunctionParams`, which is required in each block. Then the validation function checks that the passed function params are equal to the ones in the inherent. This means that you can get this data into the ephemeral store by implementing a normal `ensure_none` `Call`. Is this in line with your thinking?
> Can you elaborate on why a Cumulus chain would want to put this in storage?
The main goal is to propagate the values beyond Cumulus, so that a pallet can use them. It enables the pallet to publish methods like `can_set_code() -> bool` or `max_code_size() -> u32`, but more importantly, it lets the pallet generate nice errors like this:
```rust
pub fn set_code(origin, validation_function: ValidationFunction) {
    let vfp = Self::validation_function_params();
    ensure!(validation_function.len() <= vfp.max_code_size as usize, Error::<T>::TooBig);
    let apply_block = vfp.code_upgrade_allowed.ok_or(Error::<T>::ProhibitedByPolkadot)?;
    ...
}
```
To a user, that's a much nicer way to learn that this is the wrong time or the wrong size for a validation function upgrade than panicking during the Cumulus `validate_block` function.
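The two guards can also be sketched as a dependency-free check; `UpgradeError` and `check_set_code` are hypothetical names standing in for the pallet's `Error<T>` enum and dispatchable, not actual pallet code:

```rust
#[derive(Debug, PartialEq)]
enum UpgradeError {
    /// Submitted code exceeds `max_code_size`.
    TooBig,
    /// `code_upgrade_allowed` was `None` this block.
    ProhibitedByPolkadot,
}

struct ValidationFunctionParams {
    max_code_size: u32,
    code_upgrade_allowed: Option<u32>,
}

/// Returns the relay-chain height at/after which the upgrade applies,
/// or a descriptive error instead of a panic inside `validate_block`.
fn check_set_code(vfp: &ValidationFunctionParams, code: &[u8]) -> Result<u32, UpgradeError> {
    if code.len() > vfp.max_code_size as usize {
        return Err(UpgradeError::TooBig);
    }
    vfp.code_upgrade_allowed.ok_or(UpgradeError::ProhibitedByPolkadot)
}
```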
I spoke with @bkchr earlier this afternoon, and he explained why literally putting it into storage is a bad idea (it messes with the storage root). We came up with a workaround which still goes through the storage interface, even though it doesn't actually modify the storage; for that reason, it would still be good to derive `Encode`, `Decode` on the struct.
> Is it a question of putting it in the ephemeral storage... I think you only need to put `code_upgrade_allowed` in such ephemeral storage.
Essentially, yes, that's why we're using it. We do use all three fields, though:

- `max_code_size`: so that we can return an error instead of panicking if submitted code is too big
- `relay_chain_height`: so we know when to set `:code` given a previously scheduled validation function upgrade
- `code_upgrade_allowed`: so we know when to schedule a validation function upgrade
> I wrote this code with the understanding that Cumulus would have an inherent type with the `ValidationFunctionParams`
I'm also looking into how to implement that. However, even if the upgrade pallet gets the validation function params from an inherent instead of by directly querying a special storage key, it's not yet clear to me how the inherent would get access to them except via the magic storage key.
Additionally: it would be helpful to add a field to `ValidationParams` / `ValidationFunctionParams`. It could be as simple as

```rust
expect_upgrade_this_block: bool,
```
Rationale: we can't use `code_upgrade_allowed` to validate all inserts to `:code`. Obviously the upgrade pallet will be well-behaved in this regard, but not all users will discover and use that pallet, so we should guard against errors in DIY user code.
Scenario:

- Block 1000: `can_set_code == Some(1500)`. User sets some code, which is returned in the `ValidationResult`.
- Blocks 1001-1499: `can_set_code == None`.
- Block 1500: `can_set_code == None || can_set_code == Some(2000)`. I don't know which condition actually applies, but neither value indicates that it is legal to set `:code` on this block, right now.
Given `expect_upgrade_this_block`, we can inspect all writes to `:code` and ensure that they are only performed when legal. Without it, we have to either store the scheduled upgrade block within `validate_block` (which breaks the storage root), or skip the check entirely.
Ok, I will try to address these points one-by-one.
> Essentially, yes, that's why we're using it. We do use all three fields, though:
>
> - `max_code_size`: so that we can return an error instead of panicking if submitted code is too big
> - `relay_chain_height`: so we know when to set `:code` given a previously scheduled validation function upgrade
> - `code_upgrade_allowed`: so we know when to schedule a validation function upgrade
I would still recommend creating a separate type for these fields, as we will add more stuff to `ValidationFunctionParams`, including incoming messages and other fields you would not necessarily want to include in the ephemeral storage.
> Additionally: it would be helpful to add a field to `ValidationParams` / `ValidationFunctionParams`

`expect_upgrade_this_block` is redundant, which could be a cause of confusion or bugs later on. I would prefer to avoid it. Specifically, this statement above is not correct:
> Block 1500: `can_set_code == None || can_set_code == Some(2000)`. I don't know which condition actually applies, but neither value indicates that it is legal to set `:code` on this block, right now.
When `can_set_code` is `Some`, you can schedule an upgrade to be applied at/after that block. When an upgrade is scheduled, you should put it into `:code` as soon as `relay_chain_height >= scheduled_at`, regardless of the current value of `can_set_code`. Happy to bikeshed on names for `can_set_code`, as I see where the confusion stems from. The implementation accounts for the case where an upgrade is applied and a new one is signaled at the same time here: https://github.com/paritytech/polkadot/pull/918/files#diff-e21f8096708b22fcdbd66ebc74f0821aR1038
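To make the rule concrete (the function name is mine, and the block numbers come from the scenario above), the apply decision depends only on the height that `can_set_code` contained when the upgrade was scheduled:

```rust
/// True when a previously scheduled upgrade should be written to `:code`:
/// apply as soon as the relay chain reaches the height that `can_set_code`
/// contained at scheduling time, regardless of its current value.
fn should_apply_upgrade(scheduled_at: Option<u32>, relay_chain_height: u32) -> bool {
    scheduled_at.map_or(false, |at| relay_chain_height >= at)
}
```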
> I wrote this code with the understanding that Cumulus would have an inherent type with the `ValidationFunctionParams`
>
> I'm also looking into how to implement that. However, even if the upgrade pallet gets the validation function params from an inherent instead of by directly querying a special storage key, it's not yet clear to me how the inherent would get access to them except via the magic storage key.
The inherent should provide the value for the ephemeral magic storage key, right? And then the inherent is constructed via a `ProvideInherent` implementation, whose `InherentData` contains a `ValidationFunctionParams` which is provided by the `polkadot-collator` pipeline to Cumulus' authorship process.
> When `can_set_code` is `Some`, you can schedule an upgrade to be applied at/after that block. When an upgrade is scheduled, you should put it into code as soon as `relay_chain_height >= scheduled_at`, regardless of value of `can_set_code`.
Yes, I understand that; that's what the pallet does. However, there are three areas of code in play:

- Polkadot
- the Cumulus runtime (`validate_block`)
- `pallet-parachain-upgrade`
When an upgrade is scheduled, both Polkadot and the pallet store the current value of `can_set_code`, and apply the code change at the appropriate time. However, the Cumulus runtime is stateless; it doesn't get to add anything persistent to storage.
That's all fine, assuming that the only way that users ever attempt parachain upgrades is via our pallet. However, that's not an entirely reasonable assumption. Someone out there, eventually, is likely to want to do it themselves for some reason.
The purpose of `expect_upgrade_this_block` is to enable `validate_block` to know when an upgrade is legal, given that it can't store its own data in storage, and shouldn't peek into the pallet's storage (because we can't assume that our pallet will be used). If it exists, then `validate_block` can fail when a user attempts to set `:code` at the wrong time. If it doesn't exist, then it cannot perform that validation. If that validation is already performed by Polkadot, then we can skip this. However, if Polkadot is not performing that validation check, then it is possible for an end user to write a pallet which just updates `:code` whenever it feels like.
> And then the inherent is constructed via a `ProvideInherent` implementation, whose `InherentData` contains a `ValidationFunctionParams` which is provided by the `polkadot-collator` pipeline to Cumulus' authorship process.
Thanks! This gives me a lot of entrypoints to research to figure out how to do this.
There are a lot of ways someone can write an incorrect parachain or runtime. Even if we added this `expect_upgrade_this_block`, there is still a possibility that someone writes a pallet that doesn't care about `can_upgrade_at` and writes to `:code` anyway. Or writes the wrong code to `:code`. Or doesn't write anything to `:code` when it's supposed to. There are a million ways an end user can mess things up, which we cannot prevent. We can't address these issues, so are we supposed to throw away this PR?
This PR sets forth a protocol that chains can follow. Cumulus is one way for them to follow the protocol. If someone decides to register a parachain diverging from the protocol, then their parachain will not work.
As long as Cumulus can expose a reasonable entry-point for users to perform parachain code upgrades, then responsible parachain authors and auditors will (need to) ensure that it is used correctly.
The process is more like this:
cc @arkpar I've encountered the same spurious test failure.
`runtime/common/src/parachains.rs` (Outdated)

```rust
Self::do_old_code_pruning(now);
// TODO [now]: adjust.
```
TODOs should be fixed before merging or have an issue opened?
* upgrade primitives to allow changing validation function
* set up storage schema for old parachains code
* fix compilation errors
* fix test compilation
* add some tests for past code meta
* most of the runtime logic for code upgrades
* implement old-code pruning
* add a couple tests
* clean up remaining TODOs
* add a whole bunch of tests for runtime functionality
* remove unused function
* fix runtime compilation
* extract some primitives to parachain crate
* add validation-code upgrades to validation params and result
* extend validation params with code upgrade fields
* provide maximums to validation params
* port test-parachains
* add a code-upgrader test-parachain and tests
* fix collator tests
* move test-parachains to own folder to work around compilation errors
* fix test compilation
* update the Cargo.lock
* fix parachains tests
* remove dbg! invocation
* use new pool in code-upgrader
* bump lockfile
* link TODO to issue
cc @bkchr @coriolinus
Closes #825
This uses a time-limit approach to allowing parachains to upgrade their validation code, as opposed to a fees-based approach as outlined in the issue. The most important changes are in `polkadot-runtime-common` and summarized here:

Other miscellaneous changes:

- `parachain` crate, as mentioned in "Add current relay block to ValidationParams" (#879). These types are re-exported via `primitives/parachain`.