Prevent validator from using mismatched agoric-sdk version #8471
Labels
cosmic-swingset
package: cosmic-swingset
enhancement
New feature or request
liveslots
requires vat-upgrade to deploy changes
SwingSet
package: SwingSet
xsnap
the XS execution tool
What is the Problem Being Solved?
Some agoric-sdk versions may be mostly compatible between each other so that most operations remain in consensus, but some others cause a divergence. For example, we may fix an xsnap bug which has divergent execution depending on the environment (like #7829), or introduce a change that causes the supervisor or lockdown bundle to have a different hash (e.g. when we bumped from 11 to 11wf), which would only be picked up when starting a new vat.
#7012 describes requirements for enforcing the xsnap-worker version, however this is not sufficient as we've observed a validator on devnet falling off consensus when a core eval executed and started a new vat. The xsnap version checks implemented in #7831 require that the version included in the binary be explicitly bumped to reflect an expected software upgrade version, and thus didn't cover this case.
Description of the Design
The enforcement of version likely needs to remain piecemeal. For example it may not make sense to enforce that the cosmos or cosmic-swingset code matches exactly between all validators as we've found benefits to be able to locally patch that part of the software as long as the behavior remains in consensus.
For the supervisor and lockdown bundles, we know those are consensus affecting on start vat. These should only change during a planned software upgrade, and as such we should be able to fail starting the node if these bundles are not yet known, unless we're re-starting because of an upgrade. This would remove the ticking time bomb effect that these incompatible bundles currently have.
The xsnap version likely falls in the same bucket as the chain software upgrade, and having the ability to locally patch the worker may be valuable. Embedding the expected chain software upgrade version is considered a hack, and is a weird inversion of dependency. We likely want to instead report the Moddable SDK version the worker is based on, and include an "agoric suffix" to capture behavioral patches we want to enforce.
Security Considerations
While we avoid it as much as possible, retaining the ability to distribute a hotpatch to validators is valuable. While we can always disable / coerce versions checks in a hotpatch, it's easier if the implemented version checks allow this kind of patch as much as possible.
Scaling Considerations
None
Test Plan
TBD
Upgrade Considerations
This only affects chain software upgrades and ensures that all validators use the expected version.
The text was updated successfully, but these errors were encountered: