mark all core vats as "critical" #6051
Labels: cosmic-swingset, enhancement, good first issue
What is the Problem Being Solved?
Our initial chain will consist of about 20 vats: bootstrap, a handful of static vats (some created by the kernel itself), a bunch of dynamic vats created by bootstrap, and then a bunch of contract vats created by zoe (either during bootstrap or during an "enable the economy" governance event shortly afterwards).
At this stage, if any of these vats fail, we should panic the kernel and halt the chain. It would be better to stop the chain than to move forward without any of these vats.
Description of the Design
For static vats, we have a flag for this: you just set `config.vats.NAME.creationOptions.critical = true`. So the task is to modify `packages/vats/*-config.json` to add this flag to the static vats defined therein.
For dynamic vats, the party doing the creation (either bootstrap or zoe) must provide a special `criticalVatKey` object in an option to the `createVat` call. This object can only be obtained from the vat-admin root object (which is distinct from the `vatAdminService` on which one calls `createVat`).
So the tasks are:
- bootstrap should call `getCriticalVatKey()` on the vat-admin root object and `produce` the result to a slot named `criticalVatKey`
- the `makeVat` code should `consume` that slot and use it as the `critical:` option in its `createVat()` calls, so all the initial dynamic vats (like zoe) will be marked as critical
- bootstrap provides `createZcfVat` for Zoe, so bootstrap needs to imbue that function with `criticalVatKey`, so all contract vats that Zoe creates will be marked as critical
After our initial launch, zoe should certainly not be marking all contract vats as critical. For one, that would allow any third-party contract to halt the chain. But also we expect to be creating more contract instances as the chain matures, and many of them will be short-lived or will serve a smaller audience, where their termination is expected and will be more tolerated.
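The three dynamic-vat tasks above can be sketched roughly as follows. This is a self-contained mock, not the real SwingSet or bootstrap code: `makeMockVatAdmin`, `makeSlots`, `makeCreateZcfVat`, the bundle names, and the rejection of unrecognized keys are all assumptions made for illustration. Only `getCriticalVatKey`, `createVat`, the `critical` option, and the `criticalVatKey` slot name come from the description, and real bootstrap code would make these calls asynchronously.

```js
// Hypothetical stand-in for the kernel's vat-admin facilities (illustration only).
function makeMockVatAdmin() {
  // Unforgeable token: it is compared by identity, so it cannot be faked.
  const criticalVatKey = Object.freeze({});
  const vats = [];
  const vatAdminService = {
    // Assumed behavior: only the genuine key marks a vat as critical,
    // and any other value passed as `critical` is rejected.
    createVat(bundleName, options = {}) {
      const { critical } = options;
      if (critical !== undefined && critical !== criticalVatKey) {
        throw Error('invalid criticalVatKey');
      }
      const record = { bundleName, isCritical: critical === criticalVatKey };
      vats.push(record);
      return record;
    },
  };
  // The root object is distinct from the service that creates vats.
  const vatAdminRoot = { getCriticalVatKey: () => criticalVatKey };
  return { vatAdminRoot, vatAdminService, vats };
}

// Hypothetical minimal produce/consume slot store.
function makeSlots() {
  const slots = new Map();
  return {
    produce: (name, value) => slots.set(name, value),
    consume: name => {
      if (!slots.has(name)) throw Error(`slot ${name} was never produced`);
      return slots.get(name);
    },
  };
}

const { vatAdminRoot, vatAdminService, vats } = makeMockVatAdmin();
const slots = makeSlots();

// Task 1: bootstrap fetches the key and produces it to the `criticalVatKey` slot.
slots.produce('criticalVatKey', vatAdminRoot.getCriticalVatKey());

// Task 2: makeVat consumes the slot and passes it as the `critical` option.
function makeVat(bundleName) {
  const critical = slots.consume('criticalVatKey');
  return vatAdminService.createVat(bundleName, { critical });
}

// Task 3: bootstrap closes createZcfVat over the key before handing it to Zoe.
function makeCreateZcfVat(critical) {
  return bundleName => vatAdminService.createVat(bundleName, { critical });
}
const createZcfVat = makeCreateZcfVat(slots.consume('criticalVatKey'));

const zoeVat = makeVat('zoe'); // initial dynamic vat, critical
const contractVat = createZcfVat('zcf'); // contract vat created via Zoe, critical
const ordinaryVat = vatAdminService.createVat('third-party'); // no key, not critical
```

Because the key is held only in a closure, dropping it from `createZcfVat` after launch would be enough to stop new contract vats from being marked critical.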
Security Considerations
Without this change, certain unexpected failure modes (vats consume more than 2 GiB of RAM, emit oversized netstring messages, or suffer some internal consistency error that manifests as an illegal syscall) will terminate the vat and then commit (within consensus) the deletion of the vat's state. This will make recovery more difficult, as we'll have to roll back the state by a block or two, instead of distributing new software that only has to fix the terminating behavior.
Test Plan
It'd be nice to somehow trigger a vat failure and make sure the chain halts, but it might be good enough to just inspect the config files and slogfiles once (looking for the `isCritical` option in the `vatOptions`).
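The config-file half of that inspection could be automated with something like the sketch below. The helper names (`markStaticVatsCritical`, `findNonCriticalVats`) and the sample config contents are invented for illustration. Only the `vats.NAME.creationOptions.critical` path comes from the description. The slogfile check is left out because its entry format is not described here.

```js
// Hypothetical helper: set creationOptions.critical = true on every static vat,
// preserving any creation options that are already present.
function markStaticVatsCritical(config) {
  for (const vat of Object.values(config.vats)) {
    vat.creationOptions = { ...vat.creationOptions, critical: true };
  }
  return config;
}

// Hypothetical helper: list the names of static vats that lack the flag.
function findNonCriticalVats(config) {
  return Object.entries(config.vats)
    .filter(([, vat]) => vat.creationOptions?.critical !== true)
    .map(([name]) => name);
}

// Invented sample standing in for a parsed packages/vats/*-config.json file.
const sampleConfig = {
  vats: {
    bootstrap: { sourceSpec: 'bootstrap.js' },
    bank: { sourceSpec: 'vat-bank.js', creationOptions: { critical: true } },
  },
};

const before = findNonCriticalVats(sampleConfig); // vats still missing the flag
markStaticVatsCritical(sampleConfig);
const after = findNonCriticalVats(sampleConfig); // empty once every vat is marked
console.log({ before, after });
```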