"external" vat upgrade ideas #10028
Comments
Approach 2: Remap Exports
We can also approach this from the exporting side. The goal here would be to use This would update all clients, even ones the upgrading party doesn't know about. The simplest (but problematic) API would be If VatB is still operating, it would need to remove the VatB c-list entry, to maintain the invariant that objects are never simultaneously exported by multiple vats. It would be easier if VatB were disabled somehow (not terminated, which would trigger deletion, but it should certainly not be receiving deliveries).
The trouble is that Carol's original kref is already floating around: at the very least, both the upgrading (parent) vat and vat-vat-admin have seen it, and have Presences for both it and the original Bob. When we change VatC's "Carol" c-list entry to point at the old Bob kref, we should probably orphan the old Carol kref, and hope that the involved vats will drop it shortly. As before, this would be easier if VatC is not really up and running yet, like if it's in some state where the replacement objects are ready to go, but it's holding off on talking to any other vat until its parent gives the all-clear.
Approach 3: Commandeer At First Export
To avoid exposing the initial Carol kref, we could use a scheme that involves a special "Claim" object. The
const startup = (claimBob, stuff) => {
  const carol = ...;
  vatPowers.claim(claimBob, carol);
};
The Claim might be represented as a special kernel object, where instead of an
syscall.claim(claimVref, replacementVref);
This would be treated a bit like exporting
This would be easiest if VatB were no longer running, but we could also have the kernel allocate a new kref for Bob's old c-list entry (one which nobody is currently referencing). And/or somebody (maybe the
Promises
We might stop there, and say that this upgrade/replacement process only works for objects, but not promises. This would certainly be easier. Doing that would make little attempt to hide the upgrade trauma: operations in-process during the replacement would be visible to clients (their outstanding promises would be disconnected, and given what "upgrade-naïve" has meant so far, they would treat the disconnection as a failure).
But we could also find a way to remap any Promises that VatB is currently a decider on, and let VatC take them over. From an authority point of view, we can pretend that VatB has carefully retained We can imagine a I'm toying with the idea of a new vref category, strawman is
Upgrade Vat Must Know More
All of this points at a general principle: if VatC wants to take over for VatB, it needs to know more, and be prepared to do more, than VatB did. The #1691 approach has the new vat being so clever that it can precisely emulate the old vat up until the big reveal. The normal These "remapping" approaches don't require the old vat to have planned ahead, but do require the new vat to know enough about the old vat's operations that it knows what to do with each Claim, or knows which replacement objects to give to the parent (so it can pass them through to the AdminNode). In that sense it requires more work of, and coordination between, both the parent vat (requesting the upgrade) and the replacement vat.
It also requires that all the relevant pieces (objects to be replaced, promises to be taken over) are available to the parent vat. Somehow it must be involved enough in the interactions between old service vat and client vat to have grabbed a copy of Bob and other things-to-be-replaced. If the only such things are public facets and ZCF facets (low cardinality, generally created at startup time, rather than per-invocation), this may be easy. But if we want to reduce the upgrade trauma and allow in-progress operations to be unaffected, the parent vat will need access to the objects used by those in-process operations too, and that might be too invasive.
@dtribble brought up a more evil level of hack: something like That would trade the cost of adding vat-admin APIs for the cost of adding kernel APIs, a new liveslots thing, and writing the short-lived vat whose only job is to record the right objects. Depending upon how we exposed the new syscall to the new vat ( We could also imagine a special logging feature, to which vats could send Presences, and the kernel would emit their krefs. Then we could have one upgrade/whatever which logged both Bob and Carol, and then we tell all the validators to look at their logs and write down the krefs, and then compare them against the body of the second proposal (which, when executed, calls
Use cases: Just today, on mainnet, we launched a new
If we could remap imports, then instead of this upcoming proposal starting a third auctioneer vat, it could instruct the kernel to remap the imports of the second auctioneer vat to point at the new (2nd) price-feed vats. We wouldn't know those krefs until the price-feed vats had launched, which is why we'd either need to log them (and only then build the remapping proposal), or somehow register them by name. (In general, I think our existing plan to launch a new auctioneer is the simplest and most robust, as it requires no new code, and we've now done it once already.)
What is the Problem Being Solved?
When we designed the vat upgrade mechanism (`E(adminNode).upgrade(newBundlecap)`), we kinda assumed that the first version of every vat would be prepared for upgrade. Back then, we figured it was just a question of the vat keeping all its state in durable baggage. But, time ran out, and we were unable to implement (and/or test) upgradability for all vats. And since then, we've discovered other impediments to upgrade, such as downstream vats not reacting well to the disconnected promises decided by the upgraded vats, where it is Vat2 that is preventing us from upgrading Vat1.
As a result, while we've successfully upgraded several vats, we have at least a few which cannot be upgraded, or for which our full upgrade process is way more complicated than we want.
One workaround is to launch a replacement vat, then convince all the original vat's clients to talk to the replacement instead. We're in the process of doing that with the price feed vats; however, we had to start by upgrading the client vat to be able to accept this "please talk to a replacement" request.
@erights and @dtribble were brainstorming about ways to accomplish our goals more easily, so we started talking about kernel support for this "please talk to a replacement" feature.
Description of the Design
The starting point is a client VatA, which has an object Alice, who is talking to a service object Bob in VatB:
Both VatA and VatB are "upgrade naïve": they're difficult (or impossible) to upgrade. But, we need to upgrade the service provided by VatB anyways. Our plan is to introduce a new VatC, with an object "Carol" which is prepared to take over the duties of Bob. When we're done, we want Alice to be transparently reconnected to Carol instead of Bob:
Preliminaries
We start with our standard ground rule. We must maintain ocap security: no ambient authority, and connectivity begets connectivity.
We leverage the sizable authority of the `adminNode`. This is the object returned by the kernel's `vatAdminService` when a vat is created. If Vat1 asks the kernel to create Vat2, the caller will receive an object we'll call `adminNodeVat1`. With this AdminNode, the caller (something in Vat1) can direct the kernel to upgrade Vat2 to new code, retaining access to durable state, including all imports, and retaining the right to define the behavior of all exports. AdminNodes also provide `.terminate()`.
We establish a rule: you can't "fight the future". The first version of Vat2 is entirely vulnerable to subsequent versions: whatever code it gets upgraded to will have full access to all its durable state. We declare that we won't support attempts by earlier versions to limit the power of later versions by e.g. deliberately keeping authorities in non-durable storage. Further, we assume that upgrade is intended to provide as much access as possible, even if the earlier version forgot to retain something that was important for the later version (our #1691 scheme would effectively reacquire such dropped authorities).
That means the acceptable authority of an AdminNode extends to manipulating things inside the corresponding vat. If you can upgrade a vat, you can grab any object the vat has access to (because you could upgrade the vat to code that gives you that object), or you can forward one of its exports to some other object (modulo questions about object identity) (because you could upgrade the vat to code that forwards each message).
Ideally, object identity is monotonic. Some of the approaches we discuss would "merge" two objects into a single one, or manipulate c-lists to change one vat's notion of what their Presence points to. This could lead to `presence1 !== presence2` at one point in time, but `===` at a later point in time, or to situations where a Presence sent out of the vat might round-trip back as a different Presence. That could get messy. It might be unavoidable, but if at all possible, we want all object identity comparisons to remain stable across the upgrade/replacement.
Approach 1: Remap Imports
Given the AdminNode for VatA (the client), we could build a mechanism to remap its import of Bob to point at Carol instead.
The API would be something like `E(adminNodeVatA).remapImports([ [bob, carol] ])`. Upon receipt, vat-vat-admin would invoke device-vat-admin, which would use its `vat-admin-hooks.js` API to tell the kernel to edit VatA's c-list entry, replacing the kref side, inserting Carol's kref where Bob's once was.
This only affects VatA: if there are multiple clients, they must all be remapped separately.
VatB can continue to run, and VatC can be fully established before the remapping; neither needs to be in any special state.
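The c-list edit can be modeled with a toy, in-memory c-list. This is only a sketch under stated assumptions: `makeCList`, `add`, `krefFor`, and `vrefFor` are hypothetical names, the `ko21`-style krefs are illustrative, and none of this is the real kernel or `vat-admin-hooks.js` interface. It keeps the vref fixed while swapping the kref, and refuses any remap whose replacement kref is already present, since a c-list must stay one-to-one.

```javascript
// Toy model of a single vat's c-list (hypothetical, not a SwingSet API).
// A c-list is a one-to-one mapping between vat-side vrefs and kernel krefs.
function makeCList() {
  const vrefToKref = new Map();
  const krefToVref = new Map();

  const add = (vref, kref) => {
    if (vrefToKref.has(vref) || krefToVref.has(kref)) {
      throw Error(`c-list already has an entry for ${vref} or ${kref}`);
    }
    vrefToKref.set(vref, kref);
    krefToVref.set(kref, vref);
  };

  // pairs: [[oldKref, newKref], ...]. Each vref keeps its identity; only the
  // kref side of the entry changes.
  const remapImports = pairs => {
    // Validate the whole batch first, so a rejected call changes nothing.
    const seenOld = new Set();
    const seenNew = new Set();
    for (const [oldKref, newKref] of pairs) {
      if (!krefToVref.has(oldKref) || seenOld.has(oldKref)) {
        throw Error(`${oldKref} is not a remappable entry of this c-list`);
      }
      if (krefToVref.has(newKref) || seenNew.has(newKref)) {
        throw Error(`${newKref} is already present in this c-list`);
      }
      seenOld.add(oldKref);
      seenNew.add(newKref);
    }
    for (const [oldKref, newKref] of pairs) {
      const vref = krefToVref.get(oldKref);
      krefToVref.delete(oldKref);
      krefToVref.set(newKref, vref);
      vrefToKref.set(vref, newKref);
    }
  };

  return {
    add,
    remapImports,
    krefFor: vref => vrefToKref.get(vref),
    vrefFor: kref => krefToVref.get(kref),
  };
}

// Usage: VatA imports Bob as o-2; after the remap, o-2 points at Carol.
const vatA = makeCList();
vatA.add('o-2', 'ko21'); // Bob (illustrative kref)
vatA.remapImports([['ko21', 'ko35']]); // Carol (illustrative kref)
```

Validating the whole batch before mutating anything is a design choice of this sketch, not something the proposal specifies; it just avoids leaving a half-remapped c-list behind when one pair is rejected.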
VatA retains the same Presence object across the remapping, with its pre-existing vref (`o-2`). If VatA somehow had previous access to Carol, it might have a separate Presence (with e.g. `o-6`), already mapped to that same kref. This would cause c-list translation problems (breaking the "one to one" rule of c-lists). So the `remapImports` API should be defined to throw an error if any of the replacement objects are already present in the vat's c-list. This is easier to avoid if the replacement VatC is not yet fully operational when the remapping occurs, and it hasn't spread its replacement objects widely enough to risk them appearing at the client yet. (Note that it must launch at least enough to deliver the replacement objects to the parent vat, which can send them into the `remapImports` API.)
Approach 2: Commandeer Exports
TBD
Promises
TBD
Security Considerations
I believe these APIs are not more powerful than the existing `upgradeVat()` authority, modulo the possibility of creating identity discontinuities.
Scaling Considerations
Remapping a moderate number of objects should not be a scaling problem, although there might be cases where we need to scan all vat c-lists for a given kref, which would then scale with the number of vats (moderate now, but maybe larger in the future).
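To illustrate why such a scan grows with the number of vats, here is a minimal sketch. It assumes a hypothetical data shape (a Map from vat ID to that vat's kref-to-vref Map) and a hypothetical helper name `findVatsHolding`; the real kernel keeps its c-lists in its own store and exposes no such function.

```javascript
// Hypothetical shape: Map<vatID, Map<kref, vref>>. Not a real kernel API.
// Each per-vat lookup is a single Map probe, so the total cost is linear in
// the number of vats rather than in the number of c-list entries.
function findVatsHolding(krefToVrefByVat, kref) {
  const hits = [];
  for (const [vatID, krefToVref] of krefToVrefByVat) {
    const vref = krefToVref.get(kref);
    if (vref !== undefined) {
      hits.push({ vatID, vref });
    }
  }
  return hits;
}

// Usage with illustrative IDs: two of three vats reference ko21.
const allCLists = new Map([
  ['v1', new Map([['ko21', 'o-2']])],
  ['v2', new Map([['ko30', 'o-4']])],
  ['v3', new Map([['ko21', 'o-9']])],
]);
const holders = findVatsHolding(allCLists, 'ko21');
```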
Test Plan
Upgrade Considerations
To add new functionality to the AdminNodes, we must:
vat-admin-hooks.js
The kernel upgrade is straightforward. However, we don't currently have any mechanism to upgrade devices, so we'd need to invent one and implement it in the kernel. For vat-vat-admin, we have `controller.upgradeStaticVat()`, but we don't currently have a great place to call that from the cosmic-swingset chain code. The final issue is that upgrading vat-vat-admin will cause the existing `E(adminNode).done()` promises to disconnect, and e.g. Zoe might mistake that for a contract vat dying, and might react by exiting all seats and returning all escrowed assets.
So there are many steps we must figure out before we could deploy these new APIs.