Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tentative API to upgrade static vats #5811

Closed
warner opened this issue Jul 24, 2022 · 1 comment
Closed

add tentative API to upgrade static vats #5811

warner opened this issue Jul 24, 2022 · 1 comment
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Jul 24, 2022

What is the Problem Being Solved?

For dynamic vats, the per-vat adminFacet can be used to upgrade or otherwise restart that vat, with different code and/or vatParameters than the previous incarnation. Whoever asked for the vat to be created is given this facet, which is a very tidy way to represent the authority to influence the behavior of the vat after that initial creation event.

We do not have a corresponding control mechanism to upgrade a static vat. #5666 is about making sure all of swingset's internal vats (vat-timer, vat-vat-admin, etc) are upgradable, meaning they use durable objects, and baggage, and are prepared to behave correctly after a restart. However we don't yet have a way to trigger this restart.

We know that kernel changes are easier to deploy than vat changes, so I've not included this as a requirement for MN-1: as long as the vats themselves are prepared for upgrade, we could wait to implement the upgrade button until after MN-1 is deployed. However that does make it harder to test the #5666 accomplishments.

@FUDCo was looking at the dynamic vat upgrade implementation and decided that it's nearly everything we need for upgrading static vats too. The big question is where the authority lies. The two main options are 1: something new on vat-vat-admin, or 2: something on the controller object.

One consideration is POLA. Being able to upgrade a static vat is really powerful, because that lets you commandeer the bootstrap vat (which may still have access to every authority created in every other vat), as well as vat-vat-admin itself (which therefore lets you upgrade all dynamic vats). Currently, if you have access to the vatAdminService, you can create new vats, but you can't affect any existing ones that you didn't create. If we added E(vatAdminService).upgradeStaticVat(vatID, { bundleCap, vatParameters }), that would represent a huge increase in authority.

A second consideration is the mechanics of the chain-upgrade scenario where we anticipate needing to upgrade the static vats. If we're upgrading one of the built-in vats, there's a decent chance we'll be changing the associated device at the same time, and/or changing the kernel code (especially if we're upgrading vat-vat-admin, since it does all its work through device-vat-admin and its kernel hooks). These sorts of upgrades cannot be driven by messages to vatAdminService: they need to be coordinated with code that sits outside the kernel and interacts with the new device/kernel code being launched.

So we were thinking that the right API might be something like controller.upgradeStaticVat(vatID, bundleID, vatParametersCapData). This would enqueue the vatUpgrade event on the kernel run-queue, and when that event gets processed, the static vat would be stopped, modified, and restarted, just like with dynamic vats.

In addition, @FUDCo has been working on a mechanism inside vat-vat-admin to coordinate vat-vat-admin upgrades with other activities (starting/upgrading dynamic vats). The concern was that someone might ask to create a new dynamic vat, get back a promise that fires when the new vat is ready, and then while they're waiting, an admin comes long and updates vat-vat-admin itself. Their promise would be rejected (with an UpgradeDisconnect object from #5768), even though the dynamic vat was created, effectively causing a vat leak.

This mechanism would disable/enable methods to vat-vat-admin that set a flag to prohibit new activity until the service is re-enabled. The disable() method would return a Promise that fires only after the service is quiescent and it's safe to upgrade.

These promises are slightly tricky to work with from the outside: we have kpid = controller.queueToVatRoot(vatName, methodName, args), but you can't just await the result: you must keep calling controller.run() until controller.kpStatus(kpid) === 'resolved', and in a chain context, that might be several blocks later. It would be nice if there were a single controller.something() call you could make that would interact with the target static vat, maybe calling some special method on its root object that performed the disable/enable dance, then did the static upgrade, then re-enabled the service.

I think one idea we had was to put that method on the vat-vat-admin root object, rather than on vatAdminService (as an authority-limiting thing), and have that object's method drive the coordinated upgrade. That might be messy if the static vat being upgraded is itself, but maybe not impossible. Another option is to have the kernel track the disable coordination kpid using something similar to the kernel.kpRegisterInterest tool used by controller.queueToVatRoot.

Description of the Design

The current working design consists of some extensions to the vat admin vat (to be explained shortly) and a single new controller method, controller.upgradeStaticVat(vatName, shouldPauseFirst, bundleID, options), where:

  • vatName is the name of the vat to be upgraded -- note that given how we configure swingsets, all static vats have names; however, a case could be made for passing a vatID here instead (the argument, basically, is that vat IDs are more fundamental), though this would give the controller more dangerous power since it could then operate on any vat, not just static ones.
  • shouldPauseFirst is a boolean flag. If true, the vat to be shutdown is first sent the message pauseService, giving it the opportunity to finish any activity in progress that may have associated promises (more on this below).
  • bundleID is the bundle ID of the bundle to upgrade the vat to.
  • options is a bag of options to the upgrade operation, identical to that passed to the vatAdminService's upgrade method.

The guts of the vatAdminService's upgrade method are broken out into a new method on the vat admin vat's root object, upgradeVat(vatID, readyP, bundleID, options). This is an internal method, like almost all the other vat admin root methods; it is not expected that user code (e.g., a swingset bootstrap vat) will ever call this. However, this provides a hook whereby the swingset controller object can trigger an upgrade. Most of the parameters are self explanatory, except the second, readyP. This is an optional promise that, if given, will be awaited prior to actually proceeding with shutting down the vat being upgraded and upgrading it. Note that the vat admin will only await the promise -- value it resolves to will be ignored, and upgrade will furthermore proceed regardless of whether the promise was fulfilled or rejected (this so that a vat will not be in a position to refuse an upgrade; of course it could delay indefinitely by simply never resolving the promise, but then we retain the option of adding a timeout in the future if this becomes an issue).

Static vats (and possibly dynamic vats in the future, though we have not come to a decision about this) can implement the optional methods pauseService and resumeService. When it receives pauseService, the vat stops accepting new service requests (or possibly stops accepting service requests that cannot be handled in a single crank) by rejecting them with a 'service not available, try again later' error, while still allowing any ongoing service requests to proceed to completion. It returns a promise that it should resolve once all "in flight" service requests finish. If it receives resumeService, it should resume accepting new requests for service. The intention is that the pauseService and resumeService operations should be idempotent.

When controller.upgradeStaticVat is called with a shouldPauseFirst parameter of true, it sends a pauseService message to the vat being upgraded and then passes the result promise from that into the readyP parameter of the upgradeVat request to the vat admin vat.

Security Considerations

The authority must be managed carefully, it's a huge one.

Test Plan

Unit tests, like in test/upgrade/, except driven by the test program (doing controller.something()) instead of strictly by the test's bootstrap vat (doing E(adminFacet).upgrade() on a dynamic vat).

Prioritization

This doesn't strictly need to be done for MN-1, but it looks like it won't be hard, and it would give us much more confidence in our #5666 work. I'm ok if the API isn't perfect; that's something we can change later without modifying the vats. Kernel changes are easier than vat changes.

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Jul 24, 2022
FUDCo added a commit that referenced this issue Jul 28, 2022
FUDCo added a commit that referenced this issue Jul 31, 2022
FUDCo added a commit that referenced this issue Aug 2, 2022
turadg pushed a commit that referenced this issue Aug 2, 2022
@FUDCo
Copy link
Contributor

FUDCo commented Aug 2, 2022

Closed by #5846

@FUDCo FUDCo closed this as completed Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants