feat(swingset): devices.bundle, install-bundle, bundlecaps, createVat…

…(bundlecap)

Add kernel support for code "bundles", specifically objects with `{ moduleFormat: "EndoZipBase64" }` whose `.EndoZipBase64` property is a large string (base64-encoded zipfile with a compartment map and module components). Each bundle has a "bundleID" which is the versioning prefix `b1-` followed by the lowercase hex encoding of the SHA512 hash of the compartment map bytes.

Bundles are represented within userspace as "bundlecaps", which are device nodes owned by a new "bundle device" (`devices.bundle`). These can be passed in messages from one vat to another, just like Remotables. Bundlecaps are used to create vats in lieu of passing the actual (large) code bundles around through messages. Bundlecaps can also be asked for their code bundle in case you need to `importBundle` one directly into userspace (e.g. when ZCF evaluates a contract bundle).

The `config.bundles` table is now handled by installing the bundles at `initializeSwingset` time, and populating a name->ID table for later.

The new APIs are:

* `computedBundleID = controller.validateAndInstallBundle(bundle, allegedBundleID)` will validate the bundle against the claimed ID and add it to the kernel tables (NOTE: validation is minimal so far, must be improved before release)
* `kernel.installBundle(bundleID, bundle)` will install a bundle under the given ID without validation
* `devices.bundle` provides access to bundles
  * `D(devices.bundle).getBundleCap(bundleID)` yields a bundlecap or `undefined` if no bundle was installed with that ID
  * `D(devices.bundle).getNamedBundleCap(name)` yields a bundlecap or `undefined` if `config.bundles` lacked a bundle with that name
* bundlecaps are device nodes
  * `D(bundlecap).getBundleID()` yields the bundleID
  * `D(bundlecap).getBundle()` yields a code bundle, for `importBundle()`
* `E(vatAdminService).createVat(bundleOrBundleCap)` creates a dynamic vat
  * eventually we'll remove the option to use a bundle, making this strictly `E(vatAdminService).createVat(bundlecap)`
* `E(vatAdminService).createVatByName(name)` still works, but eventually it will be removed in favor of userspace doing `getNamedBundleCap` first

refs #4372
closes #3269
closes #4373
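
The bundleID rule described in this message (the prefix `b1-` plus the lowercase hex SHA512 of the compartment map bytes) can be illustrated with a minimal sketch. This is not the `computeBundleID` from `validate-archive.js`: the helper name `bundleIDFromCompartmentMap` is hypothetical, and the sketch assumes the compartment map bytes have already been extracted from the zipfile. It uses CommonJS `require` so it runs as a plain Node.js script.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper (not part of SwingSet): derive a bundleID from the
// raw bytes of a compartment map. Unzipping the bundle is out of scope here.
function bundleIDFromCompartmentMap(compartmentMapBytes) {
  // digest('hex') yields lowercase hex, 128 characters for SHA512
  const hex = createHash('sha512').update(compartmentMapBytes).digest('hex');
  return `b1-${hex}`;
}

// Made-up compartment map contents, for illustration only.
const exampleBytes = Buffer.from('{"entry":{"compartment":"example"}}', 'utf8');
const exampleID = bundleIDFromCompartmentMap(exampleBytes);
console.log(exampleID.length); // 3 prefix characters + 128 hex characters
```

An ID produced this way has the shape the bundle device checks for with `^b1-[0-9a-f]{128}$`.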
Agoric · Feb 9, 2022 · 1c39ebd
1 parent d10b0f9
commit 1c39ebd
Showing 39 changed files with 1,318 additions and 355 deletions.
diff --git a/packages/SwingSet/docs/bundles.md b/packages/SwingSet/docs/bundles.md
@@ -0,0 +1,91 @@
+# Code Bundles
+
+The swingset kernel provides tools to install, reference, and evaluate "Code Bundles". Vats can be created from a bundle, and vats can pass around "bundlecaps" to refer to previously-installed bundles. Using bundlecaps is much more efficient than passing the entire (large) bundle object through messages.
+
+## What Is A Bundle?
+
+We write source code in one or more files named "modules", like `foo.js`. These modules will export some number of symbols (functions, etc), and can import symbols from other modules. These modules are organized into "packages".
+
+In a given workspace (e.g. a Git checkout / working tree), you will have a hierarchy of `node_modules/` directories which contain the packages and modules that can be imported. So when `foo.js` says `import { x } from '../bar.js'`, it will get code from a neighboring file, and when it says `import { y } from 'otherpackage'`, it will get code from somewhere in the `node_modules` directory.
+
+If you point at a single "entry point" module, and chase down all the `import` statements, you'll find some other set of module files. Following the transitive `import` statements leads to a collection of modules and a linkage map which remembers how they are wired together.
+
+A "code bundle" is a data structure that captures the contents of these modules and the linkage map (called a "compartment map"). Bundles are JSON-serializable objects, with one mandatory string property named `moduleFormat`, and other properties that depend on the format. Typically there is exactly one other property, and its value is a very large string (1-2 MB is common). Our bundles use a format that Endo defines, named "EndoZipBase64", in which the "very large string" is a base64-encoded Zip file, which contains one component for the compartment map, plus components for each of the modules it includes.
+
+The `bundleSource()` function takes a filename of the entry point module and returns (a promise for) the code bundle object. It generally needs disk access to find all the imports.
+
+The `importBundle()` function takes a bundle object and evaluates it, returning (a promise for) a "module namespace object", which contains all the exports of the entry-point module. You can think of running `node ./foo.js` as the equivalent of `bundleSource()` followed by an immediate `importBundle()`, ignoring the exported symbols but executing everything in `foo.js` as a side-effect.
+
+Bundles are interesting because they can be serialized and sent to a remote system. They can be saved in a database and instantiated (evaluated/imported) later, possibly multiple times.
+
+## BundleIDs
+
+We define the "bundle ID" as a hash of the bundle's compartment map file, specifically the fixed string `b1-` followed by the lowercase hex encoding of the SHA512 hash of the compartment map file. The compartment map includes hashes of all the modules included in the bundle, and the validation process ensures that the bundle contains exactly the expected set of modules. So the bundle ID is a strong identifier of the contents of the bundle, which is not sensitive to the order of the zipfile contents.
+
+A bundle might be assembled piecemeal. If I want to give you a bundle, and you already have a number of similar bundles, we can compare the modules in each and figure out the ones my bundle needs but you don't have yet. Then I can send you just the compartment map and the missing modules, and you can reassemble a functionally identical bundle. The bundleID will be the same, the imported behavior will be the same, but the zipfile itself might be slightly different. This lets us minimize the amount of data transmitted and stored.
+
+## How Swingset Uses Bundles
+
+The swingset kernel maintains a "bundle table" in the kernel database. Bundles can be installed here, indexed by their bundleID, and retrieved for various purposes:
+
+* all swingset vats, both static and dynamic, start from a bundle
+  * in these bundles, the entry point module exports a function named `buildRootObject`
+* through a special device named `devices.bundle`, vat code can exchange a bundleID for a "bundlecap", which is an ocap-friendly way to refer to a bundle
+* the bundlecap can be used with `vatAdminService~.createVat()` to make a new dynamic vat
+* userspace can use `D(bundlecap).getBundle()` to fetch the bundle itself, for use with `importBundle()` that does not create an entire new vat
+  * the Zoe "ZCF" facet uses this to load contract code within an existing vat
+  * this could also be used as part of an in-vat upgrade process, to load new behavior
+* each vat also has a "liveslots" layer, defined by a bundle
+  * the liveslots bundleID is recorded separately for each vat, so liveslots can be upgraded (for new vats) without affecting the behavior of existing ones
+* swingset devices are defined by bundles that are stored in the bundle table
+* the kernel source code itself is stored in a bundle, to make it easier to switch from one version of the kernel to another at a pre-determined time
+
+## Bundle Installation Through Config
+
+When defining a static vat in the Swingset `config.vats` object, the filename provided as `sourceSpec` is turned into a bundle, the bundle is installed into the bundle table, and the resulting bundleID is stored in the vat's database record. When the static vat is launched, the DB record provides the bundleID, and the bundle is loaded and evaluated in a new vat worker.
+
+The `config.bundles` object maps names to a bundle specification. These bundles are installed as above, and then a special "named bundles" table is updated with the name-to-bundleID mapping. These names are available to `D(devices.bundle).getNamedBundleCap(name) -> bundlecap`. For example, the chain's "core bootstrap" code will use this to define bundles for the core vats (Zoe, etc), and create dynamic vats at bootstrap time from them. It will also provide Zoe with the bundlecap for ZCF this way, so Zoe can later create dynamic ZCF vats. `E(vatAdminService).createVatByName(bundleName)` will continue to be supported until core-bootstrap is updated to retrieve bundlecaps, after which vat-admin will drop `createVatByName` and only support `createVat(bundlecap)`.
+
+The `initializeSwingset()` function, called when the host application is first configured, creates bundles for built-in vats and devices (timer, vatAdmin, mailbox), as well as liveslots and the kernel, and installs them into the table as well. Internally, the kernel remembers the bundleID of each one for later use.
+
+## Bundle Installation at Runtime
+
+Once the kernel is up and running, new bundles can be installed with the `controller.validateAndInstallBundle()` interface. This accepts a bundle and an optional alleged bundleID. It performs validity checks: if the ID is provided but does not match the compartment map, or if (TODO) the bundle contents do not match the compartment map, or if (TODO) the contents do not parse as JavaScript modules, then it will throw. If everything looks good, it will install the bundle into the table and return the computed bundleID.
+
+Once the bundle is installed, the external caller can send a vat-level message with the bundleID to some vat within the kernel. The receiving vat can then do `D(devices.bundle).getBundleCap(bundleID) -> bundlecap` to get a handle from which vats can be created.
+
+A future version of this interface will expose enough information to install individual modules first, and then "install" a bundle from just the compartment map contents (after checking that all the required modules are already present).
+
+By moving bundle installation into a separate external interface, vat-level messages can remain small. Currently the only place the full bundle will appear is in a vat transcript, in the results of the syscall that implements `D(bundlecap).getBundle()`, when userspace needs to do an `importBundle()` directly, and we hope to remove even that copy in the future.
+
+## Bundlecaps
+
+As suggested above, userspace works with a "bundlecap", which is a passable object (actually a device node) that represents the bundle. This can be passed through eventual-sends from one vat to another and stored in collections (and virtual objects). Internally, it wraps the bundleID.
+
+The full set of things you can do with a bundlecap is:
+
+* `D(bundlecap).getBundleID() -> string`: to get the bundleID
+* `D(bundlecap).getBundle() -> bundle object`: if you really need the full bundle, e.g. for `importBundle()`
+* `E(vatAdminService).createVat(bundlecap)`: create a new dynamic vat from the bundle
+
+## Kernel Internals
+
+The kvStore has a subset of the key space reserved for holding bundles (mapping bundleID to the JSON-encoded bundle). Another keyspace holds the mapping from bundle name to bundleID for the named bundles. These are accessed through `kernelKeeper` methods.
+
+The "bundle device" (aka `devices.bundle`) has a root device node with an API to create bundlecaps:
+* `D(devices.bundle).getBundleCap(bundleID) -> bundlecap`
+* `D(devices.bundle).getNamedBundleCap(bundleName) -> bundlecap`
+
+Bundlecaps are new device nodes created by the bundle device. Internally, this device remembers the mapping from bundleID to the device node ID (`dref`), and vice versa, in a pair of vatstore state entries. This mapping is not held in RAM: the state entries are looked up on each call to `getBundleID`/etc. The device can also access the kernelKeeper APIs to convert bundleIDs into full bundles, if requested.
+
+The vatAdmin vat's `createVat` method accepts bundlecaps (or full bundles, for now). The vatAdmin device (which is wrapped tightly by vatAdminVat and not exposed to anyone else) accepts bundleIDs. vatAdminVat translates bundlecaps into bundleIDs before talking to the device.
+
+`controller.validateAndInstallBundle(bundle, optionalAllegedBundleID)` performs validity checks, installs the bundle, and returns the computed bundleID. It uses `kernel.installBundle(bundleID, bundle)` internally, which uses `kernelKeeper` to store the bundle, and does not perform validation.
+
+
+
+## Determinism
+
+Bundle installation is a transactional event: it must happen, and be committed, before the bundle is available. `D(devices.bundle).getBundleCap(bundleID)` will consistently return `undefined` if the bundle was not already installed. Host applications (like a chain) should perform `validateAndInstallBundle()` from within a transaction, so all validators maintain consensus about whether the bundle is installed or not.
+
+But once a bundlecap is obtained, the corresponding bundle is guaranteed to be available. We do not yet have GC for bundles, but once we do, each copy of the bundlecap will establish a reference count. As long as any bundlecap is still held, the bundle will be held too. Bundles will be deleted at some point after the last bundlecap is gone. (We need a story for what keeps the bundle alive between the external `controller.validateAndInstallBundle()` and some vat getting a bundlecap, but that interval is not expected to be very long, so we might just use explicit GC actions that we don't run very often).
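
The piecemeal-assembly idea from the "BundleIDs" section of `bundles.md` (send only the compartment map plus the modules the receiver lacks) comes down to a set difference over module hashes. The sketch below is an illustration only: `missingModules` and the short placeholder hashes are hypothetical, and no such interface exists in this commit, which describes module-level installation as future work.

```javascript
// Hypothetical helper (not part of SwingSet): given the module hashes a
// bundle's compartment map requires and the hashes the receiver already
// stores, return the hashes whose module bytes still need to be sent.
function missingModules(neededHashes, haveHashes) {
  const have = new Set(haveHashes);
  return neededHashes.filter(hash => !have.has(hash));
}

// Placeholder hashes; real ones would be full-length digests.
const needed = ['aaa', 'bbb', 'ccc'];
const receiverHas = ['bbb', 'ddd'];
const toSend = missingModules(needed, receiverHas);
console.log(toSend); // [ 'aaa', 'ccc' ]
```

Because the bundleID depends only on the compartment map, a bundle reassembled this way would keep the same ID even if the resulting zipfile differs byte-for-byte.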
diff --git a/packages/SwingSet/package.json b/packages/SwingSet/package.json
@@ -21,6 +21,7 @@
     "lint:eslint": "eslint '**/*.js'"
   },
   "devDependencies": {
+    "@endo/compartment-mapper": "^0.6.5",
     "@endo/ses-ava": "^0.2.17",
     "@types/tmp": "^0.2.0",
     "ava": "^3.12.1",
@@ -41,6 +42,7 @@
     "@agoric/swing-store": "^0.6.3",
     "@agoric/xsnap": "^0.11.0",
     "@endo/base64": "^0.2.17",
+    "@endo/zip": "^0.2.17",
     "anylogger": "^0.21.0",
     "import-meta-resolve": "^1.1.1",
     "node-lmdb": "^0.9.5",

diff --git a/packages/SwingSet/src/controller.js b/packages/SwingSet/src/controller.js
@@ -13,6 +13,7 @@ import { assert, details as X } from '@agoric/assert';
 import { importBundle } from '@endo/import-bundle';
 import { xsnap, recordXSnap } from '@agoric/xsnap';
 
+import { computeBundleID } from './validate-archive.js';
 import { createSHA256 } from './hasher.js';
 import engineGC from './engine-gc.js';
 import { WeakRef, FinalizationRegistry } from './weakref.js';
@@ -295,6 +296,30 @@ export async function makeSwingsetController(
    */
   const defensiveCopy = x => JSON.parse(JSON.stringify(x));
 
+  /**
+   * Validate and install a code bundle.
+   *
+   * @param { EndoZipBase64Bundle } bundle
+   * @param { BundleID? } allegedBundleID
+   * @returns { Promise<BundleID> }
+   */
+  async function validateAndInstallBundle(bundle, allegedBundleID) {
+    // TODO: validation: unpack, parse sources, check hashes
+
+    // this only computes the hash of the compartment map, it does not check
+    // that the rest of the bundle matches
+    const bundleID = await computeBundleID(bundle);
+    if (allegedBundleID) {
+      assert.equal(
+        allegedBundleID,
+        bundleID,
+        `alleged bundleID ${allegedBundleID} does not match actual ${bundleID}`,
+      );
+    }
+    kernel.installBundle(bundleID, bundle);
+    return bundleID;
+  }
+
   // the kernel won't leak our objects into the Vats, we must do
   // the same in this wrapper
   const controller = harden({
@@ -312,6 +337,8 @@ export async function makeSwingsetController(
       kernel.kdebugEnable(flag);
     },
 
+    validateAndInstallBundle,
+
     async run(policy) {
       return kernel.run(policy);
     },

diff --git a/packages/SwingSet/src/devices/bundle.js b/packages/SwingSet/src/devices/bundle.js
@@ -0,0 +1,149 @@
+import { assert } from '@agoric/assert';
+import { buildSerializationTools } from '../deviceTools.js';
+
+/*
+
+The "bundle device" manages code bundles, which can be used to define a new
+dynamic vat (vatAdminSvc~.createVat), or can be imported/evaluated from
+within a vat (importBundle).
+
+Bundles are 'EndoZipBase64' archives, which are basically a string-encoded
+ZIP file containing a bunch of importable modules and a single "compartment
+map". Each bundle has a unique "bundleID" string: "b1-" plus the hex SHA512
+hash of the compartment map file (the manifest), which contains hashes of
+all the component modules, as well as information about how they link
+together, so the bundle's behavior is completely specified by the bundleID
+(plus the runtime behavior of the module loader, which includes global
+endowments, and potentially "holes" in the module map which will be filled in
+by the loader with locally-provided module namespace objects). The order of
+the modules in a zip file does not affect the bundleID.
+
+Any two bundles are likely to have a lot of modules in common (shared support
+libraries, etc). Modules are individual source files, so they tend to be
+small (1-50kB), but bundles tend to be several MB in size.
+
+The goals are:
+* install bundles "out of band", through controller.validateAndInstallBundle()
+* keep large (multi-MB) bundles out of vat messages and transcripts
+* reduce communication bandwidth by uploading shared modules only once
+* reduce storage costs by storing shared modules only once
+
+The kernel will provide this device with three endowments:
+* hasBundle(bundleID) -> boolean
+* getBundle(bundleID) -> string
+* getNamedBundleID(name) -> bundleID
+
+The root device node offers two methods to callers:
+* D(devices.bundle).getBundleCap(bundleID) -> devnode or undefined
+* D(devices.bundle).getNamedBundleCap(name) -> devnode or undefined
+
+The device node returned by getBundleCap() is called, unsurprisingly, a
+"bundlecap". Most vats interact with bundlecaps, not bundleIDs (although of
+course somebody must call `getBundleCap()` first). Holding a bundlecap
+guarantees that the bundle contents are available, since `getBundleCap()`
+will fail unless the bundle is currently installed. When we implement
+refcounting GC for bundles, the bundlecap will maintain a reference and
+protect the bundle data from collection.
+
+The bundlecap device node provides two device-invocation methods to callers:
+
+* D(bundlecap).getBundleID() -> string (bundleID)
+* D(bundlecap).getBundle() -> string (the bundle contents)
+
+For now, the only way to give a bundle to `importBundle()` is to obtain its
+string contents first, but eventually we hope to implement more direct
+support and keep the large string out of syscall results and userspace
+entirely.
+
+*/
+
+export function buildDevice(tools, endowments) {
+  const { hasBundle, getBundle, getNamedBundleID } = endowments;
+  const { syscall } = tools;
+  const dtools = buildSerializationTools(syscall, 'bundle');
+  const { unserialize, returnFromInvoke, deviceNodeForSlot } = dtools;
+
+  const ROOT = 'd+0';
+  const bundleIDRE = new RegExp('^b1-[0-9a-f]{128}$');
+  const nextDeviceNodeIDKey = 'nextDev';
+
+  // reminder: you may not perform vatstore writes during buildDevice(),
+  // because it runs on each kernel reboot, which is not consistent among
+  // members of a consensus kernel
+
+  function allocateDeviceNode() {
+    let s = syscall.vatstoreGet(nextDeviceNodeIDKey);
+    if (!s) {
+      s = '1';
+    }
+    const id = BigInt(s);
+    syscall.vatstoreSet(nextDeviceNodeIDKey, `${id + 1n}`);
+    return `d+${id}`;
+  }
+
+  function returnCapForBundleID(bundleID) {
+    assert(bundleID);
+    const idToBundleKey = `id.${bundleID}`;
+    let cap;
+    cap = syscall.vatstoreGet(idToBundleKey);
+    if (!cap) {
+      if (!hasBundle(bundleID)) {
+        return returnFromInvoke(undefined);
+      }
+      cap = allocateDeviceNode();
+      syscall.vatstoreSet(idToBundleKey, cap);
+      const capToIDKey = `slot.${cap}`;
+      syscall.vatstoreSet(capToIDKey, bundleID);
+    }
+    return returnFromInvoke(deviceNodeForSlot(cap));
+  }
+
+  // invoke() should use unserialize() and returnFromInvoke. Throwing an
+  // error will cause the calling vat's D() to throw.
+
+  const dispatch = {
+    invoke: (dnid, method, argsCapdata) => {
+      const args = unserialize(argsCapdata);
+
+      if (dnid === ROOT) {
+        // D(devices.bundle).getBundleCap(id) -> bundlecap
+        if (method === 'getBundleCap') {
+          const [bundleID] = args;
+          assert.typeof(bundleID, 'string');
+          assert(bundleIDRE.test(bundleID), 'not a bundleID');
+          return returnCapForBundleID(bundleID);
+        }
+        // D(devices.bundle).getNamedBundleCap(name) -> bundlecap
+        if (method === 'getNamedBundleCap') {
+          const [name] = args;
+          assert.typeof(name, 'string');
+          let bundleID;
+          try {
+            // this throws on a bad name, so make a better error
+            bundleID = getNamedBundleID(name);
+          } catch (e) {
+            throw Error(`unregistered bundle name '${name}'`);
+          }
+          return returnCapForBundleID(bundleID);
+        }
+        throw TypeError(`target[${method}] does not exist`);
+      }
+
+      const capToIDKey = `slot.${dnid}`;
+      const bundleID = syscall.vatstoreGet(capToIDKey);
+      if (bundleID) {
+        // D(bundlecap).getBundleID() -> id
+        if (method === 'getBundleID') {
+          return returnFromInvoke(bundleID);
+        }
+        // D(bundlecap).getBundle() -> bundle
+        if (method === 'getBundle') {
+          return returnFromInvoke(getBundle(bundleID));
+        }
+        throw TypeError(`bundlecap[${method}] does not exist`);
+      }
+      throw TypeError(`unknown device node ${dnid}, shouldn't happen`);
+    },
+  };
+  return dispatch;
+}
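
The two-way mapping that `returnCapForBundleID` keeps in the vatstore (`id.${bundleID}` to device node, `slot.${dref}` back to bundleID) can be exercised outside the kernel with a toy harness. The sketch below is an assumption-laden stand-in, not the device itself: a `Map` plays the role of the vatstore, a `Set` of installed IDs plays the role of the `hasBundle` endowment, and `makeToyBundleDevice` is a hypothetical name. Serialization (`returnFromInvoke`, `deviceNodeForSlot`) is omitted, so the functions return raw slot strings.

```javascript
// Toy model of the bundle device's bookkeeping (hypothetical, for illustration).
function makeToyBundleDevice(installedIDs) {
  const vatstore = new Map(); // stand-in for syscall.vatstoreGet/Set

  function allocateDeviceNode() {
    const id = BigInt(vatstore.get('nextDev') || '1');
    vatstore.set('nextDev', `${id + 1n}`);
    return `d+${id}`;
  }

  // Returns the device node slot for a bundleID, or undefined if the
  // bundle is not installed. Repeated calls reuse the same slot.
  function getBundleCapSlot(bundleID) {
    let slot = vatstore.get(`id.${bundleID}`);
    if (!slot) {
      if (!installedIDs.has(bundleID)) {
        return undefined;
      }
      slot = allocateDeviceNode();
      vatstore.set(`id.${bundleID}`, slot);
      vatstore.set(`slot.${slot}`, bundleID);
    }
    return slot;
  }

  // Reverse lookup, as used when a bundlecap is invoked.
  function getBundleID(slot) {
    return vatstore.get(`slot.${slot}`);
  }

  return { getBundleCapSlot, getBundleID };
}

const idA = `b1-${'a'.repeat(128)}`; // made-up, syntactically valid IDs
const idB = `b1-${'b'.repeat(128)}`;
const device = makeToyBundleDevice(new Set([idA]));
const first = device.getBundleCapSlot(idA);
const second = device.getBundleCapSlot(idA);
const absent = device.getBundleCapSlot(idB);
console.log(first, second, absent); // d+1 d+1 undefined
```

Keeping both directions in the store rather than in RAM mirrors the comment in `buildDevice()`: state must survive kernel reboots and stay consistent across members of a consensus kernel.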