Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(SwingSet): Don't send stopVat during upgrade #7244

Merged
merged 13 commits into from
Apr 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions packages/SwingSet/docs/garbage-collection.md
Original file line number Diff line number Diff line change
Expand Up @@ -360,9 +360,7 @@ If the last importing vat had previously called `syscall.retireImport`, there wi

## syscall.abandonExport processing

During vat upgrade, the last delivery made to the old version is a special `dispatch.stopVat()`. This instructs the soon-to-be-deleted worker to destroy anything that will not survive the upgrade. All Remotables will be abandoned here, because they live solely in RAM, and the RAM heap does not survive. All non-durable virtual objects will also be abandoned, since only durable ones survive. We may also drop otherwise-durable objects which were only referenced by abandoned non-durable objects.

During this phase, the vat performs a `syscall.abandonExports()` with all the abandoned vrefs. The kernel reacts to this in roughly the same way it reacts to the entire vat being terminated (which it is, sort of, at least the heap is being terminated). Each vref is deleted from the exporting vat's c-list, because *that* vat isn't going to be referencing it any more. The kernel object table is updated to clear the `owner` field: while the kernel object retains its identity, it is now orphaned, and any messages sent to it will be rejected (by the kernel) with a "vat terminated" error.
Although not currently used (see [#6650](https://github.com/Agoric/agoric-sdk/issues/6650)), the kernel still supports vat issuance of `syscall.abandonExports()` with vrefs to be abandoned as part of vat upgrade (i.e., Remotables that live solely in the RAM heap and non-durable merely-virtual objects, and ideally also otherwise-durable objects not stored in baggage and referenced only by other abandoned objects). The kernel reacts to this in roughly the same way it reacts to the entire vat being terminated (which it is, sort of, at least the heap is being terminated). Each vref is deleted from the exporting vat's c-list, because *that* vat isn't going to be referencing it any more. The kernel object table is updated to clear the `owner` field: while the kernel object retains its identity, it is now orphaned, and any messages sent to it will be rejected (by the kernel) with a "vat terminated" error.

No other work needs to be done. Any importing vats will continue to hold their reference as before. They can only tell that the object has been abandoned if they try to send it a message. Eventually, if all the importing vats drop their reference, and nothing else in the kernel is holding one, the kernel object entry will be deleted. In this case, no `dispatch.retireExports` is sent to the old exporting vat, since it's already been removed from their c-list.

Expand Down
123 changes: 62 additions & 61 deletions packages/SwingSet/src/kernel/kernel.js
Original file line number Diff line number Diff line change
Expand Up @@ -446,6 +446,7 @@ export default function buildKernel(
* @param {DeliveryStatus} status
* @param {boolean} decrementReapCount
* @param {MeterID} [meterID]
* @returns {CrankResults}
*/
function deliveryCrankResults(vatID, status, decrementReapCount, meterID) {
let meterUnderrun = false;
Expand Down Expand Up @@ -802,7 +803,6 @@ export default function buildKernel(
}

/**
*
* @param {RunQueueEventUpgradeVat} message
* @returns {Promise<CrankResults>}
*/
Expand All @@ -818,79 +818,90 @@ export default function buildKernel(
return NO_DELIVERY_CRANK_RESULTS; // vat terminated already
}
const { meterID } = vatInfo;
let computrons;
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
const disconnectObject = {
name: 'vatUpgraded',
upgradeMessage,
incarnationNumber: vatKeeper.getIncarnationNumber(),
};
const disconnectionCapData = kser(disconnectObject);
/** @type { import('../types-external.js').KernelDeliveryStopVat } */
const stopVatKD = harden(['stopVat', disconnectionCapData]);
const stopVatVD = vatWarehouse.kernelDeliveryToVatDelivery(
vatID,
stopVatKD,
);
const stopVatStatus = await deliverAndLogToVat(vatID, stopVatKD, stopVatVD);
const stopVatResults = deliveryCrankResults(vatID, stopVatStatus, false);

// We don't meter stopVat, since no user code is running, but we
// still report computrons to the runPolicy
let { computrons } = stopVatResults; // BigInt or undefined
if (computrons !== undefined) {
assert.typeof(computrons, 'bigint');
}

/**
* Make a method-arguments structure representing failure
* for vat-vat-admin.js vatUpgradeCallback().
* Terminate the vat and translate internal-delivery results into
* abort-without-termination results for the upgrade delivery.
*
* @param {SwingSetCapData} _errorCapData
* @returns {RawMethargs}
* @param {CrankResults} badDeliveryResults
* @param {SwingSetCapData} errorCapData
* @returns {Promise<CrankResults>}
*/
const makeFailureMethargs = _errorCapData => {
insistCapData(_errorCapData); // kser(Error)
// const error = kunser(_errorCD)
// actually we shouldn't reveal the details, so instead we do:
const error = Error('vat-upgrade failure');
return ['vatUpgradeCallback', [upgradeID, false, error]];
};

// TODO: if/when we implement vat pause/suspend, and if
// deliveryCrankResults changes to not use .terminate to indicate
// a problem, this should change to match: where we would normally
// pause/suspend a vat for a delivery error, here we want to
// unwind the upgrade.

if (stopVatResults.terminate) {
const abortUpgrade = async (badDeliveryResults, errorCapData) => {
// get rid of the worker, so the next delivery to this vat will
// re-create one from the previous state
// eslint-disable-next-line @jessie.js/no-nested-await
await vatWarehouse.stopWorker(vatID);

// notify vat-admin of the failed upgrade
const vatAdminMethargs = makeFailureMethargs(
stopVatResults.terminate.info,
);
// notify vat-admin of the failed upgrade without revealing error details
insistCapData(errorCapData);
// const error = kunser(errorCapData);
const error = Error('vat-upgrade failure');
/** @type {RawMethargs} */
const vatAdminMethargs = [
'vatUpgradeCallback',
[upgradeID, false, error],
];

// we still report computrons to the runPolicy
const results = harden({
...stopVatResults,
computrons,
...badDeliveryResults,
computrons, // still report computrons
abort: true, // always unwind
consumeMessage: true, // don't repeat the upgrade
terminate: undefined, // do *not* terminate the vat
vatAdminMethargs,
});
return results;
}

// stopVat succeeded, now we finish cleanup on behalf of the worker
};

// TODO: send BOYD so the terminating vat has one last chance to clean
// cleanup on behalf of the worker
// This used to be handled by a stopVat delivery to the vat itself,
// but the implementation of that was cut to the bone
// in commits like 91480dee8e48ae26c39c420febf73b93deba6ea5
// basically reverting 1cfbeaa3c925d0f8502edfb313ecb12a1cab5eac
// and then ultimately moved to the kernel in a MUCH diminished form
// (see #5342 and #6650, and testUpgrade in
// {@link ../../test/upgrade/test-upgrade.js}).
// We hope to eventually add back correct sophisticated logic
// by e.g. having liveslots sweep the database when restoring a vat.

// send BOYD so the terminating vat has one last chance to clean
// up, drop imports, and delete durable data.
// If a vat is so broken it can't do BOYD, we can make that optional.
// https://github.com/Agoric/agoric-sdk/issues/7001
// If a vat is so broken it can't do BOYD, we can make this optional.
/** @type { import('../types-external.js').KernelDeliveryBringOutYourDead } */
const boydKD = harden(['bringOutYourDead']);
const boydVD = vatWarehouse.kernelDeliveryToVatDelivery(vatID, boydKD);
const boydStatus = await deliverAndLogToVat(vatID, boydKD, boydVD);
const boydResults = deliveryCrankResults(vatID, boydStatus, false);

// we don't meter bringOutYourDead since no user code is running, but we
// still report computrons to the runPolicy
computrons = addComputrons(computrons, boydResults.computrons);

// In the unexpected event that there is a problem during this
// upgrade-internal BOYD, the appropriate response isn't fully
// clear. We currently opt to translate a `terminate` result into a
// non-terminating `abort` that unwinds the upgrade delivery, and to
// ignore a non-terminate `abort` result. This is expected to change
// in the future, especially if we ever need some kind of emergency/
// manual upgrade (which might involve something like throwing an
// error to prompt a kernel panic if the bad vat is marked critical).
// There's a good analysis at
// https://github.com/Agoric/agoric-sdk/pull/7244#discussion_r1153633902
if (boydResults.terminate) {
const { info: errorCapData } = boydResults.terminate;
// eslint-disable-next-line @jessie.js/no-nested-await
const results = await abortUpgrade(boydResults, errorCapData);
return results;
}

// reject all promises for which the vat was decider
for (const kpid of kernelKeeper.enumeratePromisesByDecider(vatID)) {
Expand Down Expand Up @@ -942,20 +953,10 @@ export default function buildKernel(
computrons = addComputrons(computrons, startVatResults.computrons);

if (startVatResults.terminate) {
// unwind just like above
// abort and unwind just like above
const { info: errorCapData } = startVatResults.terminate;
// eslint-disable-next-line @jessie.js/no-nested-await
await vatWarehouse.stopWorker(vatID);
const vatAdminMethargs = makeFailureMethargs(
startVatResults.terminate.info,
);
const results = harden({
...startVatResults,
computrons,
abort: true, // always unwind
consumeMessage: true, // don't repeat the upgrade
terminate: undefined, // do *not* terminate the vat
vatAdminMethargs,
});
const results = await abortUpgrade(startVatResults, errorCapData);
return results;
}

Expand Down
11 changes: 11 additions & 0 deletions packages/SwingSet/src/kernel/state/kernelKeeper.js
Original file line number Diff line number Diff line change
Expand Up @@ -1470,6 +1470,17 @@ export default function makeKernelKeeper(kernelStorage, kernelSlog) {
}

function compareStrings(a, b) {
// natural-sort strings having a shared prefix followed by digits
// (e.g., 'ko42' and 'ko100')
const [_a, aPrefix, aDigits] = /^(\D+)(\d+)$/.exec(a) || [];
if (aPrefix) {
const [_b, bPrefix, bDigits] = /^(\D+)(\d+)$/.exec(b) || [];
if (bPrefix === aPrefix) {
return compareNumbers(aDigits, bDigits);
}
}

// otherwise use the default string ordering
if (a > b) {
return 1;
}
Expand Down
4 changes: 2 additions & 2 deletions packages/SwingSet/test/bootstrap-relay.js
Original file line number Diff line number Diff line change
Expand Up @@ -35,10 +35,10 @@ export const buildRootObject = () => {

getTimer: async () => encodePassable(timer),

getVatRoot: async ({ name }) => {
getVatRoot: async ({ name, rawOutput = false }) => {
const vat = vatData.get(name) || Fail`unknown vat name: ${q(name)}`;
const { root } = vat;
return encodePassable(root);
return rawOutput ? root : encodePassable(root);
},

createVat: async ({ name, bundleCapName, vatParameters = {} }) => {
Expand Down
21 changes: 14 additions & 7 deletions packages/SwingSet/test/upgrade/bootstrap-scripted-upgrade.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ import { Far, E } from '@endo/far';
import { assert } from '@agoric/assert';
import { makePromiseKit } from '@endo/promise-kit';

import { NUM_SENSORS } from './num-sensors.js';
const NUM_SENSORS = 39;

const insistMissing = (ref, isCollection = false) => {
let p;
Expand All @@ -26,9 +26,12 @@ export const buildRootObject = () => {
let ulrikRoot;
let ulrikAdmin;
const marker = Far('marker', {});
// sensors are numbered from '1' to match the vobj o+N/1.. vrefs
// for debugging, this array starts with a dummy element so
// the vref of each contained object in an importing vat
// (o-NN where NN starts at 1) is aligned with its index
/** @type {[string, ...object]} */
const importSensors = ['skip0'];
for (let i = 1; i < NUM_SENSORS + 1; i += 1) {
for (let i = 1; i <= NUM_SENSORS; i += 1) {
importSensors.push(Far(`import-${i}`, {}));
}
const { promise, resolve } = makePromiseKit();
Expand Down Expand Up @@ -63,13 +66,17 @@ export const buildRootObject = () => {

dur = await E(ulrikRoot).getDurandal({ d1: 'd1' });
const d1arg = await E(dur).get();
assert.equal(d1arg.d1, 'd1');
// poor man's deepEqual
// (each enumerable own property as a [key, value] pair)
assert.equal(JSON.stringify(Object.entries(d1arg)), '[["d1","d1"]]');

// give ver1 a promise that won't be resolved until ver2
// give version 1 a promise that won't be resolved until after upgrade
await E(ulrikRoot).acceptPromise(promise);
const { p1 } = await E(ulrikRoot).getEternalPromise();

// get promises that never resolve (and will be rejected at upgrade)
const [p1] = await E(ulrikRoot).getEternalPromiseInArray();
p1.catch(() => 'hush');
const p2 = E(ulrikRoot).returnEternalPromise(); // never resolves
const p2 = E(ulrikRoot).getEternalPromise();
p2.catch(() => 'hush');

return { version, data, p1, p2, retain, ...parameters };
Expand Down
1 change: 0 additions & 1 deletion packages/SwingSet/test/upgrade/num-sensors.js

This file was deleted.

Loading