kernel should commitCrank at end of start() to record device initialization state writes #4450

warner · 2022-02-04T01:46:34Z

What is the Problem Being Solved?

While working on #4372, I noticed that one unit test (test/virtualObjects/test-representatives.js 'exercise cache') was unexpectedly broken by the introduction of a new devices (devices.bundle). This (raw) device does a vatstoreGet and vatstoreSet as it starts up, to initialize the "next device node" counter. The unit test uses an instrumented kvStore that records all gets and sets, which appear there after the crank buffer is flushed (so we'll observe all the reads happen first, then a batch of writes, etc).

The unit test was flunked by the unexpected pair of devices.bundle operations. However, they didn't happen next to each other. Instead, I saw the set occur a long time after the initial get.

I think the issue is that we aren't doing a commitCrank() at the end of kernel.js start(). So any kvstore writes that happen during startup (e.g. while devices are being loaded and executed) are left in the crank buffer until the first real crank happens, then they get flushed along with the real work.

This is a bit weird: the delivery to vat-bootstrap shouldn't be entangled with state initialization for the whole kernel. It probably wouldn't cause a problem unless the host application performed start() and then immediately committed the hostDB and exited. In that case, the device writes would probably get dropped. This would cause the device to do them again at next reboot, before bootstrap() is delivered, which would cause a consensus failure.

It seems to me that we ought to flush all of these out these changes by adding a commitCrank() to the end of start(). The consequences I can think of are that the crank numbers will be one higher (which will probably break a few unit tests).

A related issue that occurred to me: device vatstore writes that happen because of vat invocations will get unwound if that vat delivery is aborted (along with any other state changes caused by the abandoned delivery). But if the devices invoked its endowments, those changes can't be unwound. Anything the device does is supposed to go into a transaction or buffer or something that doesn't get committed until the host application commits (hangover inconsistency prevention), but it might not be obvious to device authors that they need to pay attention to this restriction.

Description of the Change

in kernel.js, at the end of start(), just after the kernelKeeper.loadStats(), add a kernelKeeler.commitCrank().

Security Considerations

none that I can think of

Test Plan

Some tests may need to be updated, with a higher crank number than before.

cc @FUDCo for thoughts and in case you grab it before I do

The text was updated successfully, but these errors were encountered:

warner · 2022-02-04T01:50:57Z

Hrm, test/test-transcript.js -m 'transcript-one load' fails with an activityHash difference, which tells me that I need to think more carefully about this.

warner · 2022-02-04T02:33:47Z

Huh, right, so start() is not the correct place to do a commitCrank: it is not called in response to external input. The host might call start(), do no work, shut down, start up again, call start() again, etc, and the kernel state is not changing.

For vats, we dealt with this by not doing any state-changing work as we bring the vats online (for the first time, or any time). Now that we want vats to be able to use vatstore at startup, @FUDCo is implementing #2910 to capture this, but we're treating these vat-startup events as distinct deliveries, which gives them a context.

I wonder if we need to do something simliar with devices. the wrinkle is that they legitimately(??) need to read their state on each host restart. Except.. well, they could defer that until someone invokes the device, so even the first read happens in the context of a crank.

I think we went through this before, with the mailbox or timer device, where having a flag in RAM to avoid re-reading DB state cause a consensus failure. Here, it's not a flag in RAM, but rather state-mutating vatstore calls happening at startup, outside the context of a crank.

Needs more thought.

warner · 2022-02-04T02:43:49Z

So maybe the rule is that raw devices are not allowed to perform syscalls during buildDevice(), just like vats were (previously) not allowed to perform syscall during buildRootObject, but worse because there's no deviceslots to make it hard for them to do work that early.

I made this change on 4372-bundlecaps and it removes the need to hack the virtualobjects test that was instrumenting kvStore.

So it's highly probable that this ticket should be closed without code changes. Maybe we should document this prohibition in the raw-device docs (docs/devices.md), maybe we should provide some sort of enforcement.

warner · 2023-01-24T20:36:55Z

@FUDCo with the new SQLite -based storage (and the removal of the crank buffer), do you think this problem is fixed now?

warner added enhancement New feature or request SwingSet package: SwingSet labels Feb 4, 2022

warner assigned warner and FUDCo Feb 4, 2022

Tartuffo added migrate-icebox and removed migrate-icebox labels Nov 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel should commitCrank at end of start() to record device initialization state writes #4450

kernel should commitCrank at end of start() to record device initialization state writes #4450

warner commented Feb 4, 2022

warner commented Feb 4, 2022

warner commented Feb 4, 2022

warner commented Feb 4, 2022

warner commented Jan 24, 2023

kernel should commitCrank at end of start() to record device initialization state writes #4450

kernel should commitCrank at end of start() to record device initialization state writes #4450

Comments

warner commented Feb 4, 2022

What is the Problem Being Solved?

Description of the Change

Security Considerations

Test Plan

warner commented Feb 4, 2022

warner commented Feb 4, 2022

warner commented Feb 4, 2022

warner commented Jan 24, 2023