Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel should commitCrank at end of start() to record device initialization state writes #4450

Open
warner opened this issue Feb 4, 2022 · 4 comments
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Feb 4, 2022

What is the Problem Being Solved?

While working on #4372, I noticed that one unit test (test/virtualObjects/test-representatives.js 'exercise cache') was unexpectedly broken by the introduction of a new devices (devices.bundle). This (raw) device does a vatstoreGet and vatstoreSet as it starts up, to initialize the "next device node" counter. The unit test uses an instrumented kvStore that records all gets and sets, which appear there after the crank buffer is flushed (so we'll observe all the reads happen first, then a batch of writes, etc).

The unit test was flunked by the unexpected pair of devices.bundle operations. However, they didn't happen next to each other. Instead, I saw the set occur a long time after the initial get.

I think the issue is that we aren't doing a commitCrank() at the end of kernel.js start(). So any kvstore writes that happen during startup (e.g. while devices are being loaded and executed) are left in the crank buffer until the first real crank happens, then they get flushed along with the real work.

This is a bit weird: the delivery to vat-bootstrap shouldn't be entangled with state initialization for the whole kernel. It probably wouldn't cause a problem unless the host application performed start() and then immediately committed the hostDB and exited. In that case, the device writes would probably get dropped. This would cause the device to do them again at next reboot, before bootstrap() is delivered, which would cause a consensus failure.

It seems to me that we ought to flush all of these out these changes by adding a commitCrank() to the end of start(). The consequences I can think of are that the crank numbers will be one higher (which will probably break a few unit tests).

A related issue that occurred to me: device vatstore writes that happen because of vat invocations will get unwound if that vat delivery is aborted (along with any other state changes caused by the abandoned delivery). But if the devices invoked its endowments, those changes can't be unwound. Anything the device does is supposed to go into a transaction or buffer or something that doesn't get committed until the host application commits (hangover inconsistency prevention), but it might not be obvious to device authors that they need to pay attention to this restriction.

Description of the Change

in kernel.js, at the end of start(), just after the kernelKeeper.loadStats(), add a kernelKeeler.commitCrank().

Security Considerations

none that I can think of

Test Plan

Some tests may need to be updated, with a higher crank number than before.

cc @FUDCo for thoughts and in case you grab it before I do

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Feb 4, 2022
@warner
Copy link
Member Author

warner commented Feb 4, 2022

Hrm, test/test-transcript.js -m 'transcript-one load' fails with an activityHash difference, which tells me that I need to think more carefully about this.

@warner
Copy link
Member Author

warner commented Feb 4, 2022

Huh, right, so start() is not the correct place to do a commitCrank: it is not called in response to external input. The host might call start(), do no work, shut down, start up again, call start() again, etc, and the kernel state is not changing.

For vats, we dealt with this by not doing any state-changing work as we bring the vats online (for the first time, or any time). Now that we want vats to be able to use vatstore at startup, @FUDCo is implementing #2910 to capture this, but we're treating these vat-startup events as distinct deliveries, which gives them a context.

I wonder if we need to do something simliar with devices. the wrinkle is that they legitimately(??) need to read their state on each host restart. Except.. well, they could defer that until someone invokes the device, so even the first read happens in the context of a crank.

I think we went through this before, with the mailbox or timer device, where having a flag in RAM to avoid re-reading DB state cause a consensus failure. Here, it's not a flag in RAM, but rather state-mutating vatstore calls happening at startup, outside the context of a crank.

Needs more thought.

@warner
Copy link
Member Author

warner commented Feb 4, 2022

So maybe the rule is that raw devices are not allowed to perform syscalls during buildDevice(), just like vats were (previously) not allowed to perform syscall during buildRootObject, but worse because there's no deviceslots to make it hard for them to do work that early.

I made this change on 4372-bundlecaps and it removes the need to hack the virtualobjects test that was instrumenting kvStore.

So it's highly probable that this ticket should be closed without code changes. Maybe we should document this prohibition in the raw-device docs (docs/devices.md), maybe we should provide some sort of enforcement.

@warner
Copy link
Member Author

warner commented Jan 24, 2023

@FUDCo with the new SQLite -based storage (and the removal of the crank buffer), do you think this problem is fixed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

3 participants