Allocate ID numbers from counters that are kept in persistent storage #4898

FUDCo · 2022-03-23T07:26:13Z

Promise IDs, export IDs, and collection IDs were previously assigned from counters that were maintained in RAM. This would not survive vat shutdown and resurrection.

Now they are tracked via a DB record that is used to implement a write through cache that gets flushed to disk as part of the bringOutYourDead alongside the flushing of the virtual object cache.

Fixes #4730

warner · 2022-03-23T15:43:16Z

Thanks for grabbing this one!

Let's do the initial read at the beginning of startVat(), unconditionally, just after the !didStartVat check. That will get us more consistent vat startup syscall behavior: always one read, always at the same time, not dependent upon when userspace or the virtual collection manager starts allocating things.

On the write side, I can think of three approaches:

during BOYD as you did here: 0 or 1 writes per BOYD, ID count on disk is not guaranteed correct except just after BOYD
during the tail end of dispatch(): 0 or 1 writes per delivery, ID count on disk is always correct
immediately after each increment: 0..N writes per delivery, ID count on disk is always correct

It hadn't occurred to me before now to consider amending dispatch() to do a syscall afterwards, but now it's kind of growing on me.

Having the counter be correct all of the time feels like a good thing. From our discussion yesterday about performing upgrade with-or-without the previous vat's involvement, it seems like we prefer the performance benefits of involving the old vat, but if it was corrupted somehow, we could still do the right cleanup at the expense of more DB reads. If the next-id counter isn't kept updated, I think we'd be in a tougher situation: we'd have to heuristic out the right counter value by looking at the previous (possibly too low) value, then scanning the c-list for values that were apparently used, until we get to a counter value that seems to be not in use (and scanning a bit further in case it was used and then retired).

If we do it in dispatch, it would need to happen after the waitUntilQuiescent, something like:

  return gcTools.waitUntilQuiescent().then(() => {
    flushCounters();
    return p;
  });

warner

Let's do the read-once-in-startVat + write-once-per-delivery changes.

FUDCo · 2022-03-23T20:19:47Z

The first-read-on-startvat thing seems like a no brainer and I'm kind of sorry I didn't think of it originally :-)

Seems to me that we've had ideas for a number of things that might like to be done as end-of-crank cleanup, particularly things that involve writing stuff to the vatstore. I'm wondering if it would make sense to add a general post-dispatch function that we'd take care to ensure got called at exactly the right time, and then put all such activity into that. In the case of the stuff in the current PR, we really only need to do the work when the event that was dispatched was a message delivery or a promise notification since those are the only operations that engage with the counters. However, it wouldn't really do any harm to be unconditional since in the other dispatch cases the flush would effectively be a no-op on account of none of the counters having changed during the event. And it would save having to think/worry about whether the operation is actually needed in some other particular case, which is a win from the perspectives of both safety and effort expended.

We've kind of informally adopted bring-out-your-dead as a rough proxy for this end-of-dispatch thing in the sense of "well, at least make sure the stuff gets written before we try to do anything that actually cares about the database state on disk being up to date and consistent". While I share your gut sense about it being good to make sure the state-on-disk is always correct, in this case I'm less concerned because the situation before this PR was that the counters were entirely volatile, and the motivation for this new machinery is the shutdown-before-upgrade case, which does entail doing a BOYD. However, shaving off at most one write per crank doesn't seem like a significant enough performance boost to argue strongly that BOYD is the right time to flush the counters; I did it that way because BOYD was a readily identifiable event with an obvious place for the flush to be invoked from, whereas the place to insert the flush call after dispatch was less immediately obvious. Which is all a long winded way of saying I think doing it post-dispatch is probably the right choice, and I will see if I can make that a clean mechanism that we can piggyback on in the future if more of this kind of stuff turns up (as it probably will).

FUDCo · 2022-03-24T01:44:33Z

@warner ready for re-review. I put the updates in a separate commit for ease of review, but I'll squash it out for landing.

warner

One small functional change, two tiny cosmetic changes, feel free to land after fixing.

warner · 2022-03-24T20:40:43Z

packages/SwingSet/src/liveslots/liveslots.js

+      if (raw) {
+        idCounters = JSON.parse(raw);
+      } else {
+        idCounters = {};


I think this case needs a idCountersAreDirty= so we'll do a write during startVat, instead of waiting until the first delivery that allocates something

The base state always the same ({}) so we don't have to write it until there's an actual change.

warner · 2022-03-24T20:43:52Z

packages/SwingSet/src/liveslots/liveslots.js

-    const exportID = nextExportID;
-    nextExportID += 1;
-    return exportID;
+    return allocateNextID('exportID', 1);


We didn't have one before, but let's add a note that we use "1" because it's bigger than the "o+0" that's hard-coded for the root object.

The "5" for promises is arbitrary and could be reduced to 0 without problems, but the inital exportID must never be less than 1.

warner · 2022-03-24T20:47:37Z

packages/SwingSet/src/liveslots/liveslots.js

-      // promise queue. We return 'p' so that any bugs in liveslots that
-      // cause an error during unmeteredDispatch will be reported to the
+      // Instead, we wait for userspace to become idle by draining the promise
+      // queue. We clean up and the return 'p' so that any bugs in liveslots


typo "clean up and the return" should be "then return"

also please change the next line "bugs in liveslots that cause an error" to "bugs in liveslots which caues an error", for some reason the repeated "that" is bugging my grammar/ambiguity brain today

Good catch on the typo.

On the matter of "that" vs. "which", in this case I think I can argue chapter and verse with whatever grammar gods you sacrifice to that the way it's written is correct, as the second "that" introduces a restrictive clause. However, having two "that"s in close proximity definitely scans awkwardly. I think the fix is actually to drop the first "that" entirely, i.e, "... then return 'p' so any bugs in liveslots that cause an error ..."

Also, I can't believe I just wrote an entire paragraph on a grammar issue.

…rage fixes #4730

FUDCo added bug Something isn't working SwingSet package: SwingSet labels Mar 23, 2022

FUDCo added this to the Mainnet 1 milestone Mar 23, 2022

FUDCo requested a review from warner March 23, 2022 07:26

FUDCo self-assigned this Mar 23, 2022

FUDCo force-pushed the 4730-persistent-id-counters branch from 6aa9304 to 20e3f88 Compare March 23, 2022 07:58

warner requested changes Mar 23, 2022

View reviewed changes

Tartuffo removed this from the Mainnet 1 milestone Mar 23, 2022

FUDCo requested a review from warner March 24, 2022 01:43

FUDCo force-pushed the 4730-persistent-id-counters branch 2 times, most recently from e458703 to a9e4d08 Compare March 24, 2022 02:27

warner approved these changes Mar 24, 2022

View reviewed changes

FUDCo force-pushed the 4730-persistent-id-counters branch from a9e4d08 to 3c39d59 Compare March 24, 2022 21:33

fix: allocate ID number from counters that are kept in persistent sto…

a57f1b7

…rage fixes #4730

FUDCo force-pushed the 4730-persistent-id-counters branch from 3c39d59 to a57f1b7 Compare March 24, 2022 21:36

FUDCo added the automerge:rebase Automatically rebase updates, then merge label Mar 24, 2022

mergify bot merged commit 965d84b into master Mar 24, 2022

mergify bot deleted the 4730-persistent-id-counters branch March 24, 2022 21:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allocate ID numbers from counters that are kept in persistent storage #4898

Allocate ID numbers from counters that are kept in persistent storage #4898

FUDCo commented Mar 23, 2022

warner commented Mar 23, 2022

warner left a comment

FUDCo commented Mar 23, 2022

FUDCo commented Mar 24, 2022

warner left a comment

warner Mar 24, 2022

FUDCo Mar 24, 2022

warner Mar 24, 2022

warner Mar 24, 2022

FUDCo Mar 24, 2022

Allocate ID numbers from counters that are kept in persistent storage #4898

Allocate ID numbers from counters that are kept in persistent storage #4898

Conversation

FUDCo commented Mar 23, 2022

warner commented Mar 23, 2022

warner left a comment

Choose a reason for hiding this comment

FUDCo commented Mar 23, 2022

FUDCo commented Mar 24, 2022

warner left a comment

Choose a reason for hiding this comment

warner Mar 24, 2022

Choose a reason for hiding this comment

FUDCo Mar 24, 2022

Choose a reason for hiding this comment

warner Mar 24, 2022

Choose a reason for hiding this comment

warner Mar 24, 2022

Choose a reason for hiding this comment

FUDCo Mar 24, 2022

Choose a reason for hiding this comment