Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate ID numbers from counters that are kept in persistent storage #4898

Merged
merged 1 commit into from
Mar 24, 2022

Conversation

FUDCo
Copy link
Contributor

@FUDCo FUDCo commented Mar 23, 2022

Promise IDs, export IDs, and collection IDs were previously assigned from counters that were maintained in RAM. This would not survive vat shutdown and resurrection.

Now they are tracked via a DB record that is used to implement a write through cache that gets flushed to disk as part of the bringOutYourDead alongside the flushing of the virtual object cache.

Fixes #4730

@FUDCo FUDCo added bug Something isn't working SwingSet package: SwingSet labels Mar 23, 2022
@FUDCo FUDCo added this to the Mainnet 1 milestone Mar 23, 2022
@FUDCo FUDCo requested a review from warner March 23, 2022 07:26
@FUDCo FUDCo self-assigned this Mar 23, 2022
@FUDCo FUDCo force-pushed the 4730-persistent-id-counters branch from 6aa9304 to 20e3f88 Compare March 23, 2022 07:58
@warner
Copy link
Member

warner commented Mar 23, 2022

Thanks for grabbing this one!

Let's do the initial read at the beginning of startVat(), unconditionally, just after the !didStartVat check. That will get us more consistent vat startup syscall behavior: always one read, always at the same time, not dependent upon when userspace or the virtual collection manager starts allocating things.

On the write side, I can think of three approaches:

  • during BOYD as you did here: 0 or 1 writes per BOYD, ID count on disk is not guaranteed correct except just after BOYD
  • during the tail end of dispatch(): 0 or 1 writes per delivery, ID count on disk is always correct
  • immediately after each increment: 0..N writes per delivery, ID count on disk is always correct

It hadn't occurred to me before now to consider amending dispatch() to do a syscall afterwards, but now it's kind of growing on me.

Having the counter be correct all of the time feels like a good thing. From our discussion yesterday about performing upgrade with-or-without the previous vat's involvement, it seems like we prefer the performance benefits of involving the old vat, but if it was corrupted somehow, we could still do the right cleanup at the expense of more DB reads. If the next-id counter isn't kept updated, I think we'd be in a tougher situation: we'd have to heuristic out the right counter value by looking at the previous (possibly too low) value, then scanning the c-list for values that were apparently used, until we get to a counter value that seems to be not in use (and scanning a bit further in case it was used and then retired).

If we do it in dispatch, it would need to happen after the waitUntilQuiescent, something like:

  return gcTools.waitUntilQuiescent().then(() => {
    flushCounters();
    return p;
  });

Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the read-once-in-startVat + write-once-per-delivery changes.

@Tartuffo Tartuffo removed this from the Mainnet 1 milestone Mar 23, 2022
@FUDCo
Copy link
Contributor Author

FUDCo commented Mar 23, 2022

The first-read-on-startvat thing seems like a no brainer and I'm kind of sorry I didn't think of it originally :-)

Seems to me that we've had ideas for a number of things that might like to be done as end-of-crank cleanup, particularly things that involve writing stuff to the vatstore. I'm wondering if it would make sense to add a general post-dispatch function that we'd take care to ensure got called at exactly the right time, and then put all such activity into that. In the case of the stuff in the current PR, we really only need to do the work when the event that was dispatched was a message delivery or a promise notification since those are the only operations that engage with the counters. However, it wouldn't really do any harm to be unconditional since in the other dispatch cases the flush would effectively be a no-op on account of none of the counters having changed during the event. And it would save having to think/worry about whether the operation is actually needed in some other particular case, which is a win from the perspectives of both safety and effort expended.

We've kind of informally adopted bring-out-your-dead as a rough proxy for this end-of-dispatch thing in the sense of "well, at least make sure the stuff gets written before we try to do anything that actually cares about the database state on disk being up to date and consistent". While I share your gut sense about it being good to make sure the state-on-disk is always correct, in this case I'm less concerned because the situation before this PR was that the counters were entirely volatile, and the motivation for this new machinery is the shutdown-before-upgrade case, which does entail doing a BOYD. However, shaving off at most one write per crank doesn't seem like a significant enough performance boost to argue strongly that BOYD is the right time to flush the counters; I did it that way because BOYD was a readily identifiable event with an obvious place for the flush to be invoked from, whereas the place to insert the flush call after dispatch was less immediately obvious. Which is all a long winded way of saying I think doing it post-dispatch is probably the right choice, and I will see if I can make that a clean mechanism that we can piggyback on in the future if more of this kind of stuff turns up (as it probably will).

@FUDCo FUDCo requested a review from warner March 24, 2022 01:43
@FUDCo
Copy link
Contributor Author

FUDCo commented Mar 24, 2022

@warner ready for re-review. I put the updates in a separate commit for ease of review, but I'll squash it out for landing.

@FUDCo FUDCo force-pushed the 4730-persistent-id-counters branch 2 times, most recently from e458703 to a9e4d08 Compare March 24, 2022 02:27
Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small functional change, two tiny cosmetic changes, feel free to land after fixing.

if (raw) {
idCounters = JSON.parse(raw);
} else {
idCounters = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case needs a idCountersAreDirty= so we'll do a write during startVat, instead of waiting until the first delivery that allocates something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base state always the same ({}) so we don't have to write it until there's an actual change.

const exportID = nextExportID;
nextExportID += 1;
return exportID;
return allocateNextID('exportID', 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't have one before, but let's add a note that we use "1" because it's bigger than the "o+0" that's hard-coded for the root object.

The "5" for promises is arbitrary and could be reduced to 0 without problems, but the inital exportID must never be less than 1.

// promise queue. We return 'p' so that any bugs in liveslots that
// cause an error during unmeteredDispatch will be reported to the
// Instead, we wait for userspace to become idle by draining the promise
// queue. We clean up and the return 'p' so that any bugs in liveslots
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "clean up and the return" should be "then return"

also please change the next line "bugs in liveslots that cause an error" to "bugs in liveslots which caues an error", for some reason the repeated "that" is bugging my grammar/ambiguity brain today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the typo.

On the matter of "that" vs. "which", in this case I think I can argue chapter and verse with whatever grammar gods you sacrifice to that the way it's written is correct, as the second "that" introduces a restrictive clause. However, having two "that"s in close proximity definitely scans awkwardly. I think the fix is actually to drop the first "that" entirely, i.e, "... then return 'p' so any bugs in liveslots that cause an error ..."

Also, I can't believe I just wrote an entire paragraph on a grammar issue.

@FUDCo FUDCo force-pushed the 4730-persistent-id-counters branch from a9e4d08 to 3c39d59 Compare March 24, 2022 21:33
@FUDCo FUDCo force-pushed the 4730-persistent-id-counters branch from 3c39d59 to a57f1b7 Compare March 24, 2022 21:36
@FUDCo FUDCo added the automerge:rebase Automatically rebase updates, then merge label Mar 24, 2022
@mergify mergify bot merged commit 965d84b into master Mar 24, 2022
@mergify mergify bot deleted the 4730-persistent-id-counters branch March 24, 2022 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exported object ID counter vs upgrade
3 participants