fix(swingset): better snapshot scheduling, do BOYD before each #7558

Merged
merged 1 commit into from
Apr 30, 2023

@warner warner commented Apr 29, 2023

This changes the snapshot scheduling logic to be more consistent. We still use snapshotInitial to trigger a snapshot shortly after worker initialization, and snapshotInterval to trigger periodic ones after that.

However, the previous code compared snapshotInitial to the absolute deliveryNum, so it applied only to the first incarnation. It would not attempt to take a snapshot shortly after an upgrade, leaving the kernel vulnerable to replaying the long startVat delivery for a larger window than we intended. In addition, snapshotInterval was compared against the difference between the latest transcript and the latest snapshot, a value that changed with the addition of the load-worker pseudo-entry.

The new code uses snapshotInitial whenever there is not an existing snapshot (so the first span of all incarnations), and compares it against the length of the current span (so it includes all the pseudo-events). snapshotInterval is also compared against the length of the current span.

The result is a simpler and more predictable set of rules:

  • in the first span of each incarnation, trigger a snapshot once we have at least snapshotInitial entries
  • in all other spans, trigger once we have at least snapshotInterval entries
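
The rule can be sketched as a single threshold check, following the description above: snapshotInitial applies when the current incarnation has no snapshot yet, and snapshotInterval applies otherwise. This is a minimal illustration, not the SwingSet implementation. The type names, the field names `hasSnapshot` and `spanLength`, and the numeric values are all assumptions chosen for the example.

```typescript
// Hypothetical shapes for illustration only; not the SwingSet API.
interface SnapshotPolicy {
  snapshotInitial: number; // threshold for a span with no snapshot behind it
  snapshotInterval: number; // threshold for every later span
}

interface SpanState {
  hasSnapshot: boolean; // false during the first span of each incarnation
  spanLength: number; // entries in the current span, pseudo-entries included
}

// Pick the threshold by whether a snapshot exists, then compare it
// against the length of the current span.
function shouldSnapshot(policy: SnapshotPolicy, span: SpanState): boolean {
  const threshold = span.hasSnapshot
    ? policy.snapshotInterval
    : policy.snapshotInitial;
  return span.spanLength >= threshold;
}

// Illustrative values, not the project's defaults.
const policy: SnapshotPolicy = { snapshotInitial: 3, snapshotInterval: 200 };

// Shortly after an upgrade: no snapshot yet, so the small threshold applies.
const afterUpgrade = shouldSnapshot(policy, { hasSnapshot: false, spanLength: 3 });

// Same span length in a later span: the larger interval applies.
const inLaterSpan = shouldSnapshot(policy, { hasSnapshot: true, spanLength: 3 });
```

Because the check depends only on the current span, an upgraded vat gets the same early snapshot as a freshly created one.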

In addition, when triggering a snapshot, we perform a BringOutYourDead delivery before asking the worker to save a snapshot. This gives us one last chance to shake out any garbage (making the snapshot as small as possible), and reduces the variation we might see from the forced GC that happens during snapshot write (any FinalizationRegistry callbacks should get run during the BOYD, not the save-snapshot).
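
The ordering matters more than the mechanics, so here is a small sketch of it: deliver BringOutYourDead, wait for it to finish, and only then ask for the snapshot. The `VatWorker` interface, its method names, and the `'bringOutYourDead'` delivery string are hypothetical stand-ins for this example, not the real worker API.

```typescript
// Hypothetical worker interface, for illustration only.
interface VatWorker {
  deliver(delivery: string): Promise<void>;
  saveSnapshot(): Promise<void>;
}

// Run BOYD to completion first, so garbage collection and any
// finalizer callbacks happen before the snapshot is written.
async function snapshotWithBoyd(worker: VatWorker): Promise<void> {
  await worker.deliver('bringOutYourDead');
  await worker.saveSnapshot();
}

// A recording stub that makes the call order visible.
const calls: string[] = [];
const stubWorker: VatWorker = {
  deliver: async (delivery: string) => {
    calls.push(`deliver:${delivery}`);
  },
  saveSnapshot: async () => {
    calls.push('saveSnapshot');
  },
};
```

Calling `snapshotWithBoyd(stubWorker)` records the BOYD delivery ahead of the save, which is the order the description asks for.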

closes #7553
closes #7504

@warner warner added the SwingSet package: SwingSet label Apr 29, 2023
@warner warner requested review from mhofman and FUDCo April 29, 2023 23:09
@warner warner self-assigned this Apr 29, 2023
warner commented Apr 29, 2023

oops, looks like I need to fix the rest of the swingset tests first, my apologies

@warner warner force-pushed the 7548-increase-snapshot-initial branch from c51df0b to 1afcfa6 Compare April 30, 2023 00:07

@FUDCo FUDCo left a comment

Yuppity.

@warner warner force-pushed the 7548-increase-snapshot-initial branch from 1afcfa6 to 5e723fa Compare April 30, 2023 00:45
Base automatically changed from 7548-increase-snapshot-initial to master April 30, 2023 01:36
@warner warner added the automerge:rebase Automatically rebase updates, then merge label Apr 30, 2023
@mergify mergify bot merged commit 1265a79 into master Apr 30, 2023
@mergify mergify bot deleted the 7504-boyd-before-snapshot branch April 30, 2023 09:41
Successfully merging this pull request may close these issues.

  • schedule snapshot snapshotInitial deliveries after each upgrade
  • Perform snapshots only after BOYD