Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snapshot / BOYD interval based on computrons #6786

Open
mhofman opened this issue Jan 13, 2023 · 8 comments
Open

snapshot / BOYD interval based on computrons #6786

mhofman opened this issue Jan 13, 2023 · 8 comments
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@mhofman
Copy link
Member

mhofman commented Jan 13, 2023

What is the Problem Being Solved?

The amount of time it takes to replay from snapshot varies widely based on the kind of work performed since the last snapshot. Similarly the amount of collectible objects depend on the amount of garbage created/released by the vat. These vary between vats, and can also vary over time for the same vat.

Currently our snapshot and BringOutYourDead interval is expressed in terms of number of deliveries to the vat. A closer approximation would be to use the amount of computrons spent by the vat, which is already tangentially in consensus through the run policy of the host, and is proposed to be more firmly in consensus through #6770.

We should also consider tying these operations together, aka perform a snapshot only immediately after bringOutYourDead so that we don't capture garbage in snapshots.

Description of the Design

An in consensus parameter would define the snapshot/bringOutYourDead interval in terms of computrons instead of deliveries.

Updating this parameter probably has interactions with parallel execution (#6447).

Security Considerations

None I can think of

Test Plan

Yes

@mhofman mhofman added enhancement New feature or request SwingSet package: SwingSet labels Jan 13, 2023
@mhofman
Copy link
Member Author

mhofman commented Jan 13, 2023

As requested by @dckc, here is a graph and histogram of the execution time between snapshots as captured by the replay tool for all vats on mainnet.

Execution time between snapshots
Histogram of snapshot spans by execution time

@warner
Copy link
Member

warner commented Jan 14, 2023

In conjunction with this, we should implement @mhofman 's earlier suggestion that we use the transcript to record the act of writing a snapshot, since they force a GC pass, which changes the state of the worker. If we tied BOYD and snapshotting together (always doing them at the same time), then the transcript will look like:

  • deliveryNum 4: dispatch.deliver ..
  • deliveryNum 5: dispatch.notify ..
  • deliveryNum 6: dispatch.bringOutYourDead
  • deliveryNum 7: writeHeapSnapshot
  • deliveryNum 8: dispatch.deliver ..

The writeHeapSnapshot command would be sent over the same command pipe as the normal dispatch.* deliveries, however it would be sent by a different packages/xsnap library API function (on the kernel side), and it would not be delivered to liveslots (on the worker side). These deliveries would also not appear in the kernel run-queue: they would be injected into the delivery stream by the vat-warehouse or the individual vat's manager, just after making a delivery whose results increase the cumulative computron count above the trigger threshold.

Thinking about the improved scheduler and a per-vat input queue, what we'd really do is push these two deliveries onto the front of the vat-input queue, and then let them be delivered in their own time (so not necessarily right away). They'd happen before anything else on that vat's input queue, but we could allow other vats to get time before returning attention to the one that needs a snapshot written. If the snapshot is called for because this particular vat just did a whole lot of work, then maybe we don't want to spend even more time on this vat right away.

One tricky bit is to make sure we don't accidentally re-schedule a BOYD or snapshot because of the computrons used by the BOYD or snapshot.. we need some per-vat state to remember that we pushed the pair onto the input queue already. We need some durable per-vat state anyways, to remember the cumulative computron count, since the kernel might be rebooted, e.g. just before the BOYD/snapshot was to have been scheduled. Those counters cannot be only in RAM.

The resulting transcript "spans" (#6702/#6775) will always end with a BOYD and a writeHeapSnapshot (e.g. deliveryNum 7, above). They will always start with the delivery just after the writeHeapSnapshot (deliveryNum 8).

If we incorporate vat-upgrade into this transcript-tracked scheme, then the last span of an incarnation will end with a BOYD and an upgradeVat pseudo-delivery. The upgradeVat is not a real delivery (the worker does not receive it). At present, the worker receives a stopVat, then the worker is replaced with a new one (using the new vat bundle), then it receives a startVat with the new vatParameters. So the last span of the previous incarnation could conceivably end with BOYD/stopVat, and the first span of the new incarnation could start with startVat. However I'd like to get rid of stopVat (#6650), in which case the old span would end with BOYD, and the new span would start with startVat. In either case, there is no writeHeapSnapshot in between: we don't use heap snapshots during vat upgrade.

@mhofman
Copy link
Member Author

mhofman commented Jan 15, 2023

In the future improved vat scheduler / parallelization, I think makeSnapshot / BOYD should be driven entirely (but still deterministically) by the vat CDP itself. A logical way of thinking about the kernel -> vat input is that it focuses on direct actions such as message send, promise resolve, retire export, upgrade/exit. The vat -> kernel output is a result of execution derived from these input, such as message send, promise resolve and subscribe, but also drop import, etc.

The fact that snapshots are made by the CDP is an implementation detail that shouldn't cross this boundary, asides from the abstract artifacts that the CDP produces for state-sync.

Similarly, I consider BOYD an internal concern of the CDP in how it find its garbage. It is of course observable in some output of the CDP, which is why it needs to be deterministic. It raises the question of how the result of BYOD should be included in the vat output queue? A way to approach it would be to simply roll the results of BYOD into the output of the delivery that triggered the BYOD. The main concern is the apparent disconnection of this output from the input.

The parametrization is interesting. This is IMO something we want to control by a governance parameter, so once the kernel is updated it needs to inform the vat CDP of the new parameter. That new parameter should be processed in the same input queue since the vat may or may not have processed earlier units of work. That means vat parameter changes are like upgrade/exit commands.

Similarly the computron storage should be something kept logically inside the vat CDP.

@mhofman
Copy link
Member Author

mhofman commented Jan 15, 2023

One tricky bit is to make sure we don't accidentally re-schedule a BOYD or snapshot because of the computrons used by the BOYD or snapshot.

Thinking more about this, I hope we do not count BYOD meter usage against the vat? BYOD is intrinsically a gc related operation which should be exempt from metering. We can "charge" the vat through other means, like the amount of "time" each object is retained, which would be higher if a vat produces a lot of garbage and thus a good proxy for the expense of running BYOD.

@warner
Copy link
Member

warner commented Jan 24, 2023

This would be nice to have, but not strictly needed for the vaults/bulldozer release, and we think we can add it later without causing too many problems. We should definitely include computrons in the consensus state, since this will make us even more sensitive to metering differences.

@warner
Copy link
Member

warner commented Mar 14, 2023

We should also consider telling the runPolicy about the "time" spend writing a snapshot, so it could end the block earlier. The size of the snapshot is a reasonable proxy for the amount of time we spend writing it, and is in-consensus.

@mhofman
Copy link
Member Author

mhofman commented Apr 7, 2023

A further tweak on this would be to let the host schedule reaping of eligible vats.

Performing a snapshot in the middle of a block is potentially disruptive to the block's execution. It also raises the possibility of a snapshot being taken twice for the same vat in a block, which is useless.

BOYD in the middle of an execution is also not the optimal time, and it'd be better to perform reaping when quiescent. If snapshots are performed right after reaping, they would also be smaller.

The host could schedule a reaping (which also triggers a snapshot) after the kernel has run to completion on an inbound queue items for a block. To deal with maxed out blocks, the host could gain knowledge about the need of reaping for vats, and pre-allocate "compute time" in the run policy to perform them at the end.

Edit: the layer is the tricky bit here: vat gc is a swingset concern that may not make sense to expose to the host, even though I'd argue it should be accounted for in the run policy, and we cannot rely on execution computrons to measure its cost (similar to how the cost of taking heap snapshots should be accounted for). However on the other side, block boundaries and time passing is a concern of the host, which may not make sense to expose to swingset directly.

@warner
Copy link
Member

warner commented Sep 13, 2023

We might consider an additional limit based on the concatenated size of the transcript span, to avoid problems like #8325 where the swingstore export artifact (the transcript span, expressed as newline-concatenated transcript entries) was larger than the cosmos-sdk state-sync import size limit. That limit was originally 64MB, but #8325 raises it to a new value. If we want to avoid the possibility of running up against the new value in the future, we could have swingset force a new span if the total span size was getting close.

@mhofman observed only a single large span, v1-bootstrap, and it was only about 65MB, just barely over the limit. It was that big because of the large amount of data we provided in the bootstrap() "argv" argument, with all the account balances/state from the pismo-era IAVL tree. That second (startVat) delivery's transcript item was 65,580,676 bytes long. We're not likely to do that again, so in the future the relationship between span size and total computrons per span will probably(?) be closer. The next-largest transcript item on mainnet is 1,985,575 bytes. I don't know the range of full span sizes is (I need more tools to compute them).

@warner warner changed the title snapshot / BYOD interval based on computrons snapshot / BOYD interval based on computrons Feb 23, 2024
warner added a commit that referenced this issue Mar 29, 2024
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 200.

This caused BOYD to happen once every 200 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`.

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The current dirt level is recorded in
`vNN.reapDirt`.

The kernel will automatically upgrade both the kernel-wide and the
per-vat state upon the first reboot with the new kernel code. The old
`reapCountdown` value is used to initialize the vat's
`reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the
existing schedule. Vats which used `reapInterval = 'never'` (eg comms)
will get a `reapDirtThreshold` of all 'never' values, so they continue
to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20.

We do not track dirt when the corresponding threshold is 'never', to
avoid incrementing the comms dirt counters forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
warner added a commit that referenced this issue Apr 9, 2024
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 200.

This caused BOYD to happen once every 200 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`.

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The current dirt level is recorded in
`vNN.reapDirt`.

The kernel will automatically upgrade both the kernel-wide and the
per-vat state upon the first reboot with the new kernel code. The old
`reapCountdown` value is used to initialize the vat's
`reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the
existing schedule. Vats which used `reapInterval = 'never'` (eg comms)
will get a `reapDirtThreshold` of all 'never' values, so they continue
to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20.

We do not track dirt when the corresponding threshold is 'never', to
avoid incrementing the comms dirt counters forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
warner added a commit that referenced this issue Aug 12, 2024
NOTE: deployed kernels require a new `upgradeSwingset()` call upon
(at least) first boot after upgrading to this version of the kernel
code. See below for details.

`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 1000.

This caused BOYD to happen once every 1000 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD,
even new forms of dirt added in the future).

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
    and `.neverReap`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The vat-level records override the kernel-wide
values. The current dirt level is recorded in `vNN.reapDirt`.

NOTE: deployed kernels require explicit state upgrade, with:

```js
import { upgradeSwingset } from '@agoric/swingset-vat';
..
upgradeSwingset(kernelStorage);
```

This must be called after upgrading to the new kernel code/release,
and before calling `buildVatController()`. It is safe to call on every
reboot (it will only modify the swingstore when the kernel version has
changed). If changes are made, the host application is responsible for
commiting them, as well as recording any export-data updates (if the
host configured the swingstore with an export-data callback).

During this upgrade, the old `reapCountdown` value is used to
initialize the vat's `reapDirt.deliveries` counter, so the upgrade
shouldn't disrupt the existing schedule. Vats which used `reapInterval
= 'never'` (eg comms) will get a `reapDirtThreshold.never = true`, so
they continue to inhibit BOYD. Any per-vat settings that match the
kernel-wide settings are removed, allowing the kernel values to take
precedence (as well as changes to the kernel-wide values; i.e. the
per-vat settings are not sticky).

We do not track dirt when the corresponding threshold is 'never', or
if `neverReap` is true, to avoid incrementing the comms dirt counters
forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
mergify bot added a commit that referenced this issue Aug 12, 2024
fix(swingset): use "dirt" to schedule vat reap/bringOutYourDead

NOTE: deployed kernels require a new `upgradeSwingset()` call upon
(at least) first boot after upgrading to this version of the kernel
code.

`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 1000.

This caused BOYD to happen once every 1000 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD,
even new forms of dirt added in the future).

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
    and `.neverReap`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The vat-level records override the kernel-wide
values. The current dirt level is recorded in `vNN.reapDirt`.

NOTE: deployed kernels require explicit state upgrade, with:

```js
import { upgradeSwingset } from '@agoric/swingset-vat';
..
upgradeSwingset(kernelStorage);
```

This must be called after upgrading to the new kernel code/release,
and before calling `buildVatController()`. It is safe to call on every
reboot (it will only modify the swingstore when the kernel version has
changed). If changes are made, the host application is responsible for
commiting them, as well as recording any export-data updates (if the
host configured the swingstore with an export-data callback).

During this upgrade, the old `reapCountdown` value is used to
initialize the vat's `reapDirt.deliveries` counter, so the upgrade
shouldn't disrupt the existing schedule. Vats which used `reapInterval
= 'never'` (eg comms) will get a `reapDirtThreshold.never = true`, so
they continue to inhibit BOYD. Any per-vat settings that match the
kernel-wide settings are removed, allowing the kernel values to take
precedence (as well as changes to the kernel-wide values; i.e. the
per-vat settings are not sticky).

We do not track dirt when the corresponding threshold is 'never', or
if `neverReap` is true, to avoid incrementing the comms dirt counters
forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
kriskowal pushed a commit that referenced this issue Aug 27, 2024
NOTE: deployed kernels require a new `upgradeSwingset()` call upon
(at least) first boot after upgrading to this version of the kernel
code. See below for details.

`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 1000.

This caused BOYD to happen once every 1000 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`, and `neverReap` (a boolean which inhibits all BOYD,
even new forms of dirt added in the future).

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
    and `.neverReap`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The vat-level records override the kernel-wide
values. The current dirt level is recorded in `vNN.reapDirt`.

NOTE: deployed kernels require explicit state upgrade, with:

```js
import { upgradeSwingset } from '@agoric/swingset-vat';
..
upgradeSwingset(kernelStorage);
```

This must be called after upgrading to the new kernel code/release,
and before calling `buildVatController()`. It is safe to call on every
reboot (it will only modify the swingstore when the kernel version has
changed). If changes are made, the host application is responsible for
commiting them, as well as recording any export-data updates (if the
host configured the swingstore with an export-data callback).

During this upgrade, the old `reapCountdown` value is used to
initialize the vat's `reapDirt.deliveries` counter, so the upgrade
shouldn't disrupt the existing schedule. Vats which used `reapInterval
= 'never'` (eg comms) will get a `reapDirtThreshold.never = true`, so
they continue to inhibit BOYD. Any per-vat settings that match the
kernel-wide settings are removed, allowing the kernel values to take
precedence (as well as changes to the kernel-wide values; i.e. the
per-vat settings are not sticky).

We do not track dirt when the corresponding threshold is 'never', or
if `neverReap` is true, to avoid incrementing the comms dirt counters
forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
@mhofman mhofman self-assigned this Jan 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants