Include extra replay data in vat transcripts #6770

Closed
mhofman opened this issue Jan 10, 2023 · 3 comments

mhofman commented Jan 10, 2023

What is the Problem Being Solved?

When replaying vats with the replay-transcript.js tool, some information is useful or necessary to keep the replay faithful to the original execution:

  • The schedule of heap snapshot saves: each save forces a GC, which affects the subsequent state of the worker.
  • The computrons used by each delivery: they influence SwingSet execution (through the run policy), but currently cannot be matched during replay with the tool.
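
To make the second point concrete, here is a minimal sketch of a computron-budget run policy. It assumes a policy object with a `crankComplete({ computrons })` method that returns `true` while the run may continue. The names `makeComputronBudgetPolicy` and `countCranksBeforeStop` are hypothetical and only illustrate the idea; they are not the kernel's actual code.

```js
// Hypothetical policy: keep running cranks until the computron budget is spent.
function makeComputronBudgetPolicy(limit) {
  let total = 0n;
  return {
    // Called after each crank; returns true if the run may continue.
    crankComplete({ computrons = 0n } = {}) {
      total += computrons;
      return total < limit;
    },
  };
}

// Count how many cranks run before the policy asks the run to stop.
function countCranksBeforeStop(policy, computronsPerCrank) {
  let cranks = 0;
  for (const computrons of computronsPerCrank) {
    cranks += 1;
    if (!policy.crankComplete({ computrons })) {
      break;
    }
  }
  return cranks;
}

// With the recorded computrons, a replay stops after the same crank.
const recordedComputrons = [40n, 35n, 30n, 25n];
console.log(
  countCranksBeforeStop(makeComputronBudgetPolicy(100n), recordedComputrons),
); // 3
```

If a replay measured different computrons for the same deliveries (say 20n instead of 35n for the second crank), the same policy would let a fourth crank run, which is why the original values need to be recorded.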

Description of the Design

This data should be included alongside the other vat transcript data in the stream store.
I believe stream store additions are not currently part of the activityHash generation, but even if they were, both of these are effectively part of consensus operations already, so they could safely be included (we decided to move heap snapshot hashes under consensus for state-sync, see #3769).

Observation

While we've also seen reload from snapshot triggering bugs in XS and impacting further execution, these operations should not be included in the transcript data because:

Test Plan

A unit test to verify that the data is included. No integration test exists for the replay tool.

mhofman added the enhancement (New feature or request), SwingSet (package: SwingSet), and vaults_triage (DO NOT USE) labels Jan 10, 2023

mhofman commented Feb 8, 2023

@FUDCo this is the extra info I mentioned we should have recorded in the transcript.


warner commented Apr 16, 2023

I'm working on this now, on top of the groundwork in PR #7428.

warner added a commit that referenced this issue Apr 21, 2023
* remove transcript.js, functionality merged into vat-warehouse
* move transcript replay into vat-warehouse
* remove vatSyscallHandler from manager-factory
  * it becomes an argument to manager.deliver()
  * created by vat-warehouse, not vat-loader
* remove compareSyscalls from manager-factory
  * vat-warehouse embeds it in the syscall handler
* remove workerCanBlock: always assumed
* remove useTranscript from manager
  * all deliveries build a transcript entry
  * vat-warehouse only saves it if options.useTranscript is true
* build full anachrophobia log message after delivery is complete
  * show full syscall list for the delivery
  * each annotated as ok/wrong/extra/missing
* shorter transcript property names

refs #6770
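
To illustrate the ok/wrong/extra/missing annotation mentioned in this commit message, here is a minimal sketch of a position-by-position comparison between the syscalls recorded in a transcript entry and those observed during replay. The function name `annotateSyscalls` and the syscall tuples are hypothetical; the actual vat-warehouse code may compare and report differently.

```js
// Hypothetical helper: label each syscall position after a delivery completes.
function annotateSyscalls(expected, actual) {
  const report = [];
  const count = Math.max(expected.length, actual.length);
  for (let i = 0; i < count; i += 1) {
    const want = expected[i];
    const got = actual[i];
    if (want === undefined) {
      // Replay made a syscall the transcript never recorded.
      report.push({ status: 'extra', got });
    } else if (got === undefined) {
      // Transcript recorded a syscall that replay never made.
      report.push({ status: 'missing', want });
    } else if (JSON.stringify(want) === JSON.stringify(got)) {
      report.push({ status: 'ok', got });
    } else {
      report.push({ status: 'wrong', want, got });
    }
  }
  return report;
}

const expectedSyscalls = [['vatstoreGet', 'key1'], ['send', 'o+1'], ['resolve', 'p-2']];
const actualSyscalls = [['vatstoreGet', 'key1'], ['send', 'o+2']];
const syscallReport = annotateSyscalls(expectedSyscalls, actualSyscalls);
console.log(syscallReport.map(r => r.status).join(','));
// ok,wrong,missing
```

Building the report only after the delivery finishes means the log can show the whole syscall list at once instead of stopping at the first mismatch.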
warner added a commit that referenced this issue Apr 23, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (just not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
(manual/external) replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against
snapshotInitial/snapshotInterval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

The transcript is broken up into "spans", delimited by heap snapshots
or upgrade-related shutdowns. To bring a worker up to date, we want to
start a worker (either a blank one, or from a snapshot), and then
replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. The current span will never include a 'save-snapshot' or
'shutdown-worker': the span is closed immediately after those events
are added, so replay will never see them. But a tool which replays a
historical span will see them at the end.

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), except that save-snapshot includes the snapshot hash in its
results. In the future, we'll probably record the deterministic subset
of metering results (computrons, maybe something about memory
allocation).

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 24, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.
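
As a rough sketch of what this enables, an external tool could read the snapshot schedule straight from the transcript rather than reconstructing it from configuration. The entry shape `{ d: [type, ...args], sc, r }` is an assumption made for illustration, and `snapshotSchedule` is a hypothetical helper, not part of the replay tool.

```js
// Assumed entry shape: { d: [type, ...args], sc: [...syscalls], r: {...result} }.
const PSEUDO_DELIVERIES = new Set([
  'initialize-worker',
  'load-snapshot',
  'save-snapshot',
  'shutdown-worker',
]);

// Return, for each 'save-snapshot' event, how many real deliveries preceded it.
function snapshotSchedule(entries) {
  const positions = [];
  let deliveries = 0;
  for (const entry of entries) {
    const [type] = entry.d;
    if (type === 'save-snapshot') {
      positions.push(deliveries);
    } else if (!PSEUDO_DELIVERIES.has(type)) {
      deliveries += 1;
    }
  }
  return positions;
}

const transcript = [
  { d: ['initialize-worker', {}] },
  { d: ['message'] },
  { d: ['message'] },
  { d: ['save-snapshot'] },
  { d: ['load-snapshot', {}] },
  { d: ['notify'] },
  { d: ['save-snapshot'] },
];
console.log(snapshotSchedule(transcript)); // [ 2, 3 ]
```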

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.
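
The span layout described above can be sketched as a small validator that turns a span into replay steps. This assumes entries shaped like `{ d: [type, ...args], sc, r }`; `planSpanReplay` and its `historical` option are hypothetical names used only for this example.

```js
// Assumed entry shape: { d: [type, ...args], sc: [...syscalls], r: {...result} }.
const SPAN_STARTERS = new Set(['initialize-worker', 'load-snapshot']);
const SPAN_CLOSERS = new Set(['save-snapshot', 'shutdown-worker']);

// Turn one transcript span into a list of replay steps, rejecting spans
// that do not follow the layout described in the commit message.
function planSpanReplay(span, { historical = false } = {}) {
  if (span.length === 0) {
    throw new Error('span is empty');
  }
  const steps = [];
  span.forEach((entry, index) => {
    const [type, details] = entry.d;
    const isFirst = index === 0;
    const isLast = index === span.length - 1;
    // The first entry must be a starter, and no later entry may be one.
    if (isFirst !== SPAN_STARTERS.has(type)) {
      throw new Error(`unexpected '${type}' at position ${index}`);
    }
    if (SPAN_CLOSERS.has(type)) {
      // Only a closed (historical) span may hold a closer, and only at the end.
      if (!historical || !isLast) {
        throw new Error(`unexpected '${type}' at position ${index}`);
      }
      steps.push({ action: type });
    } else if (isFirst) {
      steps.push({ action: type, details });
    } else {
      steps.push({ action: 'deliver', delivery: entry.d });
    }
  });
  return steps;
}

const currentSpan = [
  { d: ['load-snapshot', { snapshotID: 'abc123' }], sc: [], r: { status: 'ok' } },
  { d: ['message', 'o+0', {}], sc: [], r: { status: 'ok' } },
];
console.log(planSpanReplay(currentSpan).map(s => s.action).join(' -> '));
// load-snapshot -> deliver
```

A kernel-style replay would call this with the default options, so a stray 'save-snapshot' in the current span is treated as an error; an external tool replaying an old span would pass `{ historical: true }`.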

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).
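
One use of these IDs, sketched under assumptions: an external tool could check that the snapshot saved at the end of one span is the one loaded at the start of the next. The field locations used here (`r.snapshotID` on the save-snapshot result, `d[1].snapshotID` on the load-snapshot record) are guesses for illustration, and `checkSnapshotChain` is a hypothetical helper.

```js
// Assumed entry shape: { d: [type, ...args], sc: [...syscalls], r: {...result} }.
// Assumed locations: save-snapshot -> entry.r.snapshotID,
//                    load-snapshot -> entry.d[1].snapshotID.
function checkSnapshotChain(spans) {
  const problems = [];
  for (let i = 0; i + 1 < spans.length; i += 1) {
    const prev = spans[i];
    const next = spans[i + 1];
    if (prev.length === 0 || next.length === 0) {
      continue;
    }
    const last = prev[prev.length - 1];
    const first = next[0];
    // Only a save-snapshot followed by a load-snapshot forms a checkable pair.
    if (last.d[0] !== 'save-snapshot' || first.d[0] !== 'load-snapshot') {
      continue;
    }
    const saved = last.r.snapshotID;
    const loaded = first.d[1].snapshotID;
    if (saved !== loaded) {
      problems.push({ span: i, saved, loaded });
    }
  }
  return problems;
}

const historicalSpans = [
  [
    { d: ['initialize-worker', {}], sc: [], r: { status: 'ok' } },
    { d: ['save-snapshot'], sc: [], r: { status: 'ok', snapshotID: 'aaa' } },
  ],
  [{ d: ['load-snapshot', { snapshotID: 'aaa' }], sc: [], r: { status: 'ok' } }],
];
console.log(checkSnapshotChain(historicalSpans).length); // 0
```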

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).
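
One consistency check this enables, sketched under assumptions: when a tool has several spans concatenated, the `snapshotID` in a `load-snapshot` record should match the `snapshotID` in the result of the `save-snapshot` entry just before it. The field locations used here (`r.snapshotID` on the save result, `d[1].snapshotID` on the load record) and the name `findSnapshotMismatches` are guesses for illustration, not the confirmed schema.

```javascript
// Compare each load-snapshot against the save-snapshot that precedes it.
// A load-snapshot at index 0 has nothing to compare against and is skipped.
function findSnapshotMismatches(entries) {
  const mismatches = [];
  entries.forEach((entry, i) => {
    if (i === 0 || entry.d[0] !== 'load-snapshot') return;
    const loaded = entry.d[1].snapshotID;
    const prev = entries[i - 1];
    const saved =
      prev.d[0] === 'save-snapshot' ? prev.r.snapshotID : undefined;
    if (saved !== loaded) {
      mismatches.push({ index: i, saved, loaded });
    }
  });
  return mismatches;
}

// Placeholder data, invented for the example.
const consistent = [
  { d: ['save-snapshot'], r: { status: 'ok', snapshotID: 'aaa' } },
  { d: ['load-snapshot', { snapshotID: 'aaa' }], r: { status: 'ok' } },
];
const inconsistent = [
  { d: ['save-snapshot'], r: { status: 'ok', snapshotID: 'aaa' } },
  { d: ['load-snapshot', { snapshotID: 'bbb' }], r: { status: 'ok' } },
];
console.log(findSnapshotMismatches(consistent).length);
console.log(findSnapshotMismatches(inconsistent).length);
```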

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record `TranscriptDeliveryResult`, which is
currently a stripped-down subset of `VatDeliveryResult` (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).
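
To make the "stripped-down subset" idea concrete, here is a minimal sketch of reducing a delivery result to what the transcript keeps. It assumes, purely for illustration, that a delivery result is a tuple whose first element is the status string, and that the transcript result is a record with `status` and an optional `snapshotID`; `toTranscriptResult` is a hypothetical helper, not a function from the codebase.

```javascript
// Keep only the status, and attach the snapshot hash when one was produced.
// Everything else in the delivery result (assumed to be non-deterministic or
// not yet recorded) is dropped.
function toTranscriptResult(deliveryResult, snapshotID = undefined) {
  const [status] = deliveryResult;
  const result = { status };
  if (snapshotID !== undefined) {
    result.snapshotID = snapshotID;
  }
  return result;
}

// Placeholder inputs, invented for the example.
const plain = toTranscriptResult(['ok', null, { timestamps: [1, 2] }]);
const withSnapshot = toTranscriptResult(['ok'], 'abc');
console.log(JSON.stringify(plain), JSON.stringify(withSnapshot));
```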

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 27, 2023
warner added a commit that referenced this issue Apr 27, 2023
ivanlei added this to the Vaults EVP milestone Apr 27, 2023
@warner
Member

warner commented Apr 28, 2023

Both heap snapshot saves (with snapshotID hashes in the results) and computrons spent during delivery (also in the results) are now included in the transcript, thanks to #7484.

We also add load-snapshot transcript entries, whose arguments include the snapshotID. These entries are always included, immediately after the save-snapshot (but in the subsequent transcript span), even though the kernel is free to either continue using the existing worker, or to discard the worker and launch a new one. Thus the kernel's worker-reuse policy is not part of consensus.

All transcript entries (including results) are folded into the current-span hash, which causes an update to the swing-store export data, which makes them part of consensus.

Declaring victory on this one.

warner closed this as completed Apr 28, 2023