-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include extra replay data in vat transcripts #6770
Comments
@FUDCo this is the extra info I mentioned we should have recorded in the transcript. |
I'm working on this now, on top of the groundwork in PR #7428 |
* remove transcript.js, functionality merged into vat-warehouse * move transcript replay into vat-warehouse * remove vatSyscallHandler from manager-factory * it becomes an argument to manager.deliver() * created by vat-warehouse, not vat-loader * remove compareSyscalls from manager-factory * vat-warehouse embeds it in the syscall handler * remove workerCanBlock: always assumed * remove useTranscript from manager * all deliveries build a transcript entry * vat-warehouse only saves it if options.useTranscript is true * build full anachrophobia log message after delivery is complete * show full syscall list for the delivery * each annotated as ok/wrong/extra/missing * shorter transcript property names refs #6770
* remove transcript.js, functionality merged into vat-warehouse * move transcript replay into vat-warehouse * remove vatSyscallHandler from manager-factory * it becomes an argument to manager.deliver() * created by vat-warehouse, not vat-loader * remove compareSyscalls from manager-factory * vat-warehouse embeds it in the syscall handler * remove workerCanBlock: always assumed * remove useTranscript from manager * all deliveries build a transcript entry * vat-warehouse only saves it if options.useTranscript is true * build full anachrophobia log message after delivery is complete * show full syscall list for the delivery * each annotated as ok/wrong/extra/missing * shorter transcript property names refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (just not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent (manual/external) replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshotInitial/snapshotInterval . The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. The transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. The current span will never include a 'save-snapshot' or 'shutdown-worker': the span is closed immediately after those events are added, so replay will never see them. But a tool which replays a historical span will see them at the end. The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), except that save-snapshot includes the snapshot hash in its results. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
This introduces four new pseudo-delivery events to the transcript: * 'initialize-worker': a new empty worker is created * 'load-snapshot': a worker is loaded from heap snapshot * 'save-snapshot': we tell the worker to write a heap snapshot * 'shutdown-worker': we stop the worker (e.g. during upgrade) These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval. The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened. As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added. When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end. The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version). The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation). In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent. Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`. refs #7199 refs #6770
Both heap snapshot saves (with snapshotID hashes in the results) and computrons spent during delivery (also in the results) are now included in the transcript, thanks to #7484. We also add All transcript entries (including results) are folded into the current-span hash, which causes an update to the swing-store export data, which makes them part of consensus. Declaring victory on this one. |
What is the Problem Being Solved?
When replaying vats with the
replay-transcript.js
tool, some information is useful or necessary to ensure fidelity of the replay with the original:Description of the Design
This data should be included along other vat transcript data in the stream store.
I believe stream store additions are not currently part of the activityHash generation, but even if it were, both of these are effectively part of consensus operations already, so they could safely be included even then (we decided to move heap snapshots hashes under consensus for state-sync, see #3769).
Observation
While we've also seen reload from snapshot triggering bugs in XS and impacting further execution, these operations should not be included in the transcript data because:
Test Plan
Some unit test to verify data is included. No integration test exists for the replay tool.
The text was updated successfully, but these errors were encountered: