Stream snapshots between kernel and xsnap worker #6363

mhofman · 2022-09-30T16:59:02Z

What is the Problem Being Solved?

One of the causes of #5507 is snapshots being taken during a crank, which take time to complete, sometimes upwards of 2s depending on their size, as highlighted by #6164. While #6225 removed some inefficiencies by enabling the use of a tmpfs for the temporary snapshot file and by parallelizing the hashing with the compression, snapshotting still runs like a two-stroke engine: write the raw snapshot, then compress it.

One major opportunity for reducing the time it takes for snapshots to be taken is to hash and compress them as they are generated.

Even in a world where we move snapshot taking to the end of the block, or out of the time-critical path, this would be beneficial, as it lets the vat resume actual work more quickly.

Description of the Design

Open a new stdio pipe between swingset and xsnap.
Update xsnap and kernel to handle a snapshot length, so that the reader of the pipe knows when to stop because no more data is available.
Parallelize the compression/decompression with the snapshot taking/loading.
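
As a rough illustration of the three steps above, here is a minimal sketch in Python (not the kernel's JavaScript or xsnap's C). It reads exactly a known number of bytes from a pipe-like object, then hashes and compresses each chunk as it arrives. The function name `stream_snapshot`, the chunk size, the choice of SHA-256, and the idea that the length is already known to the reader are all assumptions made for the example; only the gzip level 3 comes from the numbers quoted in this issue.

```python
import hashlib
import io
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary read size chosen for this sketch


def stream_snapshot(source, length, sink, chunk_size=CHUNK_SIZE):
    """Read exactly `length` bytes from `source`, hashing and compressing
    each chunk as it arrives, and write the compressed bytes to `sink`.

    Returns the hex digest of the uncompressed bytes.
    """
    hasher = hashlib.sha256()  # assumption: the hash algorithm is not named here
    # wbits=31 selects the gzip container; level 3 mirrors `gzip -3`.
    compressor = zlib.compressobj(level=3, wbits=31)
    remaining = length
    while remaining > 0:
        # Never ask for more than the snapshot length, so bytes that
        # follow the snapshot on the same pipe are left unread.
        chunk = source.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"pipe closed with {remaining} bytes missing")
        hasher.update(chunk)
        sink.write(compressor.compress(chunk))
        remaining -= len(chunk)
    sink.write(compressor.flush())
    return hasher.hexdigest()


if __name__ == "__main__":
    fake_pipe = io.BytesIO(b"heap" * 100_000 + b"next message")
    compressed = io.BytesIO()
    digest = stream_snapshot(fake_pipe, 400_000, compressed)
    print(digest, len(compressed.getvalue()))
```

Because the reader stops at the declared length, the pipe needs no end-of-file marker, and the hash is ready as soon as the last chunk has been read.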

Performance considerations

According to some perf analysis by @warner, a Google Cloud instance should be able to sustain:

141 MBps for xsnap to generate a snapshot (written in a tmpfs file)
160 MBps for hashing a snapshot (stored in a tmpfs file)
172 MBps for writing into netfs 4k blocks at a time
64 MBps for compressing a snapshot from/to tmpfs using gzip -3

The compression is likely going to remain the bottleneck, so we may consider writing the snapshot uncompressed and having a background process compress the snapshots. However, streaming the snapshot is still beneficial, as we need to hash it before we can commit.
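
To make the trade-off concrete, here is a back-of-the-envelope model in Python using the throughputs quoted above. It is a simplification, not a measurement: it assumes stages that run concurrently are limited only by the slowest one, and it ignores pipe overhead and CPU contention. The 100 MB snapshot size and the function names are hypothetical.

```python
# Throughputs quoted in this issue, in MB/s.
GENERATE = 141  # xsnap generating a snapshot
HASH = 160      # hashing the snapshot
COMPRESS = 64   # gzip -3


def two_stroke_seconds(size_mb):
    """Write the raw snapshot first, then compress it while hashing in parallel."""
    return size_mb / GENERATE + size_mb / min(HASH, COMPRESS)


def streamed_seconds(size_mb, stages):
    """All stages overlap, so the slowest stage sets the pace."""
    return size_mb / min(stages)


if __name__ == "__main__":
    size = 100  # hypothetical snapshot size in MB
    print(f"two-stroke:            {two_stroke_seconds(size):.2f}s")
    print(f"streamed, compressed:  {streamed_seconds(size, [GENERATE, HASH, COMPRESS]):.2f}s")
    print(f"streamed, hash only:   {streamed_seconds(size, [GENERATE, HASH]):.2f}s")
```

Under these assumptions a 100 MB snapshot takes roughly 2.27s in two strokes, 1.56s when streamed through compression, and 0.71s when streamed with hashing only and compressed later in the background.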


warner · 2022-12-13T23:56:59Z

cc @FUDCo in case it interacts with the plan to store XS heap snapshots in sqlite

mhofman · 2022-12-14T00:21:00Z

FYI the implementation of this is 80% done in a local (outdated) branch.

mhofman added the enhancement New feature or request label Sep 30, 2022

mhofman self-assigned this Sep 30, 2022

Tartuffo added SwingSet package: SwingSet next-release labels Sep 30, 2022

warner mentioned this issue Oct 5, 2022

how to change XS on a deployed system #6361

Open

Tartuffo added migrate-in-progress and removed migrate-in-progress labels Nov 17, 2022

rowgraus added the vaults_triage DO NOT USE label Dec 20, 2022

ivanlei removed the vaults-release label Jan 4, 2023

ivanlei modified the milestones: Vaults RC0, Vaults Functional Testing, Vaults EVP Feb 1, 2023

mhofman mentioned this issue Feb 8, 2023

Force xsnap reload from snapshot after writing each snapshot #6943

Closed

warner mentioned this issue Apr 24, 2023

add streaming readSnapshot API to snapStore #7490

Closed

This was referenced Apr 27, 2023

feat(xsnap)!: refactor xsnap wrapper and snapStore to use streams for snapshots #7531

Merged

support streaming snapshot over fd agoric-labs/xsnap-pub#39

Merged

feat(xsnap): stream snapshots over process pipe #7541

Merged

mergify bot closed this as completed in #7541 Apr 30, 2023

mhofman mentioned this issue Sep 11, 2024

put xsnap worker in a seccomp jail #2386

Open
