Stream snapshots between kernel and xsnap worker #6363

Closed
mhofman opened this issue Sep 30, 2022 · 2 comments · Fixed by #7541
Labels
enhancement (New feature or request), SwingSet (package: SwingSet), vaults_triage (DO NOT USE)

Comments

@mhofman
Member

mhofman commented Sep 30, 2022

What is the Problem Being Solved?

One of the causes of #5507 is snapshots being taken during a crank, which can take a long time to complete, sometimes upwards of 2s depending on the size, as highlighted by #6164. While #6225 helped remove some inefficiencies by enabling the use of a tmpfs for the temporary snapshot file and by parallelizing the hashing with the compression, we still take snapshots like a two-stroke engine: first write the raw snapshot, then compress it.

One major opportunity to reduce the time it takes to take a snapshot is to hash and compress it as it is generated.

Even if we move snapshot taking to the end of the block, or otherwise out of the time-critical path, this would still be beneficial, as it lets the vat resume actual work more quickly.

Description of the Design

  • Open a new stdio pipe between SwingSet and xsnap.
  • Update xsnap and the kernel to exchange a snapshot length, so that the side reading from the pipe knows when to stop, i.e. when no more snapshot data is coming.
  • Parallelize the compression/decompression with the snapshot taking/loading.
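A minimal sketch of the reader side in TypeScript, assuming a hypothetical wire format (the issue does not specify one): xsnap writes the snapshot as a sequence of chunks, each prefixed with a 4-byte big-endian length, and a zero-length chunk marks the end. Chunked framing lets xsnap start streaming before it knows the total snapshot size. The names `encodeFrames` and `SnapshotFrameReader` are illustrative only, not part of SwingSet or xsnap, and SHA-256 is assumed as the snapshot hash. The reader hashes each piece as it arrives and forwards it to a sink; in a real pipeline that sink could feed a `zlib.createGzip()` stream so compression runs in parallel with snapshot generation.

```typescript
import { Buffer } from "node:buffer";
import { createHash } from "node:crypto";

// Hypothetical framing (not specified by the issue): each chunk is a 4-byte
// big-endian length followed by that many bytes; length 0 ends the snapshot.
const HEADER_BYTES = 4;

// Writer side, standing in for xsnap: split a snapshot into framed chunks.
function encodeFrames(snapshot: Buffer, chunkSize: number): Buffer[] {
  const frames: Buffer[] = [];
  for (let off = 0; off < snapshot.length; off += chunkSize) {
    const payload = snapshot.subarray(off, off + chunkSize);
    const header = Buffer.alloc(HEADER_BYTES);
    header.writeUInt32BE(payload.length, 0);
    frames.push(header, payload);
  }
  frames.push(Buffer.alloc(HEADER_BYTES)); // zero-length terminator
  return frames;
}

// Reader side, standing in for the kernel: accepts pipe data in arbitrary
// pieces, hashes snapshot bytes as they arrive and forwards them to a sink.
class SnapshotFrameReader {
  private pending: Buffer = Buffer.alloc(0); // bytes received but not yet parsed
  private remaining = 0; // payload bytes left in the current chunk
  private readonly hash = createHash("sha256");
  private readonly sink: (piece: Buffer) => void;
  done = false;
  size = 0;

  constructor(sink: (piece: Buffer) => void) {
    this.sink = sink;
  }

  // Returns any bytes that arrived after the terminator (not part of this snapshot).
  push(data: Buffer): Buffer {
    if (this.done) return data;
    this.pending = Buffer.concat([this.pending, data]);
    while (!this.done) {
      if (this.remaining > 0) {
        if (this.pending.length === 0) break; // wait for more payload
        const n = Math.min(this.remaining, this.pending.length);
        const piece = this.pending.subarray(0, n);
        this.hash.update(piece); // hash incrementally, no second pass
        this.sink(piece); // e.g. hand off to a compressor
        this.size += n;
        this.remaining -= n;
        this.pending = this.pending.subarray(n);
      } else {
        if (this.pending.length < HEADER_BYTES) break; // wait for a full header
        const len = this.pending.readUInt32BE(0);
        this.pending = this.pending.subarray(HEADER_BYTES);
        if (len === 0) this.done = true; // terminator: snapshot complete
        else this.remaining = len;
      }
    }
    return this.done ? this.pending : Buffer.alloc(0);
  }

  // Hex SHA-256 of the snapshot; only valid once the terminator was seen.
  digest(): string {
    if (!this.done) throw new Error("snapshot stream not finished");
    return this.hash.digest("hex");
  }
}

// Usage: feed a framed snapshot through the reader in 7-byte pieces, as a
// pipe might deliver it, followed by unrelated bytes on the same pipe.
const snapshot = Buffer.from("pretend this is an XS heap snapshot".repeat(100));
const received: Buffer[] = [];
const reader = new SnapshotFrameReader((piece) => received.push(piece));
const wire = Buffer.concat([...encodeFrames(snapshot, 1000), Buffer.from("next")]);
let leftover: Buffer = Buffer.alloc(0);
for (let off = 0; off < wire.length; off += 7) {
  leftover = Buffer.concat([leftover, reader.push(wire.subarray(off, off + 7))]);
}
console.log(reader.done, reader.size, leftover.toString());
```

Returning the bytes after the terminator only matters if the same pipe is reused for later snapshots or other messages (an assumption here); in that case anything past the end marker belongs to the next message, not to this snapshot.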

Performance considerations

According to some perf analysis by @warner, a Google Cloud instance should be able to sustain:

  • 141 MBps for xsnap to generate a snapshot (written to a tmpfs file)
  • 160 MBps for hashing a snapshot (stored in a tmpfs file)
  • 172 MBps for writing to netfs, 4k blocks at a time
  • 64 MBps for compressing a snapshot from/to tmpfs using gzip -3

Compression is likely to remain the bottleneck, so we may consider writing the snapshot uncompressed and having a background process compress it later. Even then, streaming the snapshot is still beneficial, since we need to hash it before we can commit.
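A back-of-envelope model of the figures above, sketched in TypeScript. It assumes each stage runs at its quoted rate and that streamed stages overlap perfectly without competing for CPU; in practice xsnap, hashing and gzip share cores, so real timings will be worse. The function names and the 100 MB snapshot size are hypothetical, chosen only for illustration.

```typescript
// Throughput figures quoted in the issue, in MB/s.
const GENERATE_MBPS = 141; // xsnap writing a snapshot to tmpfs
const HASH_MBPS = 160; // hashing a snapshot stored in tmpfs
const NETFS_MBPS = 172; // writing to netfs, 4k blocks at a time
const GZIP3_MBPS = 64; // gzip -3, tmpfs to tmpfs

// Current "two-stroke" approach: write the whole raw snapshot first, then
// compress it while hashing in parallel (the slower of the two dominates).
function twoStrokeSeconds(sizeMB: number): number {
  return sizeMB / GENERATE_MBPS + sizeMB / Math.min(GZIP3_MBPS, HASH_MBPS);
}

// Streamed approach: every stage processes data concurrently, so the
// pipeline runs at the speed of its slowest stage.
function streamedSeconds(sizeMB: number, stageMBps: number[]): number {
  return sizeMB / Math.min(...stageMBps);
}

const sizeMB = 100; // hypothetical snapshot size
console.log("two-stroke:", twoStrokeSeconds(sizeMB).toFixed(2), "s");
console.log(
  "streamed with gzip:",
  streamedSeconds(sizeMB, [GENERATE_MBPS, HASH_MBPS, GZIP3_MBPS]).toFixed(2),
  "s",
);
console.log(
  "streamed uncompressed to netfs:",
  streamedSeconds(sizeMB, [GENERATE_MBPS, HASH_MBPS, NETFS_MBPS]).toFixed(2),
  "s",
);
```

For a 100 MB snapshot this model gives about 2.27 s for the two-stroke approach, 1.56 s when streaming through gzip, and 0.71 s when streaming uncompressed with compression deferred. Under these assumptions, streaming helps in both cases, and deferring compression leaves snapshot generation itself as the limiting stage.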

@warner
Member

warner commented Dec 13, 2022

cc @FUDCo in case it interacts with the plan to store XS heap snapshots in sqlite

@mhofman
Member Author

mhofman commented Dec 14, 2022

FYI the implementation of this is 80% done in a local (outdated) branch.


5 participants