What is the Problem Being Solved?
One of the causes of #5507 is snapshots being taken during a crank, which can take time to complete, sometimes upwards of 2s depending on the snapshot size, as highlighted by #6164. While #6225 removed some inefficiencies by enabling the use of a tmpfs for the temporary snapshot file and by parallelizing the hashing with the compression, we are still doing snapshotting as a two-stroke engine: write the raw snapshot, then compress it.
One major opportunity for reducing the time it takes to produce a snapshot is to hash and compress it as it is being generated.
Even in a world where we move snapshot taking to end of block, or out of the time-critical path, this would be beneficial, as it would allow the vat to resume actual work more quickly.
Description of the Design
Open a new stdio pipe between swingset and xsnap
Update xsnap and the kernel to communicate the snapshot length, so that when reading from the pipe we know when to stop and that no more data is available.
Parallelize the compression/decompression with the snapshot taking/loading.
Performance considerations
According to some perf analysis by @warner, a Google Cloud instance should be able to sustain:
141 MBps for xsnap to generate a snapshot (written to a tmpfs file)
160 MBps for hashing a snapshot (stored in a tmpfs file)
172 MBps for writing to netfs, 4k blocks at a time
64 MBps for compressing a snapshot from/to tmpfs using gzip -3
The compression is likely to remain the bottleneck, so we may consider writing the snapshot uncompressed and having a background process compress snapshots later. However, streaming the snapshot is still beneficial, since we need to hash it before we can commit.
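To put rough numbers on the benefit, here is an illustrative back-of-the-envelope comparison using the rates above for a hypothetical 100 MB snapshot (the snapshot size is an assumption for illustration only):

```javascript
// Illustrative arithmetic only, using the throughput figures above.
const MB = 100;                 // hypothetical snapshot size in MB
const write = MB / 141;         // xsnap generates the snapshot
const hash = MB / 160;          // hashing the snapshot
const gzip = MB / 64;           // gzip -3 compression
// Today (post-#6225): write the snapshot first, then hash and compress
// it in parallel with each other.
const current = write + Math.max(hash, gzip);
// Streamed: all three stages overlap, so the slowest stage dominates.
const streamed = Math.max(write, hash, gzip);
console.log(current.toFixed(2), streamed.toFixed(2)); // → 2.27 1.56
```

Under these assumptions, streaming cuts the wall-clock cost to roughly the compression time alone, which is why compression remains the stage worth optimizing or deferring.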