
move snapstore (XS heap snapshots) into SQLite #6742

Closed
warner opened this issue Jan 3, 2023 · 3 comments
Labels: enhancement (New feature or request), SwingSet (package: SwingSet), vaults_triage (DO NOT USE)

warner (Member) commented Jan 3, 2023

What is the Problem Being Solved?

The next step of #3087 is to move snapStore into SQLite too: this is the component of swing-store that holds XS heap snapshots. These heap snapshots are files, 2-20MB when compressed, created by xsnap when it is instructed to write out the state of its heap. The xsnap process can be launched from a snapshot instead of an empty heap, which saves a lot of time (no need to replay the entire history of the vat).

Currently, swing-store holds these in a dedicated directory (one file per snapshot), in which each file is named after the SHA256 hash of its uncompressed contents (.agoric/data/ag-cosmos-chain-state/xs-snapshots/${HASH}.gz). The kvStore holds a JSON blob with { snapshotID, startPos } in the local.v$NN.lastSnapshot key, to keep track of the vatID->snapshot mapping. It also holds local.snapshot.$id = JSON(vatIDs..) to track the snapshot->vatIDs direction (remember that two vats might converge and use the same snapshot, e.g. newly-created ZCF vats running the same contract that have not diverged significantly yet).

The one-file-per-snapshot approach effectively creates a distinct database, whose commit semantics are based upon an atomic rename (creating the HASH.gz file) and some eventual unlink() syscall that deletes the file. These commit points are different from those of the kvStore which references the files, requiring some annoying interlocks to ensure that (1) we always add the file before adding the kvStore reference, and (2) we never delete the file before committing the removal of the last kvStore reference.

It would be a lot cleaner to record both the vat-to-snapshot mapping and the snapshots themselves in the same atomicity domain. Basically two tables:

CREATE TABLE heapSnapshots (
 id TEXT,
 compressed BLOB,
 PRIMARY KEY (id)
)

CREATE TABLE vatHeaps (
 vatID TEXT,
 snapshotID TEXT, -- maybe add a FOREIGN KEY constraint
 PRIMARY KEY (vatID)
)

During commit, or just after changing a vatHeaps entry, we can scan heapSnapshots for unreferenced heaps and delete them.
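
A minimal sketch of that pruning step, assuming a better-sqlite3 handle (which is what swing-store uses elsewhere) and the two tables above; the helper name and upsert/prune pairing are illustrative, not a final API:

import Database from 'better-sqlite3';

// Illustrative helper: point a vat at a (possibly new) snapshot, then drop
// any heapSnapshots row that no vatHeaps entry references. Both statements
// run in one SQLite transaction, so there is a single commit point.
function updateVatHeap(db: Database.Database, vatID: string, snapshotID: string) {
  db.transaction(() => {
    db.prepare(
      `INSERT INTO vatHeaps (vatID, snapshotID) VALUES (?, ?)
       ON CONFLICT(vatID) DO UPDATE SET snapshotID = excluded.snapshotID`,
    ).run(vatID, snapshotID);
    db.prepare(
      `DELETE FROM heapSnapshots
       WHERE id NOT IN (SELECT snapshotID FROM vatHeaps)`,
    ).run();
  })();
}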

This will interact with @mhofman's work to make xsnap read/write its heap by streaming it over a pipe, rather than writing it to a file. That also removes the need for xsnap to have access to the filesystem, which will help with the jail work in #2386.

Description of the Design

In addition to the new tables, the swingStore.snapStore component will need a somewhat different API: one pair of methods to read/write snapshots (doing some streaming thing, maybe an AsyncIterator of uncompressed chunks), and a separate pair to either assign a vatID->snapshotID mapping or clear it (e.g. when upgrading or terminating a vat).
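
A rough sketch of what that surface might look like (TypeScript; all names here are illustrative, not the final swing-store API):

// Hypothetical snapStore API shape; every name is illustrative.
interface SnapStore {
  // Write path: consume a stream of uncompressed chunks from xsnap,
  // returning the hash-derived snapshotID once the blob is committed.
  saveSnapshot(chunks: AsyncIterable<Uint8Array>): Promise<string>;
  // Read path: yield uncompressed chunks suitable for feeding back to xsnap.
  loadSnapshot(snapshotID: string): AsyncIterable<Uint8Array>;
  // Mapping: bind or clear a vat's current snapshot.
  setVatSnapshot(vatID: string, snapshotID: string): void;
  clearVatSnapshot(vatID: string): void; // e.g. on vat upgrade or termination
}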

The kvStore keys (local.v$NN.lastSnapshot and local.snapshot.$id) will go away, in favor of proper cross-table foreign keys. The startPos field from lastSnapshot needs to be tracked next to the snapshotID: the possibility of convergence means that two different vats might conceivably arrive at the same heap snapshot but on different deliveryNums. This should probably coordinate with the streamStore, so they're all using a matched deliveryNum or transcript entry index.

Efficiency Considerations

We've had some concerns about putting large blobs in SQLite. I (warner) am pretty sure this will be fine. I found one article (https://www.sqlite.org/intern-v-extern-blob.html) examining read-speed differences between external files and in-DB BLOBs; for the (default) 4kiB pages we use, it reports that external files can be read about twice as fast as in-DB blobs. I ran the tests on a follower node (SSD filesystem) and found the same difference. But note that we're talking about 544MB/s for in-DB blobs vs 965MB/s for files on disk, so a typical 2MB compressed snapshot is going to load in a millisecond or two, and the extra speed isn't going to matter.

$ ./kvtest init x1.db --count 1000 --size 10000000  # 1000 snapshots of 10MB each
$ ./kvtest export x1.db dir  # copy all blobs to files in dir/
$ ./kvtest run x1.db --count 1000 --blob-api
SQLite version: 3.40.1
--count 1000 --max-id 1000 --asc
--cache-size 1000 --jmode delete
--mmap 0 --blob-api
Database page size: 4096
Total elapsed time: 18.372
Microseconds per BLOB read: 18372.000
Content read rate: 544.3 MB/s
$ ./kvtest run dir --count 1000 --blob-api
--count 1000 --max-id 1000 --asc
Total elapsed time: 10.365
Microseconds per BLOB read: 10365.000
Content read rate: 964.8 MB/s

Using blobs from the DB will require slightly more memory, because the SQLite API doesn't provide streaming access to blob contents (each blob is delivered as a single large span of memory), whereas pulling files from disk could read just enough bytes to decompress the next chunk. So while we start a worker from a heap snapshot, the kernel process will briefly require 2-20MB of RAM to hold the compressed snapshot data; this will be freed once decompression is complete. Note that we don't need to hold a copy of the decompressed data: we can stream that out as fast as the xsnap process can accept it, and never need to hold more than a reasonably-sized buffer.
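
A sketch of that read path, assuming better-sqlite3 for the blob fetch and Node's zlib for streaming decompression (workerStdin stands in for however the xsnap pipe ends up being wired):

import { Readable } from 'stream';
import { pipeline } from 'stream/promises';
import { createGunzip } from 'zlib';
import Database from 'better-sqlite3';

// Sketch: the compressed blob is materialized in memory (SQLite hands it
// back as one buffer), but the decompressed bytes are streamed to the xsnap
// process, so we never hold the full uncompressed heap image.
async function streamSnapshotToWorker(
  db: Database.Database,
  snapshotID: string,
  workerStdin: NodeJS.WritableStream, // hypothetical xsnap snapshot-load pipe
) {
  const row = db
    .prepare(`SELECT compressed FROM heapSnapshots WHERE id = ?`)
    .get(snapshotID) as { compressed: Buffer } | undefined;
  if (!row) throw new Error(`no snapshot ${snapshotID}`);
  // 2-20MB of compressed data lives in RAM only for the duration of this call.
  await pipeline(Readable.from([row.compressed]), createGunzip(), workerStdin);
}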

Debugging Considerations

We might want a switch to disable the "delete unused snapshots" code, for archive nodes (@mhofman has found it awfully useful to be able to retain all heap snapshots for later forensics). To help correlate these with vats, maybe we should have a table of historical (vatID, lastPos) -> snapshotID mappings: each time we update the main table, we also add an entry to this debugging table. The debug-table entries must not keep snapshots alive, so either they aren't FOREIGN KEYs, or the table only exists when the "delete unused snapshots" code is disabled, so the constraint is never violated.
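
One possible shape for that debug table (the vatHeapHistory name is hypothetical; note the deliberate lack of a FOREIGN KEY so pruning heapSnapshots never violates a constraint):

db.exec(`
  CREATE TABLE IF NOT EXISTS vatHeapHistory (
    -- one row per (vatID, lastPos) -> snapshotID assignment ever made;
    -- forensic only, so rows here never keep a snapshot alive
    vatID TEXT,
    lastPos INTEGER,
    snapshotID TEXT, -- intentionally NOT a FOREIGN KEY
    PRIMARY KEY (vatID, lastPos)
  )
`);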

Security Considerations

Shouldn't be any.

Test Plan

Unit tests.

warner added the enhancement and SwingSet labels on Jan 3, 2023
mhofman (Member) commented Jan 3, 2023

Quick observation: the temporary buffering of the compressed snapshot would also need to happen when making a snapshot, since the same no-streaming-into-the-DB limitation applies, along with the inability to know the hash of the snapshot (for the primary ID) until the stream is complete. The latter could be solved by using a primary ID generated randomly or incrementally, but that would likely require making sure this primary ID is internal and not used in any consensus paths. However, all that is unnecessary if there is no way to stream blobs from the DB.

Btw we could imagine a chunking mechanism to avoid holding full compressed snapshots in memory, but that's effectively re-implementing streaming.

I am also unconvinced that we need to store identical snapshots in the same table entry. This feels like an unnecessary optimization, where the potential space savings are not worth the complexity costs.

warner (Member, Author) commented Jan 3, 2023

Good points. I'm not worried about the RAM on the snapshot-write side (or at least I'm equally non-worried about the write and read sides). So I think we read the stream from xsnap, feed each chunk into both the hasher and the compressor, accumulate the compressed data in RAM, and then, when the stream is done, write the large compressed blob into the DB under its hash name.
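
A sketch of that write path, hashing and compressing in one pass with Node's crypto and zlib (the function name and DB wiring are illustrative):

import { createHash } from 'crypto';
import { createGzip } from 'zlib';
import Database from 'better-sqlite3';

// Sketch: hash the uncompressed stream from xsnap while accumulating the
// compressed form in RAM, then store the blob under its hash-derived ID.
async function saveSnapshot(
  db: Database.Database,
  chunks: AsyncIterable<Uint8Array>, // uncompressed stream from xsnap
): Promise<string> {
  const hasher = createHash('sha256');
  const gzip = createGzip();
  const compressed: Buffer[] = [];
  gzip.on('data', (buf: Buffer) => compressed.push(buf));
  const done = new Promise<void>((resolve, reject) => {
    gzip.on('end', resolve);
    gzip.on('error', reject);
  });
  for await (const chunk of chunks) {
    hasher.update(chunk); // hash sees uncompressed bytes
    gzip.write(chunk); // backpressure ignored; we buffer the output anyway
  }
  gzip.end();
  await done;
  const snapshotID = hasher.digest('hex');
  db.prepare(`INSERT OR REPLACE INTO heapSnapshots (id, compressed) VALUES (?, ?)`)
    .run(snapshotID, Buffer.concat(compressed));
  return snapshotID;
}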

I agree that de-duplicating snapshots is not an important use case (and the practical chances of convergence are pretty low, especially if we update our "when do we take the first snapshot" code to make sure it includes all the deliveries we do during contract startup, which will probably make them diverge completely). I'm a big fan of hash-named files, but if we're saving them as blobs, then we might as well just use CREATE TABLE heapSnapshots (vatID TEXT, compressedSnapshot BLOB, startPos INTEGER), with maybe a separate debug table for historical values (that would cause a bit more churn during updates, since the history table and the real table wouldn't share data, but I doubt that's a big deal).

mhofman (Member) commented Jan 3, 2023

Was thinking we could simply select all rows for a particular vatID, sort by startPos, and only use the last one when loading from a snapshot. That way removing old rows is simply a matter of pruning, which can be host-defined.
Edit: to support updates, why not store the incarnation in a column and index on vatID+incarnation?
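
Loading would then be something like this sketch (single-table layout from the previous comment, better-sqlite3 db handle and vatID assumed in scope):

// Read the newest snapshot row for a vat; older rows become a host-policy
// pruning question rather than a correctness one.
const row = db
  .prepare(
    `SELECT compressedSnapshot, startPos FROM heapSnapshots
     WHERE vatID = ?
     ORDER BY startPos DESC
     LIMIT 1`,
  )
  .get(vatID);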

Also, we probably should still store the computed hash of the uncompressed data along with the blob of the compressed data, since we will need it for consensus, state sync, and debuggability.

For state sync, however, we did talk a few weeks ago about being able to mark a snapshot as "in use" by the host application while the state-sync artifacts are being generated. The goal is to avoid doing expensive operations when initiating a state-sync snapshot, and instead leave that to the asynchronous processing, which can span blocks if necessary. If we don't do reference counting on snapshot IDs and go with vatID+startPos instead, we may need the state-sync logic to constrain XS snapshot pruning. Or we could just go the route of creating a read transaction on this table for state-sync purposes.

ivanlei added the vaults_triage (DO NOT USE) label on Jan 3, 2023
FUDCo added commits that referenced this issue on Jan 6, Jan 12, Jan 14, and Jan 18, 2023:
…tore

This is phase 1 of #6742.  These changes cease storing snapshots in
files but instead keep them in a new table in the swingstore SQLite
database.  However, in this commit, snapshot tracking metadata is
still managed the old way using entries in the kvstore, rather than
being integrated directly into the snapshots table.
mergify bot closed this as completed in 4e0f679 on Jan 19, 2023