feat: relocate snapshot metadata from kvStore to snapStore #6781

FUDCo · 2023-01-12T00:34:35Z

Completes #6742

warner

Looks pretty good: some small API/name recommendations, and one query/index change to consider. Provisional r+ assuming those changes.

warner · 2023-01-14T20:15:00Z

packages/swing-store/src/snapStore.js

+ *   deleteVatSnapshots: (vatID: string) => void,
+ *   deleteUnusedSnapshots: () => void,
+ *   deleteSnapshotByHash: (hash: string, vatID: string) => void,
+ *   getSnapshotInfo: (vatID: string) => SnapshotInfo,


API name suggestions: to make it more clear whether a method touches all-vs-just-one vat, and/or all-vs-just-one snapshot, and to somewhat factor out the word "snapshot" (since the call sites will all be like snapstore.deleteXXX), I'm thinking:

(remove has, it doesn't make sense any more, at least not without a VatID)

deleteAllUnused (the only non-VatID-specific method)

(I'm not positive we need an API for this: callers could just wait for the first new save() to trigger deletion of all the old ones, and/or they could run a manual /usr/bin/sqlite3 command if they're impatient)

loadLatest

save

deleteAllForVat

deleteVatSnapshotByHash

getLatestInfo

maybe?

has is used in tests so it needs to stay in, though a case can be made that it should acquire a vatID parameter.

In a similar vein, deleteUnusedSnapshots and deleteSnapshotByHash are currently only used by tests, though one can imagine future maintenance or debugging tools that might use them. I'm very disinclined to fall back on using the sqlite command line client to do things like clear out unused snapshots, because that implies a level of understanding of the schema and a level of care using the sqlite tool that I wouldn't want to count on operators (or even myself after stepping away from this code for a few weeks) to necessarily have.

I'm a bit wishy-washy about getting rid of the word "snapshot" in the method names, given folks' propensity to use destructuring on API objects, but your comment does point out a naming inconsistency in that some method names have it and some don't -- if we want to have the word "snapshot" in the names, then for example save should probably be saveSnapshot. And putting "snapshot" in all of them definitely seems silly. So, yeah, we can clean that up. However, all of the non-maintenance, non-test operations (i.e., the ones that are used normally and ubiquitously) are per-vat and operate on the latest snapshot, so by the same reasoning I don't think putting "vat" or "latest" in the names helps.

It also strikes me that it's probably good to be consistent about parameter order -- the large majority of methods that take a vatID parameter should have it in the same (first) position, which is not currently universal.

So I think the main API ends up as:

load(vatID, loadRaw) (unchanged)

save(vatID, endPos, saveRaw) (unchanged)

getInfo(vatID) (was getSnapshotInfo)

deleteAll(vatID) (was deleteVatSnapshots)

with testing/debug/maintenance methods:

hasHash(vatID, hash) (was has(hash))

deleteByHash(vatID, hash) (was deleteSnapshotByHash(hash, vatID))

deleteAllUnused() (was deleteUnusedSnapshots)

How does that all work for you?

After making the above changes to see what it looked/felt like in the code, I'm reconsidering the names. Part of the problem is that the API object is called "snapStore" and not "snapshotStore". Perhaps we might want to change that, but I'm leaning towards mostly keeping the old/existing names. I've found that naked getInfo et al looks strange in the actual use context. Now I'm leaning towards:

loadSnapshot(vatID, loadRaw) (was load)

saveSnapshot(vatID, endPos, saveRaw) (was save)

getSnapshotInfo(vatID) (now unchanged)

deleteSnapshots(vatID) (now unchanged

and then:

hasHash(vatID, hash) (was hasHash(hash) -- keep the added vatID parameter from the above proposal)

deleteSnapshotByHash(vatID, has) (was same name but make the parameter order consistent)

deleteUnusedSnapshots (now unchanged)

Now what do you think?

looks good, although deleteUnusedSnapshots could be deleteAllUnusedSnapshots to highlight the fact that it doesn't take a vatID argument

warner · 2023-01-14T20:16:29Z

packages/swing-store/src/snapStore.js

@@ -98,16 +110,14 @@ export function makeSnapStore(
 ) {
  db.exec(`
    CREATE TABLE IF NOT EXISTS snapshots (
+      vatID TEXT,
+      endPos INTEGER,
+      inUse INTEGER,


nit: used would be easier to type when running a manual /usr/bin/sqlite3 query.. if it's easy to avoid camelCase by using just one word instead of two (and if the single word is just as readable as the pair), I usually try to do so

Also, how hard would it be to add an uncompressedSize column? I can see us wanting some telemetry around that value. The compressed size should be retrievable from SQL directly with length(compressedSnapshot), I think (haven't tested it yet), but that wouldn't help with the uncompressed size.

I added uncompressedSize. I'm wondering if I should also add compressedSize to the table while I'm at it, since it would allow this to be queried without requiring fetching the whole (possibly very large) snapshot blob.

I don't like used because it has a past tense meaning ("was used"), whereas what we really want to indicate "is in use right now".

warner · 2023-01-14T22:48:27Z

packages/swing-store/src/snapStore.js

+  const sqlStopUsingLastSnapshot = db.prepare(`
+    UPDATE snapshots
+    SET inUse = 0
+    WHERE vatID = ?


I know we're deferring indexes and general perf questions until later, but this jumps out at me as doing an O(N) operation when really there will only ever be a single row that needs updating.

If the user has configured their node to automatically delete all snapshots, the query will be O(1) (simply because there won't be any other rows to examine), but if they're archiving for whatever reason, then this query will be more expensive than it really needs to be.

Some other options would be:

add an index on (vatID, inUse), make the statement WHERE vatID = ? AND inUse = 0, to clear the flag on only the rows where the flag was already set

or, make the index on (inUse, vatID), which would accomplish the same thing, but also make a big delete-all-unused-snapshots operation cheaper (because the inUse column appears on the MSB end of the index, and the operation would be DELETE FROM snapshots WHERE inUse = 0)

do a sqlGetSnapshotInfo -like read query first, to learn the endPos of the latest row, and then use WHERE vatID = ? AND endPos = ? to clear the flag on only the most recent snapshot

I think I'm more in favor of the inUse, vatID index.

Adding an inUse qualifier to the WHERE clause on the query should take care of this, i.e. change the SQL to UPDATE snapshots SET inUse = 0 WHERE inUse = 1 and vatID = ?

Also, DELETE FROM snapshots WHERE inUse = 0 is already how the delete-all-unused operation is implemented.

In general, I think we should hold off adding explicit indexes to the database schema until data tells us we need them.

packages/swing-store/test/test-snapstore.js

packages/SwingSet/test/test-state.js

packages/swing-store/src/snapStore.js

packages/SwingSet/src/kernel/state/vatKeeper.js

warner · 2023-01-14T23:32:15Z

packages/SwingSet/src/kernel/vat-loader/manager-subprocess-xsnap.js

+    const snapshotInfo = vatKeeper.getSnapshotInfo();
+    if (snapshotInfo) {
+      const { hash, endPos } = snapshotInfo;
+      kernelSlog.write({ type: 'heap-snapshot-load', vatID, hash, endPos });


I think this slog entry uses hash, while the heap-snapshot-save one uses snapshotID.. we need to make them consistent. I think I prefer using snapshotID rather than hash. You might consider making the contents of snapshotInfo use snapshotID instead of hash (and avoiding variables named hash) to make that consistency easier.

I quite prefer hash, since really there's no longer any such thing as a "snapshot ID". The hash is not being used to identify the snapshot but is instead being used more like a checksum to validate that it consists of the bits that it's supposed to consist of. However, being consistent seems right; I think the proper thing is to excise snapshotID from the few remaining places it appears. Having 'hash' in the slog entry says "I saved/loaded a snapshot and its hash was blah" - i.e., the hash is now just more descriptive metadata.

Closes #6742

FUDCo added enhancement New feature or request SwingSet package: SwingSet labels Jan 12, 2023

FUDCo requested a review from warner January 12, 2023 00:34

FUDCo self-assigned this Jan 12, 2023

FUDCo force-pushed the 6742-snapstore branch from 41afbb0 to 08386ba Compare January 14, 2023 00:45

FUDCo force-pushed the 6742-snapshot-with-metadata branch 2 times, most recently from 21617b8 to e88d84a Compare January 14, 2023 02:44

warner mentioned this pull request Jan 14, 2023

fix(swing-store): move getAllState/setAllState onto a debug object #6722

Closed

warner approved these changes Jan 14, 2023

View reviewed changes

FUDCo force-pushed the 6742-snapstore branch 2 times, most recently from 9ce8a36 to f06a171 Compare January 18, 2023 01:15

FUDCo force-pushed the 6742-snapshot-with-metadata branch from e88d84a to b843da9 Compare January 18, 2023 01:23

feat: relocate snapshot metadata from kvStore to snapStore

0da5f4b

Closes #6742

FUDCo force-pushed the 6742-snapshot-with-metadata branch from b843da9 to 0da5f4b Compare January 18, 2023 22:24

FUDCo merged commit 98acf3b into 6742-snapstore Jan 18, 2023

FUDCo deleted the 6742-snapshot-with-metadata branch January 18, 2023 22:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: relocate snapshot metadata from kvStore to snapStore #6781

feat: relocate snapshot metadata from kvStore to snapStore #6781

FUDCo commented Jan 12, 2023

warner left a comment

warner Jan 14, 2023

FUDCo Jan 18, 2023

FUDCo Jan 18, 2023

warner Jan 18, 2023

warner Jan 14, 2023

warner Jan 14, 2023

FUDCo Jan 18, 2023

warner Jan 14, 2023

FUDCo Jan 18, 2023

FUDCo Jan 18, 2023

warner Jan 14, 2023

FUDCo Jan 18, 2023

feat: relocate snapshot metadata from kvStore to snapStore #6781

feat: relocate snapshot metadata from kvStore to snapStore #6781

Conversation

FUDCo commented Jan 12, 2023

warner left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment