explore sqlite as kernelDB #3087
https://www.npmjs.com/package/better-sqlite3 provides synchronous bindings.
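For illustration, a minimal sketch of what synchronous access looks like under better-sqlite3 (the table and file names here are invented, not taken from swing-store):

```js
// Minimal sketch, assuming better-sqlite3; table and file names are invented.
const Database = require('better-sqlite3');

const db = new Database('swingstore.sqlite');
db.exec('CREATE TABLE IF NOT EXISTS kvStore (key TEXT PRIMARY KEY, value TEXT)');

const setKV = db.prepare('INSERT OR REPLACE INTO kvStore (key, value) VALUES (?, ?)');
const getKV = db.prepare('SELECT value FROM kvStore WHERE key = ?');

setKV.run('vat.names', '[]');       // no callbacks, no promises
const row = getKV.get('vat.names'); // { value: '[]' }, returned synchronously
```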
@warner @FUDCo is the "already in use" aspect of a streamStore a required API feature or just a limitation? sqlite3 provides isolation, so concurrent queries are not a problem. As of 8def34b, […]
After today's dive (#6056): the "already in use" checks are not a requirement; they were a holdover from the old file-based streamStore. We'll still access streams with the same modality (read a suffix during transcript replay, then switch into append-only mode as we make new deliveries, then switch back to reading a suffix if/when a worker is evicted and then reloaded), but we can get rid of the awkward "already in use" bookkeeping.

As recorded in #6056 (comment), we'll need to rewrite the CrankBuffer in terms of a SQLite "savepoint", so it includes all the non-kvStore changes.
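A hypothetical sketch of what a savepoint-based crank buffer could look like (function and method names are invented; `db` is assumed to be a better-sqlite3 handle):

```js
// Hypothetical sketch of a savepoint-based CrankBuffer replacement
// (names invented; not the actual CrankBuffer implementation).
function makeCrankControl(db) {
  return {
    startCrank() {
      db.exec('SAVEPOINT crank'); // covers every table, not just kvStore
    },
    commitCrank() {
      db.exec('RELEASE SAVEPOINT crank'); // keep this crank's changes
    },
    abortCrank() {
      db.exec('ROLLBACK TO SAVEPOINT crank'); // undo this crank's changes...
      db.exec('RELEASE SAVEPOINT crank'); // ...then discard the savepoint itself
    },
  };
}
```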
Oh, and I'm thinking that we continue to expose only […].

And what would be really fun would be if there were some way to safely give each vat worker direct access to its own vatstore: that would remove all of the cross-process overhead of accessing virtualized/durablized data. The kernel never really needs to read from it, and no other vats ever touch it, so the worker has exclusive access. The tricky part is how to unwind any changes the worker makes if the kernel decides to unwind the crank. One expensive option would be to have the kernel hold a read transaction open on the DB, allow the worker to do its thing, and then, if it unwinds, use that pre-modification txn to read out the full contents of the earlier state and write it all into a brand-new database. (That'd be free for the usual commit case, but really expensive for the unwind case.) Oh, but also, we must ensure that a crash before the block commit point does not allow the individual crank vatstore changes to survive.

SQLite has a nifty feature called Attached Databases that allows multiple files to be updated in an atomic fashion (I have no idea how they pull that off). If that could conceivably allow the kernel to open a transaction that covers the per-vat DBs, it might address both the unwind and the crash-before-commit concerns.
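For reference, a sketch of the Attached Databases idea under better-sqlite3 (the file, schema, and table names are all hypothetical):

```js
// Sketch of SQLite's Attached Databases feature via better-sqlite3
// (file, schema, and table names are hypothetical).
const Database = require('better-sqlite3');
const db = new Database('kernel.sqlite');
db.exec("ATTACH DATABASE 'vat-v7.sqlite' AS vat7");

// One transaction can now touch tables in both files; per the SQLite docs,
// the multi-file commit is atomic as long as WAL journal mode is not in use.
const deliver = db.transaction(() => {
  db.prepare('UPDATE main.kvStore SET value = ? WHERE key = ?').run('13', 'crankNumber');
  db.prepare('INSERT OR REPLACE INTO vat7.vatstore (key, value) VALUES (?, ?)').run('vc.1.size', '4');
});
deliver(); // both files commit together, or neither does
```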
More thoughts: […]

Oh, and it might be a good idea to make the kvStore use type BLOB instead of type TEXT, so the values 1: can be arbitrary bytes, and 2: there's no DB-side uncertainty about encoding.
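A sketch of what a BLOB-valued table would look like (the schema is illustrative only, not the actual swing-store schema):

```js
// Illustrative only: a BLOB-valued kvStore table under better-sqlite3.
const Database = require('better-sqlite3');
const db = new Database('swingstore.sqlite');
db.exec('CREATE TABLE IF NOT EXISTS kvStore (key TEXT PRIMARY KEY, value BLOB)');

db.prepare('INSERT OR REPLACE INTO kvStore (key, value) VALUES (?, ?)')
  .run('snapshot.v1', Buffer.from([0xde, 0xad, 0xbe, 0xef])); // arbitrary bytes are fine
const { value } = db.prepare('SELECT value FROM kvStore WHERE key = ?').get('snapshot.v1');
// `value` comes back as a Node.js Buffer: no DB-side encoding ambiguity
```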
@FUDCo and I talked through this a bit today. We're exploring the idea of doing this in a couple of steps. For the first, #3087 would be complete when we merge the various swing-store DBs (kvStore, streamStore, snapStore) into a unified SQLite DB, but we'd still have only one such DB, not one per vat. The per-vat […]

Then, under a separate ticket, the second step would be to break out the vatstore into a separate table, which would include a vatID column.

Later, we'd look into enhancing the vatstore API to support refcounts more directly, which would help the kernel (or the vat-worker in #6447) be more involved in efficient cleanup of old data after a vat upgrade. We're thinking one table for all virtual/durable objects (with columns for virtual-vs-durable status, kindID, serialized capdata body, serialized capdata slots, refcounts from ephemeral/RAM objects, refcounts from virtual objects, and refcounts from durable objects). That way, after an upgrade, we could first delete all the merely-virtual rows in a single SQL statement (see the sketch after this comment).

Later still, #6254 would move the kernel-side shared vatstore tables (with their vatID columns) out into per-vat DBs that each worker can access directly.

The issues I can think of that need to be solved to make the first step are: […]
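To make the middle step concrete, here is a hypothetical rendering of that virtual/durable-object table; the kinds of columns come from the comment above, but every name and type is invented:

```js
// Hypothetical rendering of the object table sketched above; column names
// and types are invented, only the column *kinds* come from the comment.
const Database = require('better-sqlite3');
const db = new Database('swingstore.sqlite');
db.exec(`
  CREATE TABLE IF NOT EXISTS vatObjects (
    vatID    TEXT NOT NULL,
    baseRef  TEXT NOT NULL,
    durable  INTEGER NOT NULL,            -- virtual-vs-durable status
    kindID   TEXT NOT NULL,
    body     TEXT,                        -- serialized capdata body
    slots    TEXT,                        -- serialized capdata slots
    ramRefs  INTEGER NOT NULL DEFAULT 0,  -- refcounts from ephemeral/RAM objects
    virtRefs INTEGER NOT NULL DEFAULT 0,  -- refcounts from virtual objects
    durRefs  INTEGER NOT NULL DEFAULT 0,  -- refcounts from durable objects
    PRIMARY KEY (vatID, baseRef)
  )
`);
// After an upgrade, the merely-virtual data disappears in one statement:
db.prepare('DELETE FROM vatObjects WHERE vatID = ? AND durable = 0').run('v7');
```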
In the course of its evolution, the vatstore has morphed from a general-purpose store available to vat code for whatever purposes it chooses into a specialized store accessible only to liveslots. However, the vatstore syscall API still presents it as a general-purpose string-to-string key-value store, in line with the original concept for its use. In practice, aside from a small number of simple bookkeeping keys, liveslots' use of the vatstore is for the implementation of more complex persistent abstractions such as virtual objects, collections, and watched promises, for which a simple key-value store is not ideal. Liveslots does various key-string encoding tricks to shoehorn the storage for these abstractions and their associated metadata into the key-value patterns that the current syscall API supports.

If, as anticipated in the comment above, we wish to exploit the more sophisticated storage affordances of SQLite or another relational store in order to more directly implement higher-level persistence abstractions, then we must either (a) relocate the vatstore into a database accessible directly by the worker process, as anticipated in #6254, in which case we can provide a local, liveslots-specific store API whose implementation makes use of the native SQLite (or whatever) database API, or (b) augment the syscall vatstore interface with additional calls that reflect the liveslots-specific API affordances that would be added in (a).

My take is that course (b) is undesirable for several reasons: […]
This suggests that the sequence of steps is: […]
Note that relocating vat-specific storage into the worker processes' own databases should be possible without committing to any of the strategies for concurrent vat execution that we've been contemplating, though it is likely to be a prerequisite for implementing any of them.
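To make course (a) concrete, a hypothetical sketch of a liveslots-local collection store implemented directly against SQLite rather than via key-encoding over the syscall interface (all names are invented; this is not a proposed API):

```js
// Hypothetical illustration of course (a): a liveslots-local collection
// store implemented directly against SQLite, instead of encoding entries
// into flat vatstore key strings. All names are invented.
const Database = require('better-sqlite3');

function makeCollectionStore(dbPath) {
  const db = new Database(dbPath);
  db.exec(`CREATE TABLE IF NOT EXISTS collectionEntries (
    collectionID INTEGER NOT NULL,
    dbKey        TEXT NOT NULL,
    value        TEXT NOT NULL,
    PRIMARY KEY (collectionID, dbKey)
  )`);
  const put = db.prepare('INSERT OR REPLACE INTO collectionEntries VALUES (?, ?, ?)');
  const range = db.prepare(`SELECT dbKey, value FROM collectionEntries
    WHERE collectionID = ? AND dbKey >= ? AND dbKey < ? ORDER BY dbKey`);
  return {
    set: (cid, key, value) => put.run(cid, key, value),
    // range queries fall out of the primary-key index; no key-prefix scans
    entries: (cid, start, end) => range.all(cid, start, end),
  };
}
```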
What is the Problem Being Solved?
The swingset kernel currently encodes all of its state in a simple key-value format, with a "schema" (i.e. what we use each key for) informally defined in a large comment in `kernelKeeper.js`. The kernel requires synchronous access to the DB.

We use LMDB to hold this "kernelDB" by implementing and using the `swing-store-lmdb` package. We selected LMDB because we found mature Node.js bindings which provide synchronous access. In contrast, the much richer "levelup/leveldown" ecosystem has bindings which are either async, or immature/unsupported.

@dckc and I are both fans of SQLite, and think it could be a good idea to move the kernelDB to it:

- `INDEX`es, which could help a lot with the virtual collections (range queries, sort options, indices; #2004) that @erights is working on
- a single database file (instead of the separate `lock.mdb` that LMDB uses)
- a smaller on-disk footprint (LMDB's preallocated `data.mdb` is overhead)

However we haven't yet looked for synchronous SQLite Node.js bindings, so we don't know if it's really an option or not.
Description of the Design
If we go this direction, we'd start with a `swing-store-sqlite` that has the same API as `swing-store-lmdb`, and only offers the same key-value features. Then, if it goes well, we'd abandon `swing-store-lmdb` and start adding new features that depend upon SQLite, such as whatever indexing support #2004 needs.
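A sketch of that first step, assuming the swing-store key-value surface is roughly get/set/has/delete/getKeys (the method names are an assumption, not a verified copy of the `swing-store-lmdb` API):

```js
// Sketch of a swing-store-sqlite kvStore facade; the get/set/has/delete/
// getKeys surface is an assumption, not verified against swing-store-lmdb.
const Database = require('better-sqlite3');

function openSwingStore(path) {
  const db = new Database(path);
  db.exec('CREATE TABLE IF NOT EXISTS kvStore (key TEXT PRIMARY KEY, value TEXT)');
  const sqlGet = db.prepare('SELECT value FROM kvStore WHERE key = ?');
  const sqlSet = db.prepare('INSERT OR REPLACE INTO kvStore (key, value) VALUES (?, ?)');
  const sqlDel = db.prepare('DELETE FROM kvStore WHERE key = ?');
  const sqlKeys = db.prepare('SELECT key FROM kvStore WHERE key >= ? AND key < ? ORDER BY key');
  const kvStore = {
    get: key => sqlGet.get(key)?.value,
    set: (key, value) => { sqlSet.run(key, value); },
    has: key => sqlGet.get(key) !== undefined,
    delete: key => { sqlDel.run(key); },
    *getKeys(start, end) { // half-open range [start, end)
      for (const { key } of sqlKeys.iterate(start, end)) yield key;
    },
  };
  return { kvStore, close: () => db.close() };
}
```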
Security Considerations
Both SQLite and LMDB are pretty battle-tested libraries, at least in their core C/C++ implementations. The maturity of the Node.js bindings will make a big difference, though.