implement secondary-storage syscalls #1831

warner · 2020-10-01T01:31:48Z

What is the Problem Being Solved?

#455 ("hierarchical object identifiers") is about moving voluminous state from RAM onto disk. The bottommost layer will be a new pair of syscalls to allow vats to perform synchronous reads/writes to a per-vat key-value store. This will be used by liveslots to implement the Container API (#1832).

Each vat will have exclusive access to a key-value store whose keys are strings (maybe integers) and whose values are initially strings (but will probably eventually be capdata).

The task here is to implement those two syscalls.

Description of the Design

The names are up for discussion, but I'll use syscall.readX and syscall.writeX for now. We need a name for this particular storage pool to use for the "X": we want to distinguish at least the following pools:

- this one: mutable, exclusive to a single vat, indexed by short string or integer (which does not go through the c-list), contains (probably) capdata. Syscalls perform add/write/update and read. It may eventually store more structured data and add a list syscall with range queries or sort options, to support e.g. finding the best matching offer among many.
- the blobstore: append-only, accessed by c-list-managed blobcaps, values are immutable large strings/bytes, shared among all vats, used as an alternative communication path for large data like vat/contract bundles, may have partial-range read functions. Operations include add and read, and maybe decref. Operations probably want to bypass the transcript (because they're large) and use vatPowers instead of syscalls.
- others?

For now, a plain value = syscall.read(key) and syscall.write(key, value) may be good enough.

I think the vat's data should be stored in the HostDB with keys like v$NN.data.$KEY.body. This reserves room for .slots to hold the capdata later.
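
A minimal sketch of the two syscalls and the proposed key layout, assuming an in-memory Map stands in for the HostDB. The names makeVatStore and hostDB and the vat ID "v7" are hypothetical, chosen for illustration only; this is not the kernel's actual API.

```javascript
// Hypothetical sketch: a Map stands in for the HostDB.
function makeVatStore(hostDB, vatID) {
  // Build the HostDB key for a vat-supplied key, e.g. "v7.data.balance.body".
  const dbKey = key => `${vatID}.data.${key}.body`;
  return {
    // Synchronous read: returns the stored string, or undefined if absent.
    read(key) {
      return hostDB.get(dbKey(key));
    },
    // Synchronous write: values are plain strings for now.
    write(key, value) {
      if (typeof value !== 'string') {
        throw new TypeError('value must be a string');
      }
      hostDB.set(dbKey(key), value);
    },
  };
}

const hostDB = new Map();
const syscall = makeVatStore(hostDB, 'v7');
syscall.write('balance', '100');
const value = syscall.read('balance'); // '100'
```

Because every key is prefixed with the vat ID, a store made for a different vat cannot see this entry. The .body suffix leaves a sibling .slots key free for capdata later.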

Alternative Designs

secondary-storage device

If we first refined the device model (#55) to have distinct read and write calls, we might implement this ticket in terms of that model. However:

- we'd still need to write the actual device
- the device must record this state in the HostDB (with transaction/atomicity boundaries that match the rest of the kernel), which could be an awkward set of endowments to grant to the device
- the device should be made available to many vats (I'd argue all of them), and users shouldn't have to update their bootstrap.js to distribute it to the vats which need it
- expanding the functionality to include list, range queries, sort options, etc, might be accomplished by adding additional arguments to read (and changing the return value), or it might be better achieved by adding new syscalls
- I'm pretty sure we need reads to observe preceding writes. #55 ("new device model: read/writeLater") suggests "writeLater", where writes aren't visible to read calls until after some outside-the-kernel "block" boundary (on the other hand, I think #55 should really be read/write too)
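
The difference between reads that observe preceding writes and the "writeLater" model can be sketched as follows. This is an illustration only: makeImmediateStore, makeWriteLaterStore, and commitBlock are hypothetical names, and commitBlock merely stands in for the outside-the-kernel "block" boundary.

```javascript
// Hypothetical sketch contrasting the two write models.
function makeImmediateStore() {
  const data = new Map();
  return {
    read: key => data.get(key),
    // The write is visible to the very next read.
    write: (key, value) => {
      data.set(key, value);
    },
  };
}

function makeWriteLaterStore() {
  const committed = new Map();
  const pending = new Map();
  return {
    // Reads only ever see committed state.
    read: key => committed.get(key),
    // Writes are buffered until the block boundary.
    writeLater: (key, value) => {
      pending.set(key, value);
    },
    commitBlock() {
      for (const [key, value] of pending) {
        committed.set(key, value);
      }
      pending.clear();
    },
  };
}

const immediate = makeImmediateStore();
immediate.write('k', '1');
const seenImmediately = immediate.read('k'); // '1'

const later = makeWriteLaterStore();
later.writeLater('k', '1');
const beforeCommit = later.read('k'); // undefined: not yet visible
later.commitBlock();
const afterCommit = later.read('k'); // '1'
```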

I'm inclined to think it would be simpler to add new syscalls than to build this on a device, but I'm interested in other opinions. I believe @dtribble has suggested that a device is the obvious choice for this sort of functionality.

vatPowers instead of syscall

Using syscalls to fetch this voluminous data means all of it will be recorded in the transcript. #455 is about removing data from RAM, and doesn't worry about how much is left on disk (or how much we're adding to disk to minimize our RAM footprint). If we wanted to minimize disk usage too, we might have the secondary storage API go through vatPowers instead.

However, since vatPowers functions do not go through the transcript, we have no mechanism to make sure the vat sees the right values as the transcript is replayed (that's the whole point of the transcript). This could work for blobcaps, since their data is immutable, but this secondary storage is specifically for mutable data like Purse balances.

I think we're stuck with going through syscalls.
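
The replay argument can be illustrated with a toy transcript. This is a sketch under stated assumptions, not the kernel's transcript format: makeRecordingSyscall, makeReplaySyscall, and the entry shape are all hypothetical.

```javascript
// Hypothetical sketch: record syscall results, then replay them.
function makeRecordingSyscall(store, transcript) {
  return {
    read(key) {
      const result = store.get(key);
      transcript.push({ type: 'read', key, result });
      return result;
    },
    write(key, value) {
      store.set(key, value);
      transcript.push({ type: 'write', key, value });
    },
  };
}

function makeReplaySyscall(transcript) {
  let position = 0;
  // Consume the next entry, checking that the vat made the same call.
  const next = (type, key) => {
    const entry = transcript[position];
    position += 1;
    if (!entry || entry.type !== type || entry.key !== key) {
      throw new Error(`replay diverged at entry ${position - 1}`);
    }
    return entry;
  };
  return {
    // Reads return the recorded result, not the current store contents.
    read: key => next('read', key).result,
    // Writes are checked against the transcript but not re-applied.
    write: (key, value) => {
      next('write', key);
    },
  };
}

// A toy "vat" that deposits into a purse balance.
function deposit(syscall, amount) {
  const balance = Number(syscall.read('balance') || '0');
  syscall.write('balance', String(balance + amount));
  return balance + amount;
}

const store = new Map([['balance', '100']]);
const transcript = [];
const first = deposit(makeRecordingSyscall(store, transcript), 5); // 105
// The store now holds '105', but replay still sees the original '100'.
const replayed = deposit(makeReplaySyscall(transcript), 5); // 105
```

A read that bypassed the transcript would see the already-updated '105' during replay and compute 110, which is the divergence described above.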

future: capdata

Our hunch is that the second use case (Zoe contracts tracking large numbers of offers/seats) will require the ability to store more than flat data in this table. We expect to store object references too (o-NN/etc), as well as the Representatives described in #455 (which means hierarchical o+NN.NN ids).

The syscalls would need to accept capdata ({ body, slots }, where body is a JSON-format string, and slots is an array of vat-side reference ids). The kernel would need to translate the slots through the vat's c-list just like it does with a syscall.send() call (for write), or dispatch.deliver() (for read). The liveslots-provided Container API's c.get() function would need to use the same marshal.unserialize as the rest of liveslots, and c.update would need to use marshal.serialize.

This marshal instance must know about the Container tables, so any Representatives are serialized properly. This will enable cross-table references (which would be "FOREIGN KEYs" in a SQL system).

I'm not sure whether this ought to include Promises, specifically resolved ones. When these appear in a syscall.send, the vat sends a new (unresolved) promise ID, then schedules a syscall.resolve* call to happen later during the same crank. The kernel does the same thing on the inbound side when it includes a resolved promise in the arguments of a dispatch.deliver.

The problem here is that the secondary storage API is synchronous, so any promise that it retrieves from disk won't be resolved until later. Should the kernel watch for reads that mention promises in that state and inject resolution notifications for them to the back of the run-queue? Should the read API be enhanced to return capdata plus a list of resolutions for all the promises that happen to be in that state?

I think we need to learn more about the use-case for including Promises in these offline tables before we tackle the question of how to safely implement them. I'd recommend that the first pass use translators which reject promises, and only accept object IDs in the .slots list.
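
A first-pass translator along these lines might look like the sketch below. It assumes vat-side object references look like o-NN, o+NN, or the hierarchical o+NN.NN, and that the c-list can be modelled as a Map from vat-side to kernel-side references. The function name translateCapData and the kernel reference strings "ko21" and "ko35" are hypothetical.

```javascript
// Hypothetical sketch: translate capdata slots through a c-list,
// rejecting anything that is not an object reference.
function translateCapData(capdata, clist) {
  const slots = capdata.slots.map(vref => {
    // Accept o-NN, o+NN, and hierarchical o+NN.NN; reject promises etc.
    if (!/^o[+-]\d+(\.\d+)*$/.test(vref)) {
      throw new Error(`only object refs may be stored, got ${vref}`);
    }
    const kref = clist.get(vref);
    if (kref === undefined) {
      throw new Error(`unknown vat reference ${vref}`);
    }
    return kref;
  });
  // The body is an opaque JSON string and passes through unchanged.
  return { body: capdata.body, slots };
}

const clist = new Map([
  ['o-4', 'ko21'],
  ['o+2.7', 'ko35'],
]);
const translated = translateCapData(
  { body: '{"price":5}', slots: ['o-4', 'o+2.7'] },
  clist,
);
// translated.slots is ['ko21', 'ko35']
```

The same shape would apply in the read direction with the map inverted.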

future: range queries, sort options, indices

Once #455 grows to include data like the offer book in a large exchange contract, we'll need more than a simple key-value store. At that point we'll need features like:

- list the entire contents of the table
- retrieve the subset of the table that meets some search criteria ("all open orders")
- sort the results (e.g. by price)
- limit the number of returned results
- add an index to improve performance

I don't know what the API will need to look like yet.
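
Purely to make the features above concrete, here is a sketch of filter, sort, and limit over an in-memory table. The list function and its where, sortBy, and limit options are hypothetical; a real syscall could not accept closures, so the eventual API would need some serializable query form instead. Indices are not covered.

```javascript
// Hypothetical sketch: filter, sort, and limit over a table whose
// values are JSON strings.
function list(table, { where = () => true, sortBy, limit = Infinity } = {}) {
  const rows = [...table.entries()]
    .map(([key, value]) => ({ key, value: JSON.parse(value) }))
    .filter(row => where(row.value));
  if (sortBy) {
    // Ascending numeric sort on the selected field.
    rows.sort((a, b) => sortBy(a.value) - sortBy(b.value));
  }
  return rows.slice(0, limit);
}

const offers = new Map([
  ['offer1', JSON.stringify({ status: 'open', price: 30 })],
  ['offer2', JSON.stringify({ status: 'closed', price: 10 })],
  ['offer3', JSON.stringify({ status: 'open', price: 20 })],
  ['offer4', JSON.stringify({ status: 'open', price: 25 })],
]);

// The two cheapest open offers.
const cheapestOpen = list(offers, {
  where: offer => offer.status === 'open',
  sortBy: offer => offer.price,
  limit: 2,
});
const cheapestKeys = cheapestOpen.map(row => row.key); // ['offer3', 'offer4']
```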

Security Considerations

- usage limits
  - limit number of keys, size of any individual value, aggregate size of all values.
  - should the vat be killed when it exceeds the limits? or merely have the syscall return an error?
  - budgets should eventually be managed like Meters: when spawning a new vat, the parent can share some of its space budget with the child
- if we're taking user-provided keys and interpolating them into KernelDB keys, we must be careful about format-confusion attacks, especially if we add enumeration with list
- when we add list, we want to continue to enforce ocap discipline
  - calling list should not enable code to access an object that it couldn't have seen by other means
  - we'll need to define a "collection" object. Holding one grants access to all the Representatives that were previously added to the collection.
  - we don't expect to need rights-amplification patterns that involve two collections at the same time (no intersection queries)
  - we don't expect to need the collection objects themselves to be serializable or sharable. Objects in collection1 might reference objects in collection2, but they won't reference the collection2 object itself.
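
The usage limits and the key-format concern could be enforced at the syscall boundary roughly as sketched below. The limit values, the permitted key alphabet, and the names validateKey and makeLimitedStore are assumptions for illustration. This sketch throws on violation, which corresponds to the "return an error" option; killing the vat is the alternative left open above.

```javascript
// Hypothetical sketch: enforce key format and storage budgets.
const LIMITS = { maxKeys: 3, maxValueSize: 16, maxTotalSize: 32 };

function validateKey(key) {
  // Forbid "." and other separators so a vat-supplied key cannot be
  // confused with the structure of the database key it is embedded in.
  if (typeof key !== 'string' || !/^[A-Za-z0-9_-]+$/.test(key)) {
    throw new Error('invalid key');
  }
}

function makeLimitedStore(limits) {
  const data = new Map();
  let totalSize = 0;
  return {
    read(key) {
      validateKey(key);
      return data.get(key);
    },
    write(key, value) {
      validateKey(key);
      if (typeof value !== 'string') {
        throw new TypeError('value must be a string');
      }
      if (value.length > limits.maxValueSize) {
        throw new Error('value too large');
      }
      const old = data.get(key);
      if (old === undefined && data.size >= limits.maxKeys) {
        throw new Error('too many keys');
      }
      // Replacing a value frees the space used by the old one.
      const newTotal =
        totalSize - (old === undefined ? 0 : old.length) + value.length;
      if (newTotal > limits.maxTotalSize) {
        throw new Error('storage budget exceeded');
      }
      data.set(key, value);
      totalSize = newTotal;
    },
  };
}

const limited = makeLimitedStore(LIMITS);
limited.write('a', 'x'.repeat(16));
limited.write('b', 'y'.repeat(16)); // aggregate budget is now fully used
```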

warner · 2020-10-27T06:43:44Z

@FUDCo I think you can close this one, but maybe break out the index stuff to a new ticket.

warner · 2020-11-10T18:38:37Z

The main syscalls were added in PR #1856, which didn't cite this ticket.

warner · 2020-11-10T23:15:49Z

We moved the "virtual collections" aspect off to #2004, so we can close this now.

warner added the enhancement and SwingSet labels Oct 1, 2020

warner mentioned this issue Oct 1, 2020

implement Representative/Container API in liveslots #1832

Closed

warner mentioned this issue Oct 8, 2020

Strawman ideas for vat secondary storage #1846

Closed

warner assigned FUDCo Oct 27, 2020

warner mentioned this issue Nov 10, 2020

virtual collections: range queries, sort options, indices #2004

Closed

warner closed this as completed Nov 10, 2020
