implement secondary-storage syscalls #1831

Closed
warner opened this issue Oct 1, 2020 · 3 comments
Labels: enhancement (New feature or request), SwingSet (package: SwingSet)


warner commented Oct 1, 2020

What is the Problem Being Solved?

#455 ("hierarchical object identifiers") is about moving voluminous state from RAM onto disk. The bottommost layer will be a new pair of syscalls to allow vats to perform synchronous reads/writes to a per-vat key-value store. This will be used by liveslots to implement the Container API (#1832).

Each vat will have exclusive access to a key-value store whose keys are strings (maybe integers) and whose values are initially strings (but will probably eventually be capdata).

The task here is to implement those two syscalls.

Description of the Design

The names are up for discussion, but I'll use syscall.readX and syscall.writeX for now. We need a name for this particular storage pool to use for the "X": we want to distinguish at least the following pools:

  • this one: mutable, exclusive to a single vat, indexed by a short string or integer (which does not go through the c-list), and holding (probably) capdata. The syscalls perform add/write/update and read. It may eventually store more structured data and gain a list syscall with range queries or sort options, to support e.g. finding the best matching offer among many.
  • the blobstore: append-only, accessed through c-list-managed blobcaps, holding immutable large strings/bytes, shared among all vats, and used as an alternative communication path for large data like vat/contract bundles. It may have partial-range read functions. Operations include add and read, and maybe decref. These operations probably want to bypass the transcript (because the values are large) and use vatPowers instead of syscalls.
  • others?

A simple pair of value = syscall.read(key) and syscall.write(key, value) may be good enough for now.

I think the vat's data should be stored in the HostDB with keys like v$NN.data.$KEY.body. This leaves room for a sibling v$NN.data.$KEY.slots key to hold the slots half of the capdata later.
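A minimal sketch of the kernel side of this pair, assuming a HostDB-like store with synchronous get/set (modeled here with a plain Map) and the v$NN.data.$KEY.body layout above. The names makeMemoryKVStore, makeVatStorageSyscalls, readX, and writeX are placeholders for illustration, not the actual SwingSet API.

```javascript
// Hypothetical stand-in for the HostDB: a synchronous string->string store.
function makeMemoryKVStore() {
  const map = new Map();
  return {
    get: key => map.get(key),
    set: (key, value) => {
      map.set(key, value);
    },
  };
}

// Hypothetical per-vat syscall handlers. vatID is e.g. "v12", so a vat key
// "balance" is stored under "v12.data.balance.body". The ".body" suffix
// leaves room for a sibling ".slots" entry once values become capdata.
function makeVatStorageSyscalls(kvStore, vatID) {
  const dbKey = key => {
    if (typeof key !== 'string') {
      throw new TypeError(`key must be a string, not ${typeof key}`);
    }
    return `${vatID}.data.${key}.body`;
  };
  return {
    // value = syscall.readX(key); returns undefined for a missing key
    readX(key) {
      return kvStore.get(dbKey(key));
    },
    // syscall.writeX(key, value); values are plain strings for now
    writeX(key, value) {
      if (typeof value !== 'string') {
        throw new TypeError(`value must be a string, not ${typeof value}`);
      }
      kvStore.set(dbKey(key), value);
    },
  };
}

const kvStore = makeMemoryKVStore();
const v1 = makeVatStorageSyscalls(kvStore, 'v1');
const v2 = makeVatStorageSyscalls(kvStore, 'v2');
v1.writeX('balance', '100');
console.log(v1.readX('balance')); // 100
console.log(v2.readX('balance')); // undefined: v2 cannot reach v1's keys
```

Because the kernel, not the vat, supplies the v$NN prefix, exclusivity falls out of the key layout: no vat-chosen key can name another vat's entries.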

Alternative Designs

secondary-storage device

If we first refined the device model (#55) to have distinct read and write calls, we might implement this ticket in terms of that model. However:

  • we'd still need to write the actual device
  • the device must record this state in the HostDB (with transaction/atomicity boundaries that match the rest of the kernel), which could be an awkward set of endowments to grant to the device
  • the device should be made available to many vats (I'd argue all of them), and users shouldn't have to update their bootstrap.js to distribute it to the vats which need it
  • expanding the functionality to include list, range queries, sort options, etc., might be accomplished by adding additional arguments to read (and changing the return value), or it might be better achieved by adding new syscalls
  • I'm pretty sure we need reads to observe preceding writes. #55 (new device model: read/writeLater) suggests "writeLater", where writes aren't visible to read calls until after some outside-the-kernel "block" boundary (on the other hand, I think #55 should really be read/write too)
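The read-after-write requirement, together with the atomicity point above, can be sketched as a per-crank write buffer layered over committed state: reads consult the crank's pending writes first, and the buffer is committed or discarded together with the rest of the crank's kernel state. This is a hypothetical illustration (makeCrankBuffer, commit, and abort are made-up names), not a description of how SwingSet's HostDB handles transactions.

```javascript
// Hypothetical write buffer giving read-after-write semantics within a crank.
function makeCrankBuffer(committed) {
  let pending = new Map(); // writes made during the current crank
  return {
    get(key) {
      // A read observes any earlier write from the same crank...
      if (pending.has(key)) {
        return pending.get(key);
      }
      // ...and otherwise falls back to committed state.
      return committed.get(key);
    },
    set(key, value) {
      pending.set(key, value);
    },
    // Apply all buffered writes at once, e.g. when the crank completes.
    commit() {
      for (const [key, value] of pending) {
        committed.set(key, value);
      }
      pending = new Map();
    },
    // Drop buffered writes, e.g. if the crank is aborted.
    abort() {
      pending = new Map();
    },
  };
}

const committed = new Map([['v1.data.count.body', '1']]);
const buffer = makeCrankBuffer(committed);
buffer.set('v1.data.count.body', '2');
console.log(buffer.get('v1.data.count.body')); // 2 (sees its own write)
console.log(committed.get('v1.data.count.body')); // 1 (not yet committed)
buffer.commit();
console.log(committed.get('v1.data.count.body')); // 2
```

Under "writeLater" semantics, the first get above would still return 1, which is exactly the behavior a vat updating a Purse balance cannot tolerate.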

I'm inclined to think that adding new syscalls would be simpler, but I'm interested in other opinions. I believe @dtribble has suggested that a device is the obvious choice for this sort of functionality.

vatPowers instead of syscall

Using syscalls to fetch all this data means all of it will also be recorded in the transcript. #451 is about removing data from RAM, and doesn't worry about how much is left on disk (or how much we add to disk to minimize our RAM footprint). If we also wanted to minimize that, we might route the secondary storage API through vatPowers instead.

However, since vatPowers functions do not go through the transcript, we have no mechanism to make sure the vat sees the right values as the transcript is replayed (that's the whole point of the transcript). This could work for blobcaps, since their data is immutable, but this secondary storage is specifically for mutable data like Purse balances.

I think we're stuck with going through syscalls.
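To see why replay depends on the syscall path, here is a hypothetical sketch of a transcript that records each read together with its result, and answers from the transcript during replay instead of consulting storage. A vatPowers function outside this path would have no recorded result to replay. makeTranscriptingSyscall and the entry shape are illustrative assumptions, not SwingSet's actual transcript format.

```javascript
// Hypothetical transcript wrapper: record syscalls and their results when
// running live, and answer from the transcript when replaying.
function makeTranscriptingSyscall(realSyscall, transcript, replaying) {
  let position = 0;
  return {
    readX(key) {
      if (replaying) {
        const entry = transcript[position];
        position += 1;
        // The replayed vat must make exactly the same syscall as before.
        if (!entry || entry.op !== 'readX' || entry.key !== key) {
          throw new Error(`transcript divergence at entry ${position - 1}`);
        }
        return entry.result; // the value the vat originally saw
      }
      const result = realSyscall.readX(key);
      transcript.push({ op: 'readX', key, result });
      return result;
    },
  };
}

// Live run: the vat reads the current balance, and the result is recorded.
const store = new Map([['balance', '100']]);
const live = { readX: key => store.get(key) };
const transcript = [];
makeTranscriptingSyscall(live, transcript, false).readX('balance');

// The stored balance changes in some later crank...
store.set('balance', '50');

// ...but replaying the earlier crank still yields the original value.
const replay = makeTranscriptingSyscall(live, transcript, true);
const replayed = replay.readX('balance');
console.log(replayed); // 100
```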

future: capdata

Our hunch is that the second use case (Zoe contracts tracking large numbers of offers/seats) will require the ability to store more than flat data in this table. We expect to store object references too (o-NN/etc), as well as the Representatives described in #455 (which means hierarchical o+NN.NN ids).

The syscalls would need to accept capdata ({ body, slots }, where body is a JSON-format string, and slots is an array of vat-side reference ids). The kernel would need to translate the slots through the vat's c-list just like it does with a syscall.send() call (for write), or dispatch.deliver() (for read). The liveslots-provided Container API's c.get() function would need to use the same marshal.unserialize as the rest of liveslots, and c.update would need to use marshal.serialize.
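The slot translation described above might look roughly like this, assuming a c-list modeled as a pair of maps between vat-side IDs (such as o+5, o-3) and kernel IDs (such as ko21). The makeCList shape and the translate function names are hypothetical, the body string is only illustrative, and the sketch ignores allocating kernel IDs for previously unseen exports as well as refcounting.

```javascript
// Hypothetical c-list for one vat: vat-side ref IDs <-> kernel ref IDs.
function makeCList(entries) {
  const vatToKernel = new Map(entries);
  const kernelToVat = new Map(entries.map(([vref, kref]) => [kref, vref]));
  return {
    toKernel(vref) {
      const kref = vatToKernel.get(vref);
      if (kref === undefined) {
        throw new Error(`vat ref ${vref} is not in the c-list`);
      }
      return kref;
    },
    toVat(kref) {
      const vref = kernelToVat.get(kref);
      if (vref === undefined) {
        throw new Error(`kernel ref ${kref} is not in the c-list`);
      }
      return vref;
    },
  };
}

// write: vat-side capdata -> kernel-side capdata (body unchanged, slots mapped)
function translateForWrite(clist, capdata) {
  return { body: capdata.body, slots: capdata.slots.map(clist.toKernel) };
}

// read: kernel-side capdata -> vat-side capdata
function translateForRead(clist, capdata) {
  return { body: capdata.body, slots: capdata.slots.map(clist.toVat) };
}

const clist = makeCList([['o-3', 'ko21'], ['o+5', 'ko40']]);
const stored = translateForWrite(clist, {
  body: '{"owner":{"@qclass":"slot","index":0}}',
  slots: ['o-3'],
});
console.log(stored.slots); // [ 'ko21' ]
const readBack = translateForRead(clist, stored);
console.log(readBack.slots); // [ 'o-3' ]
```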

This marshal instance must know about the Container tables, so any Representatives are serialized properly. This will enable cross-table references (which would be "FOREIGN KEYs" in a SQL system).

I'm not sure whether this ought to include Promises, specifically resolved ones. When these appear in a syscall.send, the vat sends a new (unresolved) promise ID, then schedules a syscall.resolve* call to happen later during the same crank. The kernel does the same thing on the inbound side when it includes an unresolved promise in the arguments of a dispatch.deliver.

The problem here is that the secondary storage API is synchronous, so any promise that it retrieves from disk won't be resolved until later. Should the kernel watch for reads that mention promises in that state and inject resolution notifications for them to the back of the run-queue? Should the read API be enhanced to return capdata plus a list of resolutions for all the promises that happen to be in that state?

I think we need to learn more about the use-case for including Promises in these offline tables before we tackle the question of how to safely implement them. I'd recommend that the first pass use translators which reject promises, and only accept object IDs in the .slots list.
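A first-pass check along these lines could inspect each slot before translation and reject anything that is not an object reference. This sketch assumes the vat-side naming convention of o+NN/o-NN for objects and p+NN/p-NN for promises, and also admits the hierarchical o+NN.NN form mentioned above for Representatives; assertOnlyObjectSlots is a hypothetical name.

```javascript
// Hypothetical first-pass validator: only object refs may appear in slots.
// Accepts o+NN, o-NN, and hierarchical forms like o+NN.NN.
const OBJECT_VREF = /^o[+-]\d+(?:\.\d+)*$/;

function assertOnlyObjectSlots(capdata) {
  for (const vref of capdata.slots) {
    if (vref.startsWith('p')) {
      throw new Error(`promises cannot be stored yet: ${vref}`);
    }
    if (!OBJECT_VREF.test(vref)) {
      throw new Error(`unsupported slot in stored capdata: ${vref}`);
    }
  }
  return capdata;
}

assertOnlyObjectSlots({ body: '[]', slots: ['o-3', 'o+5.2'] }); // accepted
try {
  assertOnlyObjectSlots({ body: '[]', slots: ['p+7'] });
} catch (err) {
  console.log(err.message); // promises cannot be stored yet: p+7
}
```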

future: range queries, sort options, indices

Once #455 grows to include data like the offer book in a large exchange contract, we'll need more than a simple key-value store. At that point we'll need features like:

  • list the entire contents of the table
  • retrieve the subset of the table that meets some search criteria ("all open orders")
  • sort the results (e.g. by price)
  • limit the number of returned results
  • add an index to improve performance

I don't know what the API will need to look like yet.
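Purely to illustrate the semantics of the first four features (the API itself is still open), here is a hypothetical in-memory list operation over one vat's table that filters, sorts, and limits. None of these names or options are proposed SwingSet syscalls, and a real implementation would push this work into the database (with indices) rather than scanning every row.

```javascript
// Hypothetical list operation over one vat's table (a Map of key -> record).
// With no options it returns the entire contents of the table.
function listX(table, { where = () => true, sortBy, limit = Infinity } = {}) {
  // retrieve the subset matching the search criteria
  const rows = [...table].filter(([, record]) => where(record));
  // sort the results, if a comparator was given
  if (sortBy) {
    rows.sort(([, a], [, b]) => sortBy(a, b));
  }
  // limit the number of returned results
  return rows.slice(0, limit);
}

const offers = new Map([
  ['offer1', { status: 'open', price: 30 }],
  ['offer2', { status: 'closed', price: 10 }],
  ['offer3', { status: 'open', price: 20 }],
  ['offer4', { status: 'open', price: 25 }],
]);

// "the two cheapest open orders"
const best = listX(offers, {
  where: r => r.status === 'open',
  sortBy: (a, b) => a.price - b.price,
  limit: 2,
});
console.log(best.map(([key]) => key)); // [ 'offer3', 'offer4' ]
```

Since syscall arguments must be serializable, a real API could not accept callbacks like where and sortBy; they stand in here for whatever declarative query format is eventually chosen.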

Security Considerations

  • usage limits
    • limit the number of keys, the size of any individual value, and the aggregate size of all values.
    • should the vat be killed when it exceeds the limits? or merely have the syscall return an error?
    • budgets should eventually be managed like Meters: when spawning a new vat, the parent can share some of its space budget with the child
  • if we're taking user-provided keys and interpolating them into KernelDB keys, we must be careful about format-confusion attacks, especially if we add enumeration with list
  • when we add list, we want to continue to enforce ocap discipline
    • calling list should not enable code to access an object that it couldn't have seen by other means
    • we'll need to define a "collection" object. Holding one grants access to all the Representatives that were previously added to the collection.
    • we don't expect to need rights-amplification patterns that involve two collections at the same time (no intersection queries)
    • we don't expect to need the collection objects themselves to be serializable or sharable. Objects in collection1 might reference objects in collection2, but they won't reference the collection2 object itself.
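The usage limits and the format-confusion concern could both be enforced at the syscall boundary roughly as follows. The specific limits, the allowed key alphabet, and makeGuardedStorage are hypothetical, and since whether an over-limit vat should get an error or be terminated is still open, the sketch simply throws.

```javascript
// Hypothetical per-vat limits; real values would come from a space budget.
const DEFAULT_LIMITS = {
  maxKeys: 1000,
  maxKeyLength: 100,
  maxValueChars: 10000,
  maxTotalChars: 1000000, // aggregate over all values (UTF-16 code units)
};

// Restricting keys to this alphabet (no ".") means a vat key such as
// "a.body" can never be confused with the ".body"/".slots" suffixes the
// kernel appends when building v$NN.data.$KEY.body.
const SAFE_KEY = /^[A-Za-z0-9_-]+$/;

function checkKey(key, limits) {
  if (typeof key !== 'string' || key.length > limits.maxKeyLength || !SAFE_KEY.test(key)) {
    throw new TypeError(`invalid key: ${String(key)}`);
  }
}

// Wrap one vat's store (a Map of key -> string) with validation and limits.
function makeGuardedStorage(store, limits = DEFAULT_LIMITS) {
  let totalChars = 0;
  for (const value of store.values()) {
    totalChars += value.length;
  }
  return {
    read(key) {
      checkKey(key, limits);
      return store.get(key);
    },
    write(key, value) {
      checkKey(key, limits);
      if (typeof value !== 'string' || value.length > limits.maxValueChars) {
        throw new RangeError('value is not a string or is too large');
      }
      const old = store.get(key);
      if (old === undefined && store.size >= limits.maxKeys) {
        throw new RangeError('too many keys');
      }
      const newTotal = totalChars - (old === undefined ? 0 : old.length) + value.length;
      if (newTotal > limits.maxTotalChars) {
        throw new RangeError('aggregate size limit exceeded');
      }
      store.set(key, value);
      totalChars = newTotal;
    },
  };
}

const guarded = makeGuardedStorage(new Map(), { ...DEFAULT_LIMITS, maxKeys: 2 });
guarded.write('a', 'x');
guarded.write('b', 'y');
guarded.write('a', 'xx'); // overwriting an existing key is allowed
try {
  guarded.write('c', 'z');
} catch (err) {
  console.log(err.message); // too many keys
}
try {
  guarded.write('a.body', 'z');
} catch (err) {
  console.log(err.message); // invalid key: a.body
}
```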

warner commented Oct 27, 2020

@FUDCo I think you can close this one, but maybe break out the index stuff to a new ticket.


warner commented Nov 10, 2020

The main syscalls were added in PR #1856, which didn't cite this ticket.


warner commented Nov 10, 2020

We moved the "virtual collections" aspect off to #2004, so we can close this now.

warner closed this as completed Nov 10, 2020