Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

durable Kinds forget state property names across upgrade, are inconsistent when init() adds more #4935

Closed
warner opened this issue Mar 27, 2022 · 6 comments · Fixed by #4952
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Mar 27, 2022

I'm working on getting #1848 upgrade working, and I've got a test that creates a durable Kind (and handle) in v1, creates an exports an instance, gets upgraded, the v2 code re-defines that kind (using the handle from baggage), then tries to invoke a method on a durable object. The defineDurableKind part works ok, but when I try to invoke the method, it fails because the state object is empty: no property getters/setters.

I think there are some significant problems in defineKindInternal. There's a propertyNames Set that lives in RAM, scoped to defineKindInternal (so one per Kind). The Set is populated by makeNewInstance as it walks the properties of the initial state (as returned by the user-provided init function). It looks like this add happens for every call to init. That means the properties we track could grow over time, if the user's init function doesn't return a consistent set of properties.

When we make a Representative, makeRepresentative uses this propertyNames Set to create the wrappedState object: it makes one getter/setter pair per property name in propertyNames. That means the wrapper will grow new properties over time based upon the last init that took place, unrelated to the properties that were stored for that particular instance. That's inconsistent at best, probably crashy if some behavior function tries to get() a property that some other instance invented but which has no state on this instance (it'll try to m.unserialize(null), which isn't capdata), and feels like it might enable an ambient communication channel between unrelated instances (but one that probably needs the Kind's behavior functions to help, which doesn't count).

The problem that affects Durables is that when we start the vat back up in a new version, we need to be able to thaw out these instances in a heap that has never seen an init. We can't rely upon userspace to tell us what the property names are. Instead, we must either store the property names in the DB at some point, or we need to build getters/setters for each Representative based upon the state stored on that instance (which means we need to fetch the state during makeRepresentative, to at least read the .keys() on it).

Storing the property names in the DB makes the most sense, but it's a bit awkward that we don't learn what they are until the first init is called. But not impossibly awkward. The DB will have most of the description of the Kind as soon as defineKind is called, but the .statePropertyNames key will be empty until the first instance is created. Then the key will be populated, and all subsequent init calls are obligated to use the same names. We can hold the property names in RAM too, for performance during the remainder of that version.

When defineKindInternal is called, it needs to use the kindID to look up the stored list of property names. If present, it should be loaded into RAM, used to define the getters/setters, and imposed upon all init calls. If missing, the next init call should be used to populate it (and save to DB). In the missing case, we don't need the list for getters/setters because we know there aren't any instances yet.

One other option (I think we discussed this a while ago) might be to require userspace to declare the state property names they're going to use, when calling makeKindHandle. Then we could store the property names in the kindDescriptor DB key. That would be simple and elegant for the implementation, but maybe a bit clunky for userspace, which would see two lists of property names: one in makeKindHandle, the other in their init function.

OTOH, storing the list in the kindDescriptor would both force all instances to have the same schema, and preclude version-2 from augmenting the schema (which feels like a more significant restriction). I know @FUDCo has thought a lot about lazy/immediate DB upgrades (the usual "read v1, write v2" upgrade-when-touched approach), but we haven't really talked as a group about schema changes.

@FUDCo thoughts? @erights could we get away with requiring makeKindHandle(tag, [ ...statePropertyNames ])? I think I prefer the simplicity of it, and if we don't have an obvious way to handle new property names anyways, then we might has well be more strict now, and find a way to relax the rules later on.

@warner warner added bug Something isn't working SwingSet package: SwingSet labels Mar 27, 2022
@erights
Copy link
Member

erights commented Mar 27, 2022

I like putting it in the kind. Rather than a list, lets talk about using a copyRecord pattern as schema.

@warner
Copy link
Member Author

warner commented Mar 27, 2022

On further thought, the most uniform approach would be to supply the names in defineKind/defineDurableKind, so it's done the same way on both durable and merely-virtual Kinds (since merely-virtual Kinds don't have a KindHandle, so they're created with just one call, not two). That's not the most convenient to implement, which would be supplying it in makeKindHandle, so it can be stored in the Handle's DB key, at the one time we write to that key.

Supplying it in define(Durable)Kind means we need an answer for what happens if the stored list doesn't match the one supplied by version-2. We could require them to be the same: defineDurableKind throws in version-2 if you try to change the schema. Or we could use that as the tool to upgrade the schema: version-2 creates getters for everything in the new list, but if they see missing rawState properties then they fill in capdata(undefined) instead of throwing an error when it tries to m.unserialize(undefined).

Although.. if the list is always present in defineDurableKind, maybe we don't strictly need to store the names in the DB. Our requirement is that we know the list before we build the state object and it's getters/setters. We don't do that until we deserialize an instance of the kind. Those instances might appear in vatParameters (which must be deserialized before calling bringOutYourDead), and the baggage itself is a Kind instance (which has the same lifetime). Userspace version-2 cannot call defineDurableKind to reconnect the existing handles until it gets a handle out of baggage, which means the buildRootObject sequence must be:

  • deserialize vatParameters
  • liveslots calls buildRootObject
  • userspace buildRootObject extracts KindHandles from baggage
  • (userspace might try to pull other data from baggage, which causes deserialization)
  • userspace calls defineDurableKind(kindHandle, implementation..)
  • user-defined Kind instances can now be deserialized

So there's a clear contradiction: we need to deserialize vatParameters and other potentially-Kind-instances from baggage before userspace is even capable ot re-attaching the Kinds and thus enabling the deserialization of those instances. That tells me that v2 needs to get the property names from storage, not the v2 call to defineDurableKind, which means we need v1 to store it.

It also tells me that upgrading the schema is going to be tricky: we may already have deserialized instances by the time v2 can tell us that it wants to change the property lists.

We could conceivably disallow local durable objects from vatParameters, to delay the need for deserializaing Kind instances until after userspace is able to reattach their handlers. buildRootObject could still mess things up by pulling some durable-object -bearing property out of baggage too early, but at least they'd have a way to do it correctly.

@warner
Copy link
Member Author

warner commented Mar 27, 2022

I like putting it in the kind. Rather than a list, lets talk about using a copyRecord pattern as schema.

Aye. What does userspace need to write? I'm hoping there's a syntax which adds as few characters as possible to ['key1', 'key2'], but something like { key1: M.any, key2: M.any } would be almost as good (and getting to omit the quotes around names-as-strings would definitely make it easier to type).

To get myself unblocked (so I can give you a working vat upgrade sooner), I may start by adding a plain list of strings to makeKindHandle. If you wind up writing a function that takes a copyRecord pattern, asserts that the keys are a fixed set of strings (none missing, no extra), and emits a Set or list, that could drop in nicely.

@FUDCo
Copy link
Contributor

FUDCo commented Mar 28, 2022

The propertyNames Set was originally intended as a kind of backstop, but the pathologies you describe are very real. There's not actually a good reason for it to be there at all and we should just get rid of it; the wrapped data can be derived from the stored data directly without consulting the propertyNames set. We don't need to store a definition of the properties somewhere, though doing so might be good for various kinds of diagnostics and cross-checks. Having the shape of the state be part of what you specify in the parameters to defineKind has always seemed appealing, but I've been inhibited in moving in that direction by the fact what this would mean in practice is that 99% of the time you'd just be writing the init function twice (once to declare the properties and a second time to initialize them -- if you were to describe it with a copyRecord as @erights proposes that would just mean you write it twice in two different notations). I personally hate that kind of repetition and I think a lot of developers hate it even more than I do.

@warner
Copy link
Member Author

warner commented Mar 28, 2022

I guess the counterbalance would be: what can we do with that extra information, especially if we get it early (during definition, rather than waiting for the first instance to be created). The wouldn't-it-be-cool vision is a SQLite table definition, if that would give us some extra-efficient storage/indexing/lookup functionality. We could probably defer building the CREATE TABLE statement until we saw the first instance, but that'd be super awkward.

There's probably also an argument to be made about forcing userspace to be up-front about their data structures, and providing the schema on a definition call, rather than having us glean it from init, is a textbook example of explicit vs implicit typing. My hunch is that this gets to the core question of what kind of system we're building: database manipulation methods (explicit/statically-defined schema), or dynamic runtime-specified behavior (where every instance could have a different shape, no types, no rules, hey it's JavaScript go wild you crazy kids).

@mhofman
Copy link
Member

mhofman commented Mar 28, 2022

That means the wrapper will grow new properties over time based upon the last init that took place, unrelated to the properties that were stored for that particular instance.

Well that sounds like a leak which would allow to discover if your representative was created before or after another instance was inited ...

I would first like to clarify our intent, along the lines of @warner's question above: do we or don't we want to support init functions defining different state shapes for the same kind. Aka, should the state shape be per object or per kind. The former has implications on upgrade changing state shape since the current approach of getter/setter for the state does not allow for post init addition of state entries. This is however an artifact of implementation, and we could use a proxy object instead. Or we could introduce a migrate function if we wanted to only support a once per upgrade state shape change.

If all objects of a kind should have the same state shape, then the shape definition shouldn't be done through implicit extraction from a dynamic instance creation. I hear @FUDCo's concern about repetition, and I'm wondering if there'd be a way to build the init from an extended shape definition. Main problem I see is mapping args to state props, or defining default values.

warner added a commit that referenced this issue Mar 29, 2022
This is an aggressively-dynamic way to deal with virtual-object state
property names. It allows each instance to have an entirely different
set. I think we want something more static/pre-declared, but this
approach unblocks my work on vat-upgrade and doesn't require changing
any externally-visible API. We can replace it with a more static
approach later.

refs #4935
warner added a commit that referenced this issue Mar 29, 2022
This is an aggressively-dynamic way to deal with virtual-object state
property names. It allows each instance to have an entirely different
set. I think we want something more static/pre-declared, but this
approach unblocks my work on vat-upgrade and doesn't require changing
any externally-visible API. We can replace it with a more static
approach later.

refs #4935
FUDCo added a commit that referenced this issue Mar 30, 2022
FUDCo added a commit that referenced this issue Mar 31, 2022
@FUDCo FUDCo closed this as completed Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants