Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

need test harness for virtual/durable object restoration #4997

Closed
warner opened this issue Apr 1, 2022 · 6 comments · Fixed by #5207
Closed

need test harness for virtual/durable object restoration #4997

warner opened this issue Apr 1, 2022 · 6 comments · Fixed by #5207
Assignees
Labels
enhancement New feature or request needs-design SwingSet package: SwingSet tooling repo-wide infrastructure

Comments

@warner
Copy link
Member

warner commented Apr 1, 2022

What is the Problem Being Solved?

@Chris-Hibbert pointed out that we need a good way to test that a userspace (contract) which uses virtual/durable Kinds will correctly reload those objects after getting paged out and paged back in (or after the entire vat gets upgraded).

Instances of a virtual/durable Kind are presented to JavaScript code by a "Representative", much like how a "Presence" object is the way JS interacts with a remote object in some other vat. Each Representative may or may not have its state data in RAM: there is an LRU cache which limits how many objects' data will be loaded at a time. When a method is invoked on a Representative, its state data is loaded (or moved to the front of the cache), and this might push some other object's data off the cache, which is what writes it to the DB. The dirty state records are also written to DB when a special "bringOutYourDead" delivery is made, which happens on a configurable schedule, somewhere between once after every normal delivery, once every N deliveries, and once at the end of each block.

The Representative itself is held in memory either by the userspace code that holds it, or by the LRU cache. Once it is not held by either, the next time GC happens (either "organically" because XS needs a new struct slot and the free list is empty, or when bringOutYourDead forces a GC sweep).

By design, normal vat userspace cannot reliably force an object out of memory, nor can it sense when one has been forced out. If you drop your RAM references to the Representative, then allocate or interact with a whole bunch of other objects, you might be able to both flood the LRU and trigger organic GC. The next time you try to get a Representative (e.g. reading it from a property of some other virtual object, fetching it from a collection, or receiving it in a message from a different vat), the manager will build a new one for you. This is the point at which the Kind's registered behavior record is used, to create a state object and a new batch of methods to wrap it, which collectively make up the Representative(s).

So the low-level requirement is for some sort of test harness to allow this "rebuild a Representative" process to be exercised.

I'm not yet sure how to do this. If virtual objects work correctly, there should be no way for userspace to mess up (if it could, userspace could probably cause a consensus failure or state divergence). So in once sense, userspace authors don't need to test this; swingset authors do.

There is a related higher-level requirement here: contract upgrade means vat upgrade, and the version-1 code must be designed to store enough data in durable objects/collections to enable version-2 to fulfill its obligations to the existing clients. That sort of "restore" is very much user-visible. We need a test harness for this too.

Description of the Design

Security Considerations

Test Plan

@warner warner added the enhancement New feature or request label Apr 1, 2022
@Chris-Hibbert
Copy link
Contributor

As a client of the virtual object system, what I care about is verifying that objects that get paged out and back in have all the necessary state. For this purpose, it doesn't matter wether the object was actually paged out due to inactivity. Can you provide a test harness that will take a virtual object as an argument, and give back another one that contains what it would hold if paged back in?

Is "getting a copy of the object in the same address space to examine" an incoherent concept?

@warner
Copy link
Member Author

warner commented Apr 2, 2022

No, no, that's a coherent concept. It's just something that liveslots tries to make impossible to observe.

Let's see, the API of this would need to be something like:

const c = makeScalarBigMapStore();
const root = Far('iface', {
  one: () => {
    c.init('key', makeFoo(args));
  },
 two: () => {
    const foo = c.get('key');
    testFoo();
  },
});

and then something to drive it that somehow guarantees that a full GC flush happens between calling one and two. This program must make extra sure to not keep the output of makeFoo around in between, otherwise you'll be getting the original Representative and not a rebuilt one.

Hm, so to make sure the test is actually testing the right thing, you'd like an extra flag or query or something to ask "how many times have you made a Representative for this current+live Representative"? If that reported only 1, your test is passing without really exercising the rebuild. Or a debug mode thing that said "give me a new Representative, I don't care if there's one out there already", but that's exactly what liveslots is trying very hard to prevent.

Let's see, a pair of debug tools like:

const c = makeScalarBigMapStore();
function populate() {
  c.init('key', makeFoo(args));
}

async function test() {
  populate();
  await TOOL.forceGC();
  const foo = c.get('key');
  assert(TOOL.getCreationCount(foo) > 1, 'test cannot test properly');
  testFoo();
}

We could add extra debug stuff to vatPowers but we'd need to guard it carefully, since it could be used to break consensus/determinism.

@FUDCo do you have any clever ideas on how to pull this off? I'm guessing that one of the fakeVirtualStuff helper functions might be the right way to go here, rather than something in a real vat.

@Tartuffo Tartuffo added tooling repo-wide infrastructure SwingSet package: SwingSet labels Apr 6, 2022
@Tartuffo
Copy link
Contributor

@FUDCo Sounds like you just volunteered for this, or at least to build some tools that will obviate the need to build a full-fledged version of this.

@warner
Copy link
Member Author

warner commented Apr 23, 2022

I had a thought, it's probably entirely subsumed by @FUDCo 's plans, but I'll mention it in case it inspires anything.

Restoration (and upgrade) is about dropping an old Representative and acquiring a new Representative (possibly with different behavior). The purely-restoration flow nominally looks like:

{
  const makeFoo = defineKind(behaviorVersion1..);
  const fooRepresentative1 = makeFoo(args);
  durableMap.init(key, fooRepresentative1);
}
// time passes, GC happens
{
  const fooRepresentative2 = durableMap.get(key);
  // use fooRepresentative2
}

and the upgrade flow looks like:

{
  const makeFoo = defineDurableKind(handle, behaviorVersion1..);
  const fooRepresentative1 = makeFoo(args);
  durableMap.init(key, fooRepresentative1);
}
// time passes, GC happens, vat upgrade happens
{
  const makeFooV2 = defineDurableKind(handle, behaviorVersion2..);
  const fooRepresentative2 = durableMap.get(key);
  // use fooRepresentative2
}

Now technically we could allow the second flow without a whole vat upgrade: defineDurableKind could mean "re-define", and could be allowed to happen multiple times within the lifetime of a single version of a vat. We don't do this for two (good) reasons:

  • 1: It would be less legible/auditable. Currently, to understand the behavior of a Representative, you must find the one Kind definition within the source code used for the current version of the vat. If we allowed redefinition, you'd need to know the history of this version, i.e. know about any additional calls to defineDurableKind, and that's a lot harder to inspect by hand.
  • 2: Our Representatives are hardened, so their methods are fixed, so re-defining the kind could not add new methods to an existing Representative (even if it were a Proxy). So if there were any outstanding Representatives, re-definition would leave us with a mix of old and new behaviors, which would be even less legible. The only way to avoid this confusion is to ensure there are no old Representatives left over.

The only time that we know there are zero Representatives, in a way that doesn't expose GC sensitivity, is immediately after upgrade, so that's currently the only time we allow Kind behavior to be "changed" (i.e. defined for the first time in a given version).

But unit tests don't need the same protections: we don't need to guard against GC sensitivity (they already have the full start compartment WeakRef/etc). And we could also allow vats to observe multiple Representatives of the same object (if they ask for it, with some special test-only function).

So we could provide a function in the test environment (never exposed to normal vats) that clears the tables which normally ensure the "zero or one Representatives at a time" rule, and then the next map.get(key) they do would get them a new Representative.

{
  const makeFoo = defineKind(behaviorVersion1..);
  const fooRepresentative1 = makeFoo(args);
  durableMap.init(key, fooRepresentative1);
}
// test case flushes VOM tables
testThing.clearVirtualObjects();
{
  const fooRepresentative2 = durableMap.get(key);
  // use fooRepresentative2, definitely distinct from fooRepresentative1
}

We could even allow them to re-define the behavior first, and then the new rep would behave the new way.

{
  const makeFoo = defineDurableKind(handle, behaviorVersion1..);
  const fooRepresentative1 = makeFoo(args);
  durableMap.init(key, fooRepresentative1);
}
// test case flushes VOM tables
testThing.clearVirtualObjects(); // also durables
{
  // we're still in the one version, now redefine the live Kind
  const makeFooV2 = defineDurableKind(handle, behaviorVersion2..);
  const fooRepresentative2 = durableMap.get(key);
  // use fooRepresentative2, has new behavior
}

It might be safer to rig all Representatives to self-destruct when this function is invoked, to help the authors of these tests discover mistakes, like continuing to use the old representative even though they thought they'd stopped. Maybe we make state throw when accessed if the disable switch is thrown, or we delete some important property on it. This might be too invasive (the code would be present but untriggered in normal vats too), but might make tests easier to debug.

@Chris-Hibbert
Copy link
Contributor

An insistence on having only one behavior present at a time for each kind seems inconsistent with best practices as I learned them at ${JOB-1} for managing version migration in a distributed system where you can't replace everything at once.

Each (formal or informal) API has a client and server end. It's often the case that the software at the two ends gets updated at different times, so one or the other has to be prepared to talk to at least two versions. (If you can't force upgrades, there may be more than two. You have to support at least two by the hypothesis that new software isn't available to both sides simultaneously.)

It is often the case that the server end is easier to upgrade, so I'll write as if the higher velocity happens there. What you end up having to do is writing software that attempts to migrate in parallel. Since they can't be released in parallel, the server side goes first, and adds new methods or new parameters, but continues to support all old calls. When the new client is released, it can call the new methods, or use new calling conventions. The server should expect to be speaking to a mix of old and new clients. Once you can be sure that no old clients remain, you can release another version of the server code that drops support for the now-unused old calls.

This approach worked when the client ran in a browser, but was developed by a team with a distinct release schedule and when the client was an app that had to be released through two different app stores.

This approach means that the client only needs to support one server version per release, but in practice, we found that we had version flags in the code on both sides because scheduling software releases is an imprecise art. If server-side development of feature A misses its deadline, but feature B is ready, and has been promised to someone, you really want to be able to release the client software on schedule but with the switch to B enabled and A disabled.

I suspect we can't expect to upgrade all infrastructure, contracts, and clients simultaneously, so I think we need to be prepared to live in a distributed version migration world.

@mhofman
Copy link
Member

mhofman commented Apr 23, 2022

I'm not sure I see where the inconsistency lies. Indeed the API exposed by the object must be stable enough that the consumer's expectations are not broken through an upgrade. However that doesn't prevent the object from upgrading its implementation and having only one copy of that implementation running at any given time. I don't think building a system where a consumer interacts with a specific implementation until it itself upgrades is scalable, or even wanted because the contract exporting objects needs the ability to fix bugs. That basically puts the burden on the object to satisfy the ancestor's API contract, possibly forever. There is also always to possibility of adding new objects, with a new API.

FUDCo added a commit that referenced this issue Apr 26, 2022
FUDCo added a commit that referenced this issue Apr 26, 2022
FUDCo added a commit that referenced this issue Apr 27, 2022
FUDCo added a commit that referenced this issue Apr 28, 2022
@mergify mergify bot closed this as completed in #5207 Apr 28, 2022
mergify bot pushed a commit that referenced this issue Apr 29, 2022
* feat: first pass at VO and DVO test harnesses

Closes #4997

* docs: add documentation

* test: a virtual object test for AMM pools.

see: #5207

Pretty rudimentary.  It just sees that the objects come back. The test
fails when I manually remove (tested) stuff from state.

There's one test that changed state is persisted: I update the
notifier, and verify that on revival, the count has increased.

A plausible next test would be to add liquidity, but that's a bunch
more work.

* chore: review-inspired cleanups.

Co-authored-by: Chip Morningstar <chip@fudco.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-design SwingSet package: SwingSet tooling repo-wide infrastructure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants