Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

confusing "vat terminated" error for messages sent to upgrade-abandoned objects #9582

Open
warner opened this issue Jun 26, 2024 · 1 comment
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Jun 26, 2024

@mhofman and @Chris-Hibbert ran into some confusing error messages today:

  • vatA exported some non-durable object to vatB
  • vatA was then upgraded, so the object was abandoned
  • later, vatB sent a message to that abandoned object
  • vatB's result promise is rejected with a "vat terminated" Error

.. when in fact, the vat is still running, it's merely been upgraded.

This is especially confusing because if vatB sent its message before the upgrade (and if vatA didn't return an answer right away, so the result promise was still pending at the time of upgrade), it would have received a Disconnection rejection instead, which mentions "vat upgraded" in one of the properties. So there's a bit of a race between the message being sent and the vat being upgraded.

It would be great if both cases resulted in the same rejection.

The core problem is that abandoned objects don't record any history or reason for their abandonment. We can tell that an object is abandoned because it lacks an owner: the koNN.owner kvStore entry is missing (where previously it had the vNN VatID of its owner). Terminating a vat abandons all of its objects; upgrading a vat abandons its non-durable objects, but since both are indicated by deleting a field, there's no place left to remember the reason.

One approach might be to add a field to mark an object as abandoned: keep the VatID but check for koNN.abandoned. That would hurt perf in the common case, though.

Another approach could be to add a koNN.rejection field, and make it exclusive with the .owner field. This would contain a string, or maybe a complete capdata, which would be used to build (or as) the Error response with which we splat messages to the orphan.

@mhofman and @gibson042 talked about an Error instance cause property since 2021, which ties into our discussions about error logging/collecting and deep stacks. We can imagine using it to indicate that an Error is because the target vat was upgraded (or terminated), and changing our Disconnection-object protocol to look for cause instead (or as well, to maintain backwards compatibility). @mhofman proposed giving each abandoned object a distinct reason (an arbitrary number), and adding it to a separate table which could map it back to a VatID plus upgradeID/incarnation-number, so developers/diagnostics/console-logs could show more information.

Some constraints we want to maintain:

  • we don't want to expose VatIDs to userspace, to avoid revealing how many vats have been created (which would reveal execution pathways)
  • nor do we want to reveal object IDs
  • if we used a different cause for each object, ideally we'd generate them randomly, or hash a private seed plus a counter
    • following the same logic, we should do the same when serializing Error objects, but currently we just use a per-vat counter, but-but we currently don't expose this counter to the receiving userspace (just the receiving liveslots), but-but-but we would expose this to a comms-protocol-attached receiving kernel, and ideally we wouldn't rely upon trusted recipient layers to keep it away from userspace, which we could achieve by having them be generated randomly or pseudo-randomly in the first place
  • if Errors are assigned numbers, I have to ask about how persistent those assignments are, do they round-trip, who might try to EQ two of them, and I really want to avoid more GC code, so I'm interested in them being as pass-by-copy as possible
@warner warner added bug Something isn't working SwingSet package: SwingSet labels Jun 26, 2024
@mhofman
Copy link
Member

mhofman commented Jun 26, 2024

@mhofman proposed giving each abandoned object a distinct reason (an arbitrary number), and adding it to a separate table which could map it back to a VatID plus upgradeID/incarnation-number, so developers/diagnostics/console-logs could show more information.

To clarify, if we're minting different error rejections for each abandoned object, we can rely on the built-in error id mechanism. The nested reason could remain shared copy data for that upgrade.

Orthogonally we should still consider exposing a unique number for the upgrade in the nested upgrade reason as technically 2 vats on the path of a call could get upgraded one after the other with the same "incarnation number", resulting in the same looking rejection which would trigger the bailout logic for "same upgrade rejection". So it sound like we need a unique non-revealing number associated with each upgrade as well, but that needs to be revealed and usable by contract code unlike error ids.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants