Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comms/kernel GC cycle collection tricks: back-pointers #2870

Open
warner opened this issue Apr 13, 2021 · 4 comments
Open

comms/kernel GC cycle collection tricks: back-pointers #2870

warner opened this issue Apr 13, 2021 · 4 comments
Assignees
Labels
enhancement New feature or request needs-design performance Performance related issues SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Apr 13, 2021

What is the Problem Being Solved?

In both the kernel and the comms vat, we've got object- and promise- tables whose entries need to be deleted when they're no longer referenced. This will mirror the JS engine's GC approach, except all the objects involved are logical ones, referenced by data (krefs), so we can't benefit from the engine's support for collection actual Object objects.

Simple refcounting is pretty straightforward and effective unless it encounters cycles. Currently (until we implement #2069 auxdata), cycles will only happen when Promise p1 is resolved to data which references p2, and p2 is resolved to data referencing p1 (or a larger cycle). The comms vat does not have a run-queue or messages queued up on a target promise (all messages are pipelined immediately to the target's decider), so the only inbound references will be clist entries: other vats (or the kernel) who might still be able to reference the promise. Once resolved, we retire the promise-id as quickly as possible: the kernel can retire it immediately, but for other vats we must wait for our outbound resolution message to be acknowleged, which will take a round-trip time, so the clist's reference won't be deleted right away.

While alive, the cycle of mutually-referencing promises will have at least one inbound reference from a clist. When the last clist entry goes away, we'd like to detect and delete the cycle.

The standard way of noticing cycles is to perform a full mark-and-sweep operation, starting from the roots (all clist entries). I'm nervous about the performance consequences of walking through the entire comms clist and promise tables (as a side-effect of receiving a retirement ack, no less, so: jank). In addition, so far we've managed to avoid adding a secondary-storage API to enumerate DB keys, and both clists and the promise table are stored in secondary storage. So we're also disinclined to support such enumeration, just to keep the API simple.

Given such a constraint, @FUDCo and I were noodling about tricks we might still be able to use. I've been pondering back-pointers as a tool, and he suggested tracking inbound references from clist entries (as a simple counter) separately from the cross-promise references (recorded as a list of other promise-ids). The algorithm to apply when deleting a c-list entry would then look like:

  • decrement the c-list refcount of the target promise
  • if the resulting c-list refcount is nonzero, end
  • else: examine the cross-promise reference list. If zero, drop the target promise, recurse
  • else: perform a search:
    • crawl the transitive closure of inbound cross-promise pointers, starting at the target
    • for each node found, if the c-list refcount is non-zero, end: the cycle is still reachable
    • if the crawl terminates without finding an inbound c-list reference, the cycle is non-reachable
      • delete every node of the cycle, recurse on all of them

This will get more complicated when auxdata is introduced, because object table entries can also supply inbound references (iff we allow auxdata to include promises, I know @dtribble thinks that would be pretty handy, but I (and I think @erights) am nervous about the implementation). And promise resolution data can reference objects, whose auxdata can participate in the cycle. But, since c-list entries continue to be the only roots, I suspect we can apply this same trick: both
objects and promises have a 2-tuple of (clist refcount, object/promise inbound reference backpointer), and we apply the same algorithm but crawl the combined object/promise graph.

Now that I think about it, maybe we can apply the same trick to the kernel. The roots there will be the c-list entries, the run-queue, and the queued messages within each unresolved promise. Except for the last case, the algorithm should work just as effectively: we know GC cannot cause the run-queue messages to go away, so we treat them as roots. The troublesome spot is the queued messages, because if a promise becomes completely unreachable, it can never be resolved, which means the queued messages can never be delivered, which means we can delete them, which should release anything they're keeping alive. But.. ok, so as long as the decider of the promise remains alive (the deciding vat has not yet been terminated), those messages might become deliverable and need to be kept alive. But if the decider dies, we reject all those orphaned promises, which will drain the promise's queue because all their queued messages will be dropped and their result promises rejected. So there isn't really a case in which GC might help us delete an unresolved promise: the deciding vat is not allowed to disown the promise without resolving it first, and vat termination still provides some form of resolution.

So we can treat the queued messages as roots without loss of functionality/collectability. Yay.

Performance Considerations

Jank: high deviance in the time it takes to do an operation, because of normally-invisible platform activity not correlated to the operation occasionally taking significantly more time than usual.

Any refcounting system will experience increased GC time if a lot of objects can be collected. Imagine a 1000-entry table kept alive by a single inbound reference, and that last reference goes away. Or a 1000-entry linked list and the head pointer is deleted. Our cycle-following algorithm will also take a long time if there are large cycles that can be collected, or large cycles which are kept alive by a pointer that takes a long search to find (imagine a 1000-entry ring, kept alive by pointers to nodes 1 and 2, and we delete the pointer to node 1: we'll traverse a thousand backpointers before discovering the clist entry holding node 2 alive).

We'll certainly start by just tolerating this, but we should keep in mind the additional time it might take. This will be extra significant when we get metering/gas working: we don't want to penalize callers who enable GC (we should reward them instead), but if some calls take more time than others because GC is triggered, we need to consider how that should interact with intended limits on normal user code.

Security Considerations

We may need to treat cycle collection as a DoS vector, in which malicious code might provoke large cycles, with an aim to interfere with a victim message that happens to trigger collection.

Bugs in the GC implementation could cause objects to be dropped prematurely, or garbage to accumulate (maybe a DoS vector).

Consensus Considerations

The algorithm must obviously be deterministic, but this shouldn't be a problem.

@warner warner added enhancement New feature or request SwingSet package: SwingSet performance Performance related issues labels Apr 13, 2021
@warner
Copy link
Member Author

warner commented Apr 22, 2021

I tihnk @erights and @FUDCo said this is equivalent to the "Arturo-Behar Garbage Collector" invented(?)/used back at Electric Communities. Which fills me with joy, both for following in the footsteps of giants and providing evidence that it might be correct. If there's any literature on that technique from, could y'all add a pointer?

@FUDCo
Copy link
Contributor

FUDCo commented Apr 22, 2021

His name is Arturo Bejar. See https://patents.google.com/patent/US5991779A/en

@FUDCo
Copy link
Contributor

FUDCo commented Apr 22, 2021

@FUDCo
Copy link
Contributor

FUDCo commented Feb 3, 2022

This ties back to #1126. I'm not sure it qualifies as a task unto itself but rather it's more of a commentary on a possible strategy for approaching the cyclic GC problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-design performance Performance related issues SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

3 participants