-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deleting a virtual weakmap fails to retire imported vrefs used as keys #7355
Comments
I've added a failing test in |
If this bug is present, and the vat fails to send its Given that, I think we can defer fixing this for a while. The missing |
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Remotable-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Remotable/Representative/Presence-style keys, consuming c-list entries longer than necessary. * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has() to incorrectly return true, and .init() to throw a "already registered" error, as well as consuming DB space longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
PR #9961 will fix this problem. I'll close this ticket once that PR lands. The PR doesn't attempt to remediate any pre-existing consequences of this bug. Imported ("Presence-style") vrefs that were not retired will still not be retired. Fortunately the consequences of this are small: the kernel will maintain a "state: RECOGNIZABLE" c-list entry for the importing/recognizing vat (two kvStore keys), which could otherwise be dropped. If this vat was the only one still recognizing the object, the kernel will also maintain an object-table record for it (another two kvStore keys), and will maintain a c-list entry for the exporting vat (yet another two keys). This bookkeeping data is the only unnecessary storage consumption: no userspace-visible data will be retained by the RECOGNIZABLE state. The #8759 tool (once it exists) will let you audit an existing kernel DB for mismatches between c-list and refcount, which may help determine the scale of the problem. Remediation from within a vat seems unlikely to be cheap enough to perform (it amounts to a mark-and-sweep refcount audit, plus a new vat-kernel API to learn the import/c-list status of a vref). But we're considering an approach that would allow an external party to submit alleged refcount fixes, to notify a vat of a subset of the vrefs that are worthy of analysis and updates. That would allow the expensive work to be done off-chain, and the corrections submitted as transactions, and safely verified before applying the recommended changes. Also note that once a vat is deleted, its state will no longer be corrupted. Replacing a vat with a new clean one is another form of remediation. |
Notes for the DB scanner tool:
|
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of: * fixes #8756 * fixes #7355 * fixes #9956 As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools. This commit marks the new tests as expected to pass again. It adds one new (failing) test to demonstrate the lack of remediation code. Thanks to @mhofman and @gibson042 for recommendations.
Liveslots was suffering from the following bugs: * vrefs weren't properly encoded into DB keys and "encodePassable" keys when deleting collections (such that `isEncodedRemotable()` always answered false, which skipped the refcount processing of keys, and removal of the "ordinal-assignment" record) * recognition records (vom.ir. keys) weren't created or removed for Remotable-style keys in weak collections * Presence-style vrefs were not submitted for retirement processing when deleting a weak collection As a result: * `collection.clear()` left the "ordinal-assignment"` records in the vatstore, causing subsequent .has(oldkey) to incorrectly return true, and .init(oldkey, newvalue) to throw a "already registered" error, as well as consuming DB space longer than necessary. * Invoking the `.clear()` method on a virtual/durable collection, or dereferencing the collection itself (and allowing it to be garbage collected), did not free any Remotables, virtual/durable objects (Representatives), or imported Presences used as keys. This could cause data in other weak collections (or other vats) to be retained longer than necessary. * retiring (deleting) a Remotable used as a key in a weak virtual/durable collection did not free the corresponding value, causing data (in local and/or remote vats) to be retained longer than necessary * Allowing a weak virtual/durable collection to be garbage collected did not inform the kernel that the vat can no longer recognize the Presence-style keys, consuming c-list entries longer than necessary. This commit fixes those bugs, which fixes the immediate cause of issues #8756 , #7355 , and #9956 . As a change to liveslots, full deployment requires both restarting the kernel with this new code, *and* triggering a vat upgrade of all vats. Once that is done, no new collection entries will suffer the problems listed above. However, this commit makes no attempt to remediate any existing data corruption. See #8759 for plans to build a tool that can audit the virtual-data reference graph and detect the consequences of these bugs in pre-existing kernel databases, and see the issues listed above for notes on efforts to build remediation tools.
Describe the bug
While investigating #7353 I realized that we have another problem in the liveslots GC code. The setup is:
makeScalarBigWeakMapStore()
to construct a virtual WeakMapA
B
as a key, with some arbitrary valueA
is said to "recognize"B
, i.e.A
is a member of the recognizer set forB
vom.ir.${vrefB}|${collectionID}
vatstore key to remember the recognition referenceB
from RAM, and do a BOYDsyscall.dropImports
, because it no longer has a strong reference toB
anywhere (no RAM pillar, it's an import so there's no export pillar, and it wasn't stored strongly in vdata)dispatch.retireImports
inA
continues to recognizeB
A
Representative from RAM, and do a BOYDB
, it removes the recognition referenceRemoving the last recognition reference should emit a
syscall.retireImports
for B's vref.The bug is that we don't send that
retireImports
.Liveslots has a
possiblyRetiredSet
, with anaddToPossiblyRetiredSet()
function to add vrefs to it. This is called from only one place,removeRecognizableVref()
, which only uses it on imported Presences and local virtual/durable Representatives, and only if the recognizer is a VirtualObjectAwareWeakMap/Set (i.e.! recognizerIsVirtual
):scanForDeadObjects()
uses this set, with some checks againstgetValForSlot
,deadSet
, andvrm.isVrefRecognizable()
, to decide whether these vrefs should be retired or not. This clause only appears to act upon imported Presence vrefs.A later clause offers a second path into
importsToRetire
: if a Presence is indeadSet
(it's RAM pillar has dropped and stayed down), and if the vref is not reachable by vdata, and it is also not recognizable by vdata, then the vref will be retired:This second clause provides a fast path for the most common case: the imported Presence was used in neither Maps nor WeakMaps: it only had a RAM pillar, so our vat's ability to recognize it disappears along with the Presence, and we should emit both
dropImports
andretireImports
in the same BOYD. It also covers another common case, where it was used in only a WeakMap, so losing the RAM pillar mandates adropImport
, but we continue to recognize it, and wait for the exporting vat to retire it with adispatch.retireImports
.The case that's missing is when the recognizability is lost because our recognizer is deleted, after the Presence has gone.
Fixes
I'm still trying to think about the right way to fix this. I'm pretty sure that
removeRecognizableVref()
should be callingaddToPossiblyRetiredSet()
for both therecognizerIsVirtual
and! recognizerIsVirtual
cases: when we delete thevom.ir.
key, we should be scheduling a check at the next BOYD, in case that was the only remaining key. And thenscanForDeadObjects
should be doing an independentvrm.isVrefRecognizable()
check on everything inpossibleRetiredSet
.What I'm not yet clear on is how the first
scanForDeadObjects
clause should be changed."reachable" and "recognizable" are two of three possible states: it's really one of "no reference", "recognizable but not reachable", and "both recognizable and reachable". So we've got a reference graph whose edges come in two colors, or we have one (smaller) reference graph overlaid on top of a second, depending on your way of thinking.
Virtual objects have three possible pillars (RAM/Representative, export, vdata), of which both export and vdata come in both reachable and recognizable flavors (the JS engine handles reachability of Representatives, and we've commandeered everything userspace might get that could let them express mere recognizability). Imported Presences lack the export pillar, but the export still has a "not retired yet" attribute that we must pay attention to. Local/precious/ephemeral Remotables are held strongly when a virtual collection is holding their vref, so they generally aren't tracked with the notion of pillars.
So we kind of have two separate state machines, one for virtual objects (where a Representative is the RAM pillar), and a second for imports (where a Presence serves that role). But we'd like to share as much of their mechanics as we can, like
possibleRetiredSet
.The text was updated successfully, but these errors were encountered: