Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(liveslots): collection deletion and retirement bugs #9961

Merged
merged 3 commits into from
Aug 31, 2024

Conversation

warner
Copy link
Member

@warner warner commented Aug 26, 2024

Liveslots was suffering from the following bugs:

  • vrefs weren't properly encoded into DB keys and "encodePassable"
    keys when deleting collections (such that isEncodedRemotable()
    always answered false, which skipped the refcount processing of
    keys, and removal of the "ordinal-assignment" record)

  • recognition records (vom.ir. keys) weren't created or removed for
    Remotable-style keys in weak collections

  • Presence-style vrefs were not submitted for retirement processing
    when deleting a weak collection

As a result:

  • collection.clear() left the "ordinal-assignment"` records in the
    vatstore, causing subsequent .has(oldkey) to incorrectly return
    true, and .init(oldkey, newvalue) to throw a "already registered"
    error, as well as consuming DB space longer than necessary.

  • Invoking the .clear() method on a virtual/durable collection, or
    dereferencing the collection itself (and allowing it to be garbage
    collected), did not free any Remotables, virtual/durable
    objects (Representatives), or imported Presences used as keys. This
    could cause data in other weak collections (or other vats) to be
    retained longer than necessary.

  • retiring (deleting) a Remotable used as a key in a weak
    virtual/durable collection did not free the corresponding value,
    causing data (in local and/or remote vats) to be retained longer
    than necessary

  • Allowing a weak virtual/durable collection to be garbage collected
    did not inform the kernel that the vat can no longer recognize the
    Presence-style keys, consuming c-list entries longer than necessary.

This commit fixes those bugs, which fixes the immediate cause of
issues #8756 , #7355 , and #9956 . As a change to liveslots, full
deployment requires both restarting the kernel with this new
code, and triggering a vat upgrade of all vats. Once that is done,
no new collection entries will suffer the problems listed above.

However, this commit makes no attempt to remediate any existing data
corruption. See #8759 for plans to build a tool that can audit the
virtual-data reference graph and detect the consequences of these bugs
in pre-existing kernel databases, and see the issues listed above for
notes on efforts to build remediation tools.

@warner warner added the liveslots requires vat-upgrade to deploy changes label Aug 26, 2024
@warner warner self-assigned this Aug 26, 2024
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch from f21826c to a0d7218 Compare August 26, 2024 09:02
Copy link

cloudflare-workers-and-pages bot commented Aug 26, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: 685e629
Status: ✅  Deploy successful!
Preview URL: https://e1fa3ef8.agoric-sdk.pages.dev
Branch Preview URL: https://warner-8756-collection-delet.agoric-sdk.pages.dev

View logs

@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch 2 times, most recently from 114a64c to 72741b8 Compare August 28, 2024 16:29
@warner warner marked this pull request as ready for review August 28, 2024 16:30
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch from 72741b8 to 108cd1c Compare August 28, 2024 16:44
@warner warner changed the title fix(liveslots): collection deletion didn't free key objects fix(liveslots): collection deletion and retirement bugs Aug 28, 2024
Copy link
Member

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent test coverage, as usual.

@erights notes that all three styles are "Remotables", so this PR should probably adopt new vocabulary for classifying vrefs (e.g., "presence" vs. "representative" vs. "remotable" "ephemeral"/"heap").

packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/collectionManager.js Outdated Show resolved Hide resolved
@warner warner changed the base branch from master to warner/9939-retire-without-drop August 29, 2024 05:35
@warner warner force-pushed the warner/9939-retire-without-drop branch from e489af7 to 568549b Compare August 29, 2024 20:52
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch 2 times, most recently from 4179386 to 48d5da1 Compare August 30, 2024 00:56
@warner warner force-pushed the warner/9939-retire-without-drop branch 2 times, most recently from a47fa2c to c045163 Compare August 30, 2024 16:04
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch from 48d5da1 to 2e9347f Compare August 30, 2024 16:04
mergify bot pushed a commit that referenced this pull request Aug 30, 2024
These tests will fail without the fixes in the next commit

One new test is disabled because of additional pending bugs that
interfere with the test setup (a combo of #9956 and #9959), which will
be re-enabled in PR #9961 (to address #8756 and others)
Base automatically changed from warner/9939-retire-without-drop to master August 30, 2024 16:43
Copy link

Base branch is changed to master. Please re-run the integration tests by adding 'force:integration' label.

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still reviewing the tests, but I expect I won't have any blocking feedback on them.

Consider addressing the 2 following items:

  • avoid assert() style checks in potentially performance sensitive paths. There is a decent overhead compared to || Fail style.
  • Incidental: double check the entry count logic (and maybe add a test) for when the collection is cleared with a key pattern in clearInternal.

The rest is all docs nitpicks.

packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
packages/swingset-liveslots/src/boyd-gc.js Outdated Show resolved Hide resolved
const dbEntryKey = dbKey.substring(dbKeyPrefix.length);
return decodeKey(dbEntryKey);
}

function dbKeyToEncodedKey(dbKey) {
assert(dbKey.startsWith(dbKeyPrefix), dbKey);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a case where the cond || Fail style is meaningfully more efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That saves one function call, right? Because we aren't doing any string interpolation in the message argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah the most expensive part of assert was if there was no message argument provided. Anyway, saving a function call is good too, but maybe not as necessary.

packages/swingset-liveslots/src/virtualReferences.js Outdated Show resolved Hide resolved
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch from 2e9347f to f6fd29b Compare August 31, 2024 00:08
@warner warner added the automerge:rebase Automatically rebase updates, then merge label Aug 31, 2024
This was broken because of #8756 . The modified tests are marked as
failing, until the fix is landed.
Liveslots was suffering from the following bugs:

* vrefs weren't properly encoded into DB keys and "encodePassable"
  keys when deleting collections (such that `isEncodedRemotable()`
  always answered false, which skipped the refcount processing of
  keys, and removal of the "ordinal-assignment" record)

* recognition records (vom.ir. keys) weren't created or removed for
  Remotable-style keys in weak collections

* Presence-style vrefs were not submitted for retirement processing
  when deleting a weak collection

As a result:

* `collection.clear()` left the "ordinal-assignment"` records in the
  vatstore, causing subsequent .has(oldkey) to incorrectly return
  true, and .init(oldkey, newvalue) to throw a "already registered"
  error, as well as consuming DB space longer than necessary.

* Invoking the `.clear()` method on a virtual/durable collection, or
  dereferencing the collection itself (and allowing it to be garbage
  collected), did not free any Remotables, virtual/durable
  objects (Representatives), or imported Presences used as keys. This
  could cause data in other weak collections (or other vats) to be
  retained longer than necessary.

* retiring (deleting) a Remotable used as a key in a weak
  virtual/durable collection did not free the corresponding value,
  causing data (in local and/or remote vats) to be retained longer
  than necessary

* Allowing a weak virtual/durable collection to be garbage collected
  did not inform the kernel that the vat can no longer recognize the
  Presence-style keys, consuming c-list entries longer than necessary.

This commit fixes those bugs, which fixes the immediate cause of:

* fixes #8756
* fixes #7355
* fixes #9956

As a change to liveslots, full deployment requires both restarting the
kernel with this new code, *and* triggering a vat upgrade of all
vats. Once that is done, no new collection entries will suffer the
problems listed above.

However, this commit makes no attempt to remediate any existing data
corruption. See #8759 for plans to build a tool that can audit the
virtual-data reference graph and detect the consequences of these bugs
in pre-existing kernel databases, and see the issues listed above for
notes on efforts to build remediation tools.

This commit marks the new tests as expected to pass again. It adds one
new (failing) test to demonstrate the lack of remediation code.

Thanks to @mhofman and @gibson042 for recommendations.
@warner warner force-pushed the warner/8756-collection-deletion-gc-bug branch from f6fd29b to 685e629 Compare August 31, 2024 00:38
@mergify mergify bot merged commit 813208a into master Aug 31, 2024
80 checks passed
@mergify mergify bot deleted the warner/8756-collection-deletion-gc-bug branch August 31, 2024 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge liveslots requires vat-upgrade to deploy changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants