count + remediate the non-retired abandoned unreachable objects from 7212 on mainnet #9896
Labels
enhancement
New feature or request
performance
Performance related issues
SwingSet
package: SwingSet
Issue #7212 identified a kernel GC bug which causes some objects to be retained longer than they need to be. PR #8695 fixes the problem, but our mainnet kernel might have experienced the bug before the fix was (or will be) deployed.
The first task is to write a tool which takes a copy of the mainnet swingstore DB (the `swingstore.sqlite` file, perhaps pruned of old transcript spans first), finds all the kernel objects in this state, and prints a count and a list of krefs.
That tool should follow the plan described in #7212 (comment) (which claims that I've already written this tool, so the task might just be to find where I left it, polish it enough to commit, and then make a PR to add it to `misc-tools/`).
The second task is to remediate the problem on mainnet. As of last December we only had one leaked object, so if the fix is deployed soon, we might not have a lot of remediation to do. But if we're not so lucky, we'll need to decide on a good way to clean up the leftover objects. The 7212 comment above describes a potential (expensive) algorithm. A later comment describes an offline-selection online-verification approach that could be a lot cheaper, but would require some new API surface to submit the claimed krefs.
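For illustration only, here is a rough sketch of what the counting tool from the first task might look like. It is not the tool referenced in the #7212 comment. It assumes the swingstore DB has a `kvStore` table with `key` and `value` columns, that each kernel object has a `koNN.refCount` entry of the form `reachable,recognizable`, and that an abandoned object is one with no `koNN.owner` entry. All of those names, and the `find_leaked_krefs` helper itself, are assumptions to check against the real schema and the plan in #7212 before relying on the output.

```python
import re
import sqlite3
import sys

# Assumed key shape for kernel object refcounts: "ko123.refCount"
REFCOUNT_KEY = re.compile(r"^(ko\d+)\.refCount$")


def find_leaked_krefs(conn):
    """Return krefs that look abandoned (no owner), unreachable
    (reachable count 0), and not yet retired (recognizable count > 0).

    The table name, key names, and value format are assumptions.
    """
    leaked = []
    rows = conn.execute(
        "SELECT key, value FROM kvStore WHERE key LIKE 'ko%.refCount'"
    ).fetchall()
    for key, value in rows:
        match = REFCOUNT_KEY.match(key)
        if match is None:
            continue  # some other key that happens to match the LIKE pattern
        kref = match.group(1)
        reachable, recognizable = (int(n) for n in value.split(","))
        owner = conn.execute(
            "SELECT value FROM kvStore WHERE key = ?", (f"{kref}.owner",)
        ).fetchone()
        if owner is None and reachable == 0 and recognizable > 0:
            leaked.append(kref)
    # Sort numerically so ko9 comes before ko10
    return sorted(leaked, key=lambda k: int(k[2:]))


if __name__ == "__main__":
    # Usage: python count_leaked.py path/to/swingstore.sqlite
    db = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    krefs = find_leaked_krefs(db)
    print(f"{len(krefs)} krefs in need of remediation")
    for kref in krefs:
        print(kref)
```

Opening the file read-only keeps the scan from modifying the DB copy. The per-kref owner lookup is simple rather than fast; a single pass over all `ko*` keys would probably be a better fit for a full mainnet DB.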
This ticket can be closed when our mainnet kernel no longer has any krefs in need of remediation. It will be low priority until/unless the count of such krefs grows large enough that we'd really like to reclaim that space.