-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(swingset): allow slow deletion of terminated vats #9227
Conversation
7841696
to
55452ec
Compare
Deploying agoric-sdk with Cloudflare Pages
|
5d23745
to
b3beede
Compare
b3beede
to
701a1a2
Compare
62aa511
to
0fe9f39
Compare
701a1a2
to
c3299e5
Compare
0fe9f39
to
967e458
Compare
Note: in addition to having the kernel spread c-list deletion processing over time (to spread out the GC consequences in other vats), I had to change the swing-store to let the kernel spread transcript/snapshot deletion over time (to limit the size of the DB txn). The swingstore work is in the first commit of this PR, the kernel side is in the second. The swingstore needs to maintain the invariant that exports and imports still work. I arranged it so that transcript spans are deleted starting at the highest startPos ( The snapshots are still deleted oldest-first ( The resulting data-deletion and export-size profiles, starting from the block where the vat is terminated, will look like:
|
c3299e5
to
a31549a
Compare
967e458
to
402811a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preliminary review of the first commit introducing budgeted deletion.
I think we should have a first commit changing the semantics of termination to set inUse
/isCurrent
to null
for the active snapshot/span, and assert in the deletion function that there is no active snapshot/span before proceeding. Then a second commit can introduce an optional budgeted deletion, which I believe it should do in a consistent order (either old to new, or opposite, but not mix and match).
It would also avoid unnecessarily exporting snapshot/transcript span artifacts while their slow deletion is in progress (since the kv entries are processed first).
// Unlike transcripts, here we delete the oldest snapshots first, | ||
// to simplify the logic: we delete the only inUse=1 snapshot | ||
// last, and then immediately delete the .current record, at which | ||
// point we're done. This has a side-effect of keeping the unused | ||
// snapshot in the export artifacts longer, but it doesn't seem | ||
// worth fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find a little weird to do things in a different order. It also causes the metadata entries between transcript and snapshot to be inconsistent between each other.
* | ||
* @param {string} vatID | ||
* @param {number} budget | ||
* @returns {{ done: boolean, cleanups: number }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of interface really feels like a generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, but I didn't find a way to take advantage of that fact.
An actual function*
generator wouldn't work, of course, because the process can be killed and a new process started while the deletion is going on, and a real generator would lose state when the application is rebooted.
And, I think changing the function signature to match that of a normal function*
generator is only an improvement if the caller gets to use for..of syntax, but as long as the snapStore function is doing internal iteration (deleting more than one thing per call), the vatKeeper.js deleteSnapshotsAndTranscripts()
caller is only going to call it once per block (per terminated vat), so there's no good place for a for..of
loop. (the real loop is higher up, with one iteration per block).
To get one, we'd need to change snapStore's deleteSomeVatSnapshots
into maybeDeleteOneVatSnapshot
, to delete at most one per call, and then have vatKeeper's deleteSnapshotsAndTranscripts()
use a for..of
loop. We'd still need to return whether a cleanup was done or not, and have the caller accumulate them, so deleteSnapshotsAndTranscripts
knows when to switch from snapshots to transcripts. maybeDeleteOneVatSnapshot
would always make one DB query (with a LIMIT 1) to get which snapshot to delete, if any. Then it either returns, or does a second DB query to delete the one row, and a third to noteExport
the deletion, making the cost 3 small DB queries until all the snapshots are gone, then 1 small DB query each block until all the transcripts are gone (since we always check for remaining snapshots on the way to checking for transcript spans).
That's compared to the current cost (with a budget of 5) of one moderate-sized query every time (using LIMIT 5, returning anywhere from 0 to 5 rows), followed by 0 to 5 noteExports
, maybe followed by a single DELETE
query removing 1 to 5 rows at once.
And we'd need snapStore to expose maybeDeleteOneVatSnapshot
separately from deleteVatSnapshots
(unlimited), so the latter could to queries without LIMIT
constraints, and delete everything in one shot.
In general, it de-amortizes the DB queries, because to make use of the iterator, we have to move responsibility for doing more than one deletion (per block) up into vatKeeper, which then can't give a hint to swingstore about how many deletions are coming up, so it could query them in a batch.
// if you didn't set a budget, you won't be counting deletions | ||
return { done: true, cleanups: 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could fairly easily return deletions.length
from deleteAllVatSnapshots
to be consistent
// if we reach here, the last sqlDeleteOneVatSnapshot() in that | ||
// loop had deleted the inUse=1 snapshot and the corresponding | ||
// snapshotMetadataKey, so now it is time to delete the .current | ||
// record and inform the kernel that we're done | ||
noteExport(currentSnapshotMetadataKey({ vatID }), undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm wondering if:
- vat termination should not set the
inUse
snapshot tonull
and remove the.current
snapshot marker - assert when we call
deleteAllVatSnapshots
that there are noinUse
snapshots for the vat
// isCurrent=1 span first, which causes export to ignore the | ||
// entire vat (good, since it's deleted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify? That does not sound good. An export / import in the middle of a slow prune must reconstitute the partially deleted swing-store so that the slow deletion can continue in consensus on that restored node.
I think we need to be careful differentiating items and spans metadata here.
My understanding is that the isCurrent
only impacts the artifacts yielded during export, and the completeness checks of items during import. Yielding no artifacts and skipping checks is indeed consistent and the right behavior, and since the metadata is always restored, the pruning behavior will be the same on restore.
That said, I am uneasy to rely on the deletion operation to impact the completeness checks. Imagine we switched things around and started deleting from the oldest span. The operational check would fail. I believe that vat termination should explicitly "close" the span (set isCurrent = null
), and only allow deletion of transcripts for which there is no current span. In the future this could be modified to slowly delete transcripts of old incarnations by just adding a constraint on incarnation number on the queries.
// no budget? no accounting. | ||
return { done: true, cleanups: 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we could return deletions.length
from deleteAllVatTranscripts
Hm, in general I like it, now I'm trying to walk through how that would work. Say we terminate a vat in block 1, then start deleting parts of it in block 2, and continue on through block 100. We delete the kvStore entries first (say blocks 2-40), then the snapshots (say blocks 41-50), then the transcript spans/items (say blocks 51-100). I think you're aiming to have swingstore exports immediately stop including artifacts for the terminated vat as of block 1. No transcript span artifacts, no snapshot artifacts. The exports at that point continue to have export-data for everything. We start losing export-data for kvStore entries during 2-40, but an export at block 40 still has all the snapshot export-data (hashes), plus transcript span records. Then in 41-50 we start seeing fewer and fewer snapshot export-data records, and in 51-100 we start losing transcript span records, until by block 100 we see no export-data records for anything related to the now-fully-deleted vat. We can't afford to delete all the export-data records during block 1, since they're all shadowed into IAVL, which we're protecting/rate-limiting just as much as SQLite. But we want
Ok, so clearing the For the snapStore, So.. I think it would just work? We delete the inUse/isCurrent record when the vat is terminated, and we immediately stop observing that vat's heap/transcript-span artifact names or artifacts in the export. The importer would import them if they were present, but it won't complain if they are not. Then, slowly, we delete the actual DB items, budget-limited, until they're all gone, at which point vatKeeper learns that there was nothing left to delete, and it deletes the record that says the vat was still being deleted. We still delete a |
Now let's see, should we delete the inUse/isCurrent entry, or should we set the flag to NULL and then delete the entry along with all the rest? I think it simplifies the I bet it would work to clear the flag too, but the IAVL |
That was my conclusion as well.
I was thinking of setting it to NULL.
Yes, I think that's the only change needed (and on import making sure we don't choke if there is no |
I was able to make this work by clearing the |
402811a
to
3200c52
Compare
a31549a
to
13ea1dc
Compare
fa87abe
to
4ad9689
Compare
1ba5547
to
afe1c29
Compare
4ad9689
to
0140249
Compare
My most recent rebase+edit includes a fix for the way we call the runPolicy: previously the |
afe1c29
to
49ba122
Compare
0140249
to
71ede77
Compare
49ba122
to
77c46d1
Compare
71ede77
to
06ccc1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation comments; code review to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the usual code style suggestions. And thanks again for the testing.
// false, or an object with optional .budget | ||
if (allowCleanup) { | ||
assert.typeof(allowCleanup, 'object'); | ||
if (allowCleanup.budget) { | ||
assert.typeof(allowCleanup.budget, 'number'); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like another opportunity for @endo/patterns
.
// false, or an object with optional .budget | |
if (allowCleanup) { | |
assert.typeof(allowCleanup, 'object'); | |
if (allowCleanup.budget) { | |
assert.typeof(allowCleanup.budget, 'number'); | |
} | |
} | |
mustMatch(harden(allowCleanup), allowCleanupShape); |
with
const allowCleanupShape = M.or(
// Prohibit cleanup.
false,
// Allow cleanup, optionally with a limiting budget.
M.splitRecord({}, { budget: M.number() }, M.record()),
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applied, but do we have any benchmarks on how expensive the more general pattern-matching approach is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an ad hoc measurement:
$ node esbench.mjs -h V8 -b 5 \
-M @endo/init \
-i 'import { M, mustMatch } from "@endo/patterns"; Object.assign(globalThis, { M, mustMatch });' \
-s '
const allowCleanupShape = M.or(
false,
M.splitRecord({}, { budget: M.number() }, M.record()),
);
const checkManual = allowCleanup => {
if (allowCleanup) {
assert.typeof(allowCleanup, "object");
if (allowCleanup.budget) assert.typeof(allowCleanup.budget, "number");
}
};
const makeObj = () => harden({ budget: 42 });
' \
'checkManual(false)' 'checkManual(makeObj())' \
'mustMatch(false, allowCleanupShape)' 'mustMatch(makeObj(), allowCleanupShape)'
#### V8
checkManual(false) (0) 19161.615353858455 ops/ms after 195 491520-count samples
checkManual(makeObj()) (0) 610.7172827172827 ops/ms after 199 15360-count samples
mustMatch(false, allowCleanupShape) (0) 426.1556886227545 ops/ms after 278 7680-count samples
mustMatch(makeObj(), allowCleanupShape) (0) 17.63226192852865 ops/ms after 368 240-count samples
It's about 25x slower in this case, but I think not of significance anywhere we're writing to a database. And I'll be improving it over the next few months anyway—I can already triple the speed for false
input and double it for object input with some tweaks to endo:
mustMatch(false, allowCleanupShape) (0) 1227.981345769487 ops/ms after 120 30720-count samples
mustMatch(makeObj(), allowCleanupShape) (0) 38.56 ops/ms after 482 240-count samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking! So, converting into actually-comparable units, and focusing on the most-common "false" case, and ignoring the effects of the extra harden()
call:
- manual: 52ns / op
- allowCleanupShape: 2.3us / op
I know I can't legitimately complain unless I also measure the DB speed and show that it's not the limiting factor, and I'm not prepared to do that now, but I really worry about how these (admittedly tidy) helpers mask significant runtime.
I don't know where your instincts suggest that DB interaction is slow, but just in case it makes a difference, note that we only commit once per block, so most of the DB writes made by the kernel will just get parked in RAM for a while. And I expect that SQLite will be smart about serving a lot of reads from a cache.
I've been trying to think of a better
Host-apps that are ok with complete+immediate cleanup can either omit Host-apps that want to allow a little bit of cleanup in each Host-apps (like ours) which want to only perform cleanup during empty blocks should do their initial runs with Some concerns:
|
c862b45
to
227dafa
Compare
5c8afcc
to
47fe6e9
Compare
227dafa
to
486dfbb
Compare
fc2718e
to
3648e55
Compare
486dfbb
to
f6787e8
Compare
3648e55
to
1dcb1d2
Compare
f6787e8
to
c43bf63
Compare
This introduces new `runPolicy()` controls which enable "slow termination" of vats. When configured, terminated vats are immediately dead (all promises are rejected, all new messages go splat, they never run again), however the vat's state is deleted slowly, one piece at a time. This makes it safe to terminate large vats, with a long history, lots of c-list imports/exports, or large vatstore tables, without fear of causing an overload (by e.g. dropping 100k references all in a single crank). See docs/run-policy.md for details and configuration instructions. The kernelKeeper is upgraded from v1 to v2, to add a new 'vats.terminated' key, which tracks the vats that have been terminated but not yet completely deleted. NOTE: deployed applications must use `upgradeSwingset()` when using this kernel version for the first time. Also refactor vatKeeper.deleteSnapshotsAndTranscripts() into two separate methods, to fix a bug that hid in the combination: if the snapshot deletion phase exhausted our budget, we'd call deleteVatTranscripts() with a budget of 0, which was interpreted as "unlimited", and deleted all the transcript spans in a single burst. refs #8928 Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
1dcb1d2
to
9ac2ef0
Compare
This introduces new
runPolicy()
controls which enable "slowtermination" of vats. When configured, terminated vats are immediately
dead (all promises are rejected, all new messages go splat, they never
run again), however the vat's state is deleted slowly, one piece at a
time. This makes it safe to terminate large vats, with a long history,
lots of c-list imports/exports, or large vatstore tables, without fear
of causing an overload (by e.g. dropping 100k references all in a
single crank).
See docs/run-policy.md for details and configuration instructions.
Also changes swing-store to enable budget-limited deletion of vat
transcripts and snapshots.
refs #8928