-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: triage await safety #6219
fix: triage await safety #6219
Conversation
4f8a244
to
b38f158
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at a handful of files as noted.
// `ensureVatOnline` is an async function, so this turn boundary always | ||
// happens at the end of each iteration. However, the control-flow is | ||
// still unbalanced. If the loop iterates zero times, we proceed to | ||
// the next code synchronously. Else we proceed asynchronously. | ||
// However, the next loop is terminal, and each iteration of that loop | ||
// resemble each iteration of this one. Considered together, in all | ||
// cases the first iteration of one of these loops will happen first, | ||
// if there are any. And if there are more iterations from either loop, | ||
// a turn boundary separated each iteration from the next. | ||
// eslint-disable-next-line no-await-in-loop, @jessie.js/no-nested-await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with the justifications in this file, but yes let's do refer to the full explanations by reference rather than having them inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// flow branch, and is top level within that branch. | ||
// eslint-disable-next-line @jessie.js/no-nested-await | ||
const array = await handleCommand(message.subarray(1)); | ||
// This nested await is ALMOST safe because "terminal-control-flow". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paging @kriskowal . PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments so far
@@ -85,6 +85,8 @@ export default async function cosmosMain(progname, rawArgs, powers, opts) { | |||
} | |||
|
|||
if (popts.pull) { | |||
// This await is safe because "terminal-control-flow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree here. Since the line below the if
is a recursive call, if opts.dockerTag && ! popts.pull
then this will infinite loop within a turn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not something an attacker can supply, so this is low risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6235
@@ -197,9 +197,22 @@ const start = async zcf => { | |||
trace('exiting processTranches loop'); | |||
return; | |||
} | |||
// This await almost certainly does not do what was intended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang it! That seems correct. How would it get integrated so that a long-lived generator is pulled on and awaited once per block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6220
await oncePerBlock(); | ||
|
||
// Determine the max allowed tranche size | ||
// | ||
// This nested await is safe because "terminal-control-flow". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all part of the related design pattern of using for for await
and an async generator. We should have one comment to that effect and then suppress the overall nested await check in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we turn the current Jessie warning off for the whole file, then we won't notice if an await slips in that isn't safe for this reason. Until we have more static check distinctions, I think we should still suppress them individually, citing the safety principle for each one. But obviously, we can pull out all further commentary.
// It occurs at top level of the body of a top level terminal | ||
// while. But do note that the enclosing function is an asynchronous | ||
// generator function and that there is a `yield` within the loop. | ||
// I don't think that causes any additional worries, but I have not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it does not. This must be used within a for await
, which awaits at the top of the loop (and therefore at each yield
). So a yield
i effectively an await
.
@@ -466,7 +466,16 @@ const helperBehavior = { | |||
|
|||
// The rest of this method will not happen until after a quote is received. | |||
// This may not happen until much later, when the market changes. | |||
// eslint-disable-next-line no-await-in-loop | |||
// | |||
// This nested await is safe because "terminal-control-flow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should refer to the for/await/generator/while pattern
that is used in several places for vaults.
// | ||
// Some code does execute after this one, synchronously or asynchronously. | ||
// That would normally be unsafe, but all that code is clearly not stateful. | ||
// Its correctness is independent on timing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Its correctness is independent on timing. | |
// Its correctness is independent of timing. |
await oncePerBlock(); | ||
|
||
// Determine the max allowed tranche size | ||
// | ||
// This nested await is safe because "terminal-control-flow". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we turn the current Jessie warning off for the whole file, then we won't notice if an await slips in that isn't safe for this reason. Until we have more static check distinctions, I think we should still suppress them individually, citing the safety principle for each one. But obviously, we can pull out all further commentary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one element that I think should NOT be checked in, and a few alternate suggestions. I reviewed all changes, except for the "IIUC" examples in e.g., solo
directories.
// this statement and proceeding syncronously to the next | ||
// iteration. The await at the top of the loop ensures that the | ||
// remainder of the loop only happens after a turn boundary. | ||
// The only remaining unsafety is that the awaited expression of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other branches are terminal (return or throw) and so this unsafety does not actaully occur here.
// loop body safer. A `for await of` will correctly process a synchronous | ||
// iterator, but it will unconditionally separate the iterations with | ||
// with turn boundaries. | ||
for await (const step of steps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This potentially adds a a lot of awaits. (2 additional awaits per step). While they are cheap individually, steps could I think be quite large if e.g., you are trying to catch up on a restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loading vats on replay is a serialized process, and so I think the risk of the current code is low. I recommend NOT including this change without a lot more testing than we can perform at this time. And so it shoudl just wait until it gets refactored to not need this kind of layering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6242
// IIAFE, we ensure that the `unlink` can only be reached asynchronously. | ||
// | ||
// eslint-disable-next-line @jessie.js/no-nested-await | ||
await (async () => rename(tmpFilePath, target))(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler might be:
await null;
await rename(tmpFilePath, target);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics are more different. Might be ok in this case, but...
The original code always calls rename in the same turn that was already executing. For all I know that atomicity may be important. My change preserves that. Your's puts an interleaving point before the rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, there is an earlier await almost immediately above it. In this case, your extra await would only break up the atomicity with the resolve
which seems unlikely to cause a problem, but would still need looking into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6237
// `if` is not terminal because of the stateful `close()`s afterward. | ||
// On the else branch, those `close()`s would happen synchronously. | ||
// eslint-disable-next-line @jessie.js/no-nested-await | ||
await (async () => destination.sync())(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above. This seems too mysterious. I'd live with it if it were more efficient though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, in this case it happens to be safe to introduce a new interleaving point before the sync()
call, because the immediately preceding stateful code is another await. But it is not a technique that works in general, so we should be careful about making it a habit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6239
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The await safety triaging seemed to have enough review so I reviewed how it's communicated.
I will then collect an explanation of the surviving principles and put them someplace more reusable, like our wiki.
+1 to having an index of the principles to reference. Ideally this would be something you can pop into within the IDE. I think something like this would accomplish that: /** @see {SafeAwait['terminal-control-flow']} */
. Short of that, I think the URL should be stable and go directly to the explanation, which both rule out the Github wiki. E.g. shorten.er/await-safety#terminal-control-flow
For the "IIUC" suppressions, I think you did so I pushed package level disabling (agoric-cli, solo).
I also noticed the change to allValues
was just one of the few instances so I pushed a change to consolidate them.
@@ -25,8 +25,10 @@ const { keys, values, fromEntries } = Object; | |||
* @returns {Promise<Record<string, V>>} | |||
* @template V | |||
*/ | |||
const allValues = async obj => | |||
fromEntries(zip(keys(obj), await Promise.all(values(obj)))); | |||
const allValues = async obj => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have several of these. I pushed 667ad5c to DRY them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1,3 +1,8 @@ | |||
// IIUC the purpose of the agoric-cli package, we do not need to worry | |||
// about await safety for any of the code in this package. | |||
// TODO someone who understands this package better should verify this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirmed. let's disable for the whole package. 397c702
@@ -1,3 +1,8 @@ | |||
// IIUC the purpose of the solo package, we do not need to worry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disabled for package. 0163fee
.eslintrc.cjs
Outdated
// are code-reviewed. | ||
// The rule is “no nested awaits” but the architectural goal is “no | ||
// possibility of ‘awaits sometimes but not always’”. That is our | ||
// architectural rule for vats which require distributed determinism. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is distributed determinism relevant? I thought the issue was correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this part I forgot to mention. I was trying to distinguish why CLI and solo didn't need this linking.
My understanding is it's only relevant when there things can happen in different turns, ergo blockchain context.
What's a better description of when the lint rule needs to be enforced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the purposes of the cli
and solo
packages? What are the worst case consequences of an interleaving bug in the code of these packages?
Depending on the answers, we may want to leave these warnings on for these packages and carry through the triage for the problematic awaits is these packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different turns happen any time code doesn't complete sequentially; every await
or promise.then(...)
or I/O callback introduces another turn. No blockchain context needed.
My understanding is that the rule is relevant any time we want correct code; that is: to avoid a pattern that is known to cause bugs. I could be wrong. @erights ? @dtribble ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the purposes of the cli and solo packages? What are the worst case consequences of an interleaving bug in the code of these packages?
CLI is ergonomics for an end-user (or dev) to craft and send commands. Solo is is the solo Vat runner, which tmk is for testing. I can't think of any dire consequences to bugs in either.
Different turns happen any time code doesn't complete sequentially
Where can I read more about this? I've never encountered the term before Agoric and can't find it in web search.
If this PR needs to land for PS0, @erights feel free to remove any of my changes. I can follow up with them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though they don't seem to use the term "turn", the concept is well known in the JS world, yes? A "turn" is just one turn through the event loop.
For example, https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop:
At some point during the event loop, the runtime starts handling the messages on the queue, starting with the oldest one. To do so, the message is removed from the queue and its corresponding function is called with the message as an input parameter. As always, calling a function creates a new stack frame for that function's use.
The processing of functions continues until the stack is once again empty. Then, the event loop will process the next message in the queue (if there is one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. I do agree that the whole "Robust Composition" thesis is worth reading in due course. Though it's over 200 pages, it's not dense material. Much of it is stuff you almost certainly already know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solo is is the solo Vat runner, which tmk is for testing.
Solo powers ag-solo
; it's it's intended, in due course, for dapp back-ends that may hold significant assets. @dtribble recently gave an example of a gateway to an online game world, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like we should just drop my two eslint/comment changing commits. I figure that's easiest to do by interactive rebase and force push. To avoid clobbering any concurrent work I'll refrain unless you request it @erights
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do it when I get back to working on this PR. Thanks.
// In that case, the function exits would executing and further | ||
// code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can’t parse this sentence. Perhaps s/would/without/, s/any/any/?
// that initial await may therefore be reached synchronously or | ||
// asynchronously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression awaited at the top of the cthuloop, messagesFromXsnap.next(undefined)
, just manipulates an entangled pair of async promise queues, putting undefined on one and getting the next promise from the other. I’m not sure whether this affects your analysis.
// | ||
// Some code does execute after this one, synchronously or asynchronously. | ||
// That would normally be unsafe, but all that code is clearly not stateful. | ||
// Its correctness is independent on timing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Its correctness is independent on timing. | |
// Its correctness is independent of timing. |
writeContents
argument function
#6240
atomicWriteInRoot(baseName, async tmpGzPath => | ||
filter(tmpSnapPath, createGzip(), tmpGzPath, { flush: true }), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6240
// loop body safer. A `for await of` will correctly process a synchronous | ||
// iterator, but it will unconditionally separate the iterations with | ||
// with turn boundaries. | ||
for await (const step of steps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #6242
a1a5790
to
08b83f6
Compare
3dd42e2
to
43c98b8
Compare
43c98b8
to
2434353
Compare
95f67b4
to
cd75501
Compare
Reviewers, this PR is in good shape. The embargo has been lifted. AFAIK, nothing significant changed since the reviews above. But, this is the PR that I'm most nervous about needing to keep up to date to follow changes on master. I'd much rather merge it before there are substantial changes on master to reduce the coordination work. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the whole thing... when I saw separate issues filed for the various parts, I assumed this PR would get split up similarly. But I gather you're interested to merge this soonish.
Please let's do refer to the full explanations by reference rather than having them inline (e.g. vat-warehouse.js)
And we should have a test for each fix. Perhaps that's not cost-effective, but let's make that decision explicitly; no fair skipping the "testing considerations" part of the PR template :)
I'm not sure which reviewer is in a position to approve this whole thing. I am not. Perhaps a PR per package? hm... a dozen PRs is a lot of tedium. Maybe separate commits? Also tedious, and doesn't clarify who approved what very well.
Start of that https://gist.github.com/turadg/e7d7645789218173a578508ae7640365 (now at endojs/Jessie#93)
For splitting up:
|
cd75501
to
4ea95f8
Compare
1da4adb
to
5e32b38
Compare
5e32b38
to
e342a67
Compare
Closing in favor of the unbundling at #6219 (comment) |
Jessie warns of violations of https://github.com/Agoric/agoric-sdk/wiki/No-Nested-Await . Before this PR, a
yarn lint
of agoric-sdk generated 188 of these warnings. This PR triages those warnings. 10 remain warnings and should be fixed subsequent to this PR, preferably by someone who better understands the code these occur in.For each warning, I either
For each explanation I inserted, I tried to state the principle I found myself using to figure out that the code is safe. For each of these, I coined a term for this principle, such as "terminal-control-flow", and explain it well the first time I encounter it. Other times, I use the name of the principle, explaining only why it applies to that individual case.
Of my fixes, if they were about correctness, I have filed one of "Fixes" issues listed below that also raises the possibility of cherry picking that fix, so that it could go in ahead of the rest of this PR.
Reviewers, after this PR goes through its first sanity check by you, based on your feedback I will then collect an explanation of the surviving principles and put them someplace more reusable, like our wiki.
Everyone, please look at my changes to the code you're most familiar with to see if I broke anything. I tried to be very careful not to, but... . Also, please see if my explanations of why your code was safe, or with my mods is now safe, make sense to you. Even better if you could send me less convoluted replacements than mine for your current code, for example code in which we no longer need to suppress the Jessie await warning.
Does not yet fix #6220 , but I expect to include it before merging if it has not already been fixed.
Still needing analysis, but could be addressed in separate PRs:
#6226
#6227
#6228
#6229
#6230
#6231
#6232
#6233
#6234
#6235
This PR would fix, if not already fixed by cherry picking from this PR:
Fixes #6237
Fixes #6239
Fixes #6240
Fixes #6242
Fixes #6243
Fixes #6244