Ensure ev/gather fibers are fully canceled on error #1181
Merged
When an error occurs in an `ev/gather` form, it is possible that not all the sibling fibers are canceled. This can lead to process hangs, if an uncanceled fiber blocks.

The problem is that `each` is not well-defined for tables that are being mutated. So, sometimes the `(put fibers f nil)` in `cancel-all` causes iteration to skip entries. It's not deterministic since the hash of a fiber is based on its address. For integers, it is deterministic, and you can see the effect here:

A script like this can be used to reproduce the process hangs:
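As a minimal sketch of the integer-key case described above (written for illustration, not taken from the PR), the following removes each entry from a table while `each` is still iterating over it. The table `t`, the counter `visited`, and the choice of ten keys are assumptions. How many entries are actually visited depends on the table's internal layout and the Janet version, so the count may come out below 10.

```janet
# Illustrative only: mutate a table while iterating over it with `each`.
(def t @{})
(for i 0 10 (put t i i))   # keys and values are both 0..9

(var visited 0)
(each v t
  (++ visited)
  # Removing the entry we are currently on can make iteration
  # jump past neighbouring entries.
  (put t v nil))

(printf "visited %d of 10, %d left in table" visited (length t))
```

If `visited` is less than 10, the entries that were skipped are still in the table. This corresponds to sibling fibers that never receive `ev/cancel`.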
This PR fixes the cancellation problem by first iterating the collection of fibers to call `ev/cancel`, then clearing the table all at once after. It also changes the semantics of `ev/gather` to guarantee that all canceled fibers have completed before returning. I think this is desirable but could be convinced otherwise.
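The two-pass approach described above can be sketched roughly as follows. This is not the code from the PR: `cancel-all-sketch`, the driver below it, and the use of one `ev/take` per fiber on a supervisor channel to wait for completion are assumptions made for illustration.

```janet
# Hypothetical helper illustrating the two-pass approach; not the PR's code.
(defn cancel-all-sketch
  [chan fibers reason]
  # Pass 1: iterate without mutating the table, so no entry is skipped.
  (each f fibers (ev/cancel f reason))
  (def n (length fibers))
  # Pass 2: remove every entry at once, after iteration has finished.
  (table/clear fibers)
  # Assumed mechanism: take one supervisor event per canceled fiber so
  # that all of them have completed before this function returns.
  (repeat n (ev/take chan)))

# Small driver: five workers that would otherwise block for a minute.
(def chan (ev/chan))
(def fibers @{})
(repeat 5
  (def f (ev/go (fn [] (ev/sleep 60)) nil chan))
  (put fibers f f))
(ev/sleep 0) # give the workers a chance to start and block
(cancel-all-sketch chan fibers "sibling canceled")
(print "remaining fibers: " (length fibers))
```

Because the table is only read during the `each` and is cleared afterward, the iteration order cannot be disturbed by removals.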