-
Notifications
You must be signed in to change notification settings - Fork 47
[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
jamieQ
wants to merge
1
commit into
main
Choose a base branch
from
jquadri/reentrancy-minimal
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+138
−21
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
mjohnson12
approved these changes
Sep 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree this looks pretty safe to add
jamieQ
added a commit
that referenced
this pull request
Oct 6, 2025
#### Terminology 1. `Sink`: public action handler type vended out to client code. these are the primary channel through which events from the 'outside world' are sent into a Workflow runtime. is a value type that wraps a (weak) reference to internal event handling infrastructure. 1. `ReusableSink`: internal action handler type that receives actions forwarded through from `Sink`s. it is a reference type that is weakly referenced by `Sink`s. 1. `EventPipe`: internal action handler type that implements an event handling state machine. it is a reference type that is owned by either a `ReusableSink` to propagate 'outside' events, or by a `SubtreeManager` to propagate outputs from child workflows. 1. `SubtreeManager`: type that coordinates interactions between a workflow node, its children, and the 'outside world'. #### Background the current event handling system in Workflow tracks event processing state locally – each node has a 'subtree manager' which is responsible for orchestrating interactions with both child workflow outputs and events from the 'outside world' sent to the corresponding node. each `SubtreeManager` coordinates a collection of 'event pipes' which handle event propagation; the 'outside world' has (indirect) access to them via 'sinks', and child workflow's communicate 'up' to their parents through `EventPipe` instances. these event pipes also encode the state machine transitions for event handling – if they're sent events in an invalid state they will generally halt the program. the `EventPipe` state machine consists of the following states: 1. `preparing`: the pipe has been created, but this state indicates a call to the corresponding node's `render()` method is still ongoing. 1. `pending`: the corresponding node's `render()` method is complete, but the full 'render pass' over the entire tree may not yet be. 1. `enabled`: the tree's render pass is finished, and the event pipe is valid to use. this state has a corresponding event handler closure that can be invoked when handling an event. 1. `invalid`: the event pipe is no longer valid to use. a node's event pipes are invalidated as one of the first parts of a render pass. the currently expected state transitions are: 1. <none> -> `preparing` - `preparing` is the initial state 1. `preparing` -> `pending` - after a node finishes `render()` 1. `pending` -> `enabled` - after the tree finishes a render pass 1. `enabled` -> `invalid` - when a node is about to be re-rendered[^1] [^1]: as an aside, it's reasonable to wonder if we may be allowing nodes to 'linger on' without invalidating their event handling infrastructure appropriately. what happens if a node is rendered in one pass and not rendered in the next? i think in most cases it just deallocates and so implicitly ends up ignoring any subsequent events, but... would probably be good to verify that and formalize what should be happening... the way current event pipe state machine works is: 1. during a render pass, every node gets new event pipes created, initially in the `preparing` state. any existing event pipes are set to the `invalid` state. 1. after a node is rendered, the event pipes that were created for that pass are moved to the `pending` state. 1. after a render pass is 'complete' and any Output and a new Rendering have been emitted, the nodes in the tree are walked and all event pipes are moved to the `enabled` state and hooked up with the right callbacks to invoke when they're sent events. some additional notes about the current implementation: 1. a number of invalid state transitions are banned outright and will cause a runtime trap if attempted. 1. the `ReusableSink` type contains logic to attempt to detect certain forms of reentrancy. specifically it will check if the event pipe state is `pending` and if it is, will enqueue the forwarding of the event into the future. 1. there is some limited reentrancy detection implemented when event pipes in the `invalid` state are reentrantly messaged. #### Issue the existing implementation seems to have generally worked reasonably well in practice, but there are cases when it falls down. in particular, two problematic cases that have been seen 'in the wild' are: 1. reentrant action emissions from synchronous side effects. perhaps the 'canonical' example of this is when an action is being processed that, during processing, leads to a change in `UIResponder` state (.e.g `resignFirstResponder()` is called), and some _other_ part of the Workflow tree responds to that change by emitting another action. 1. APIs that do manual `RunLoop` spinning, leading to reentrant action handling. one instance of this we've seen is when the UI layer responds to a new rendering by deriving an attributed string from HTML via some the `UIKit` extension methods it can end up spinning the main thread's run loop waiting for `WebKit` to do HTML processing, and if there are other Workflow events that have been enqueued they will cause the runtime to attempt to reentrantly process an event (generally leading to a `fatalError()`). as the existing implementation models the event handling state machine at the node level, it seems ill-equipped to deal with this problem holistically since the 'is the tree currently processing an event' bit of information is inherently a tree-level fact. we could try to augment the existing code with a state that represents 'the runtime is currently handling an event', but that seems somewhat awkward to fit into the existing model since a node would have to walk to the root and then update the whole tree with this new state[^2]. [^2]: except maybe _not_ the path from node to root? see, it seems kind of awkward... #### Proposed Solution in this PR, the approach taken to solve this is: - introduce a new `SinkEventHandler` type to track tree-level event processing state - plumb a new callback closure through the tree down to the `ReusableSink` instances (which are responsible to forwarding 'external' events into the `EventPipe` machinery) - the new callback takes two parameters: a closure to be immediately invoked if there is no event currently being handled, and a closure to be enqueued and called in the future if there is. - the callback implementation checks the current state and either invokes the 'run now' closure (adjusting processing state as appropriate) or enqueues the 'run later' closure. #### Alternatives i also drafted a more minimal change to address the new test case that i added which simulates the 'spinning the run loop during a render pass' problem in #379. instead of changing the plumbing and adding tree-level state tracking, it just changes how the enqueing logic in the `ReusableSink` implementation works. previously the enqueuing logic would defer handling and unconditionally forward the event through after the async block was processed. after the change, the method becomes fully recursive, so will check whether the original action should be enqueued again. while this approach requires changing less code, it is also less of a 'real' fix, as it won't solve cases in which someone, say, emits a second sink event targeting an ancestor node in the tree while a first sink event is being handled. --- #### Updates after initial feedback, made the following changes: - made the new event handling behavior conditional and added a `RuntimeConfig` value to opt into it - added a queue precondition check into the new `ReusableSink` event handling logic - removed the `initializing` state from `SinkEventHandler` in favor of just 'busy' and 'ready' - added a mechanism to explicitly enter the `busy` state while invoking a block so that the `WorkflowHost` can update the root node and ensure no event handling can synchronously occur during that process - added several new test cases (and minor refactoring of existing test utilities)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
this is a draft of small change we could make to automatically enqueue event handling code that is triggered while a node is already handling an event. i don't think this will handle all cases, but it is a pretty minor change that does seem to resolve at least one that appears to sometimes surface in client code. a more holistic approach is explored in #381