[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

jamieQ · 2025-09-16T22:13:41Z

this is a draft of small change we could make to automatically enqueue event handling code that is triggered while a node is already handling an event. i don't think this will handle all cases, but it is a pretty minor change that does seem to resolve at least one that appears to sometimes surface in client code. a more holistic approach is explored in #381

mjohnson12

Yeah I agree this looks pretty safe to add

#### Terminology 1. `Sink`: public action handler type vended out to client code. these are the primary channel through which events from the 'outside world' are sent into a Workflow runtime. is a value type that wraps a (weak) reference to internal event handling infrastructure. 1. `ReusableSink`: internal action handler type that receives actions forwarded through from `Sink`s. it is a reference type that is weakly referenced by `Sink`s. 1. `EventPipe`: internal action handler type that implements an event handling state machine. it is a reference type that is owned by either a `ReusableSink` to propagate 'outside' events, or by a `SubtreeManager` to propagate outputs from child workflows. 1. `SubtreeManager`: type that coordinates interactions between a workflow node, its children, and the 'outside world'. #### Background the current event handling system in Workflow tracks event processing state locally – each node has a 'subtree manager' which is responsible for orchestrating interactions with both child workflow outputs and events from the 'outside world' sent to the corresponding node. each `SubtreeManager` coordinates a collection of 'event pipes' which handle event propagation; the 'outside world' has (indirect) access to them via 'sinks', and child workflow's communicate 'up' to their parents through `EventPipe` instances. these event pipes also encode the state machine transitions for event handling – if they're sent events in an invalid state they will generally halt the program. the `EventPipe` state machine consists of the following states: 1. `preparing`: the pipe has been created, but this state indicates a call to the corresponding node's `render()` method is still ongoing. 1. `pending`: the corresponding node's `render()` method is complete, but the full 'render pass' over the entire tree may not yet be. 1. `enabled`: the tree's render pass is finished, and the event pipe is valid to use. this state has a corresponding event handler closure that can be invoked when handling an event. 1. `invalid`: the event pipe is no longer valid to use. a node's event pipes are invalidated as one of the first parts of a render pass. the currently expected state transitions are: 1. <none> -> `preparing` - `preparing` is the initial state 1. `preparing` -> `pending` - after a node finishes `render()` 1. `pending` -> `enabled` - after the tree finishes a render pass 1. `enabled` -> `invalid` - when a node is about to be re-rendered[^1] [^1]: as an aside, it's reasonable to wonder if we may be allowing nodes to 'linger on' without invalidating their event handling infrastructure appropriately. what happens if a node is rendered in one pass and not rendered in the next? i think in most cases it just deallocates and so implicitly ends up ignoring any subsequent events, but... would probably be good to verify that and formalize what should be happening... the way current event pipe state machine works is: 1. during a render pass, every node gets new event pipes created, initially in the `preparing` state. any existing event pipes are set to the `invalid` state. 1. after a node is rendered, the event pipes that were created for that pass are moved to the `pending` state. 1. after a render pass is 'complete' and any Output and a new Rendering have been emitted, the nodes in the tree are walked and all event pipes are moved to the `enabled` state and hooked up with the right callbacks to invoke when they're sent events. some additional notes about the current implementation: 1. a number of invalid state transitions are banned outright and will cause a runtime trap if attempted. 1. the `ReusableSink` type contains logic to attempt to detect certain forms of reentrancy. specifically it will check if the event pipe state is `pending` and if it is, will enqueue the forwarding of the event into the future. 1. there is some limited reentrancy detection implemented when event pipes in the `invalid` state are reentrantly messaged. #### Issue the existing implementation seems to have generally worked reasonably well in practice, but there are cases when it falls down. in particular, two problematic cases that have been seen 'in the wild' are: 1. reentrant action emissions from synchronous side effects. perhaps the 'canonical' example of this is when an action is being processed that, during processing, leads to a change in `UIResponder` state (.e.g `resignFirstResponder()` is called), and some _other_ part of the Workflow tree responds to that change by emitting another action. 1. APIs that do manual `RunLoop` spinning, leading to reentrant action handling. one instance of this we've seen is when the UI layer responds to a new rendering by deriving an attributed string from HTML via some the `UIKit` extension methods it can end up spinning the main thread's run loop waiting for `WebKit` to do HTML processing, and if there are other Workflow events that have been enqueued they will cause the runtime to attempt to reentrantly process an event (generally leading to a `fatalError()`). as the existing implementation models the event handling state machine at the node level, it seems ill-equipped to deal with this problem holistically since the 'is the tree currently processing an event' bit of information is inherently a tree-level fact. we could try to augment the existing code with a state that represents 'the runtime is currently handling an event', but that seems somewhat awkward to fit into the existing model since a node would have to walk to the root and then update the whole tree with this new state[^2]. [^2]: except maybe _not_ the path from node to root? see, it seems kind of awkward... #### Proposed Solution in this PR, the approach taken to solve this is: - introduce a new `SinkEventHandler` type to track tree-level event processing state - plumb a new callback closure through the tree down to the `ReusableSink` instances (which are responsible to forwarding 'external' events into the `EventPipe` machinery) - the new callback takes two parameters: a closure to be immediately invoked if there is no event currently being handled, and a closure to be enqueued and called in the future if there is. - the callback implementation checks the current state and either invokes the 'run now' closure (adjusting processing state as appropriate) or enqueues the 'run later' closure. #### Alternatives i also drafted a more minimal change to address the new test case that i added which simulates the 'spinning the run loop during a render pass' problem in #379. instead of changing the plumbing and adding tree-level state tracking, it just changes how the enqueing logic in the `ReusableSink` implementation works. previously the enqueuing logic would defer handling and unconditionally forward the event through after the async block was processed. after the change, the method becomes fully recursive, so will check whether the original action should be enqueued again. while this approach requires changing less code, it is also less of a 'real' fix, as it won't solve cases in which someone, say, emits a second sink event targeting an ancestor node in the tree while a first sink event is being handled. --- #### Updates after initial feedback, made the following changes: - made the new event handling behavior conditional and added a `RuntimeConfig` value to opt into it - added a queue precondition check into the new `ReusableSink` event handling logic - removed the `initializing` state from `SinkEventHandler` in favor of just 'busy' and 'ready' - added a mechanism to explicitly enter the `busy` state while invoking a block so that the `WorkflowHost` can update the root node and ensure no event handling can synchronously occur during that process - added several new test cases (and minor refactoring of existing test utilities)

[fix]: refactor deferred action handling to sidestep reentrancy issue

8066d25

jamieQ mentioned this pull request Sep 18, 2025

[fix]: add host-level logic for reentrant event handling #381

Merged

mjohnson12 approved these changes Sep 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

jamieQ commented Sep 16, 2025 •

edited

Loading

Uh oh!

mjohnson12 left a comment

Uh oh!

Uh oh!

[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

Are you sure you want to change the base?

[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

Conversation

jamieQ commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mjohnson12 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jamieQ commented Sep 16, 2025 •

edited

Loading