[fix]: add host-level logic for reentrant event handling #381

jamieQ · 2025-09-18T21:47:15Z

Terminology

Sink: public action handler type vended out to client code. these are the primary channel through which events from the 'outside world' are sent into a Workflow runtime. it is a value type that wraps a (weak) reference to internal event handling infrastructure.
ReusableSink: internal action handler type that receives actions forwarded from Sinks. it is a reference type that is weakly referenced by Sinks.
EventPipe: internal action handler type that implements an event handling state machine. it is a reference type that is owned by either a ReusableSink to propagate 'outside' events, or by a SubtreeManager to propagate outputs from child workflows.
SubtreeManager: type that coordinates interactions between a workflow node, its children, and the 'outside world'.
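to make the Sink/ReusableSink relationship concrete, here's a minimal sketch. `SinkSketch` and `ReusableSinkSketch` are hypothetical stand-ins, not the runtime's actual types. the point is just that the public value type holds its backing reference weakly; in this sketch, a sink that outlives that backing object drops events instead of keeping the infrastructure alive.

```swift
/// Hypothetical stand-in for the internal reference type that receives forwarded actions.
final class ReusableSinkSketch<Action> {
    private(set) var received: [Action] = []

    func handle(_ action: Action) {
        received.append(action)
    }
}

/// Hypothetical stand-in for the public `Sink` value type.
struct SinkSketch<Action> {
    // Weak, so a sink retained by client code does not keep runtime infrastructure alive.
    private weak var reusableSink: ReusableSinkSketch<Action>?

    init(_ reusableSink: ReusableSinkSketch<Action>) {
        self.reusableSink = reusableSink
    }

    /// Forwards the action if the backing object still exists; otherwise drops it.
    /// Returns whether the action was forwarded.
    @discardableResult
    func send(_ action: Action) -> Bool {
        guard let target = reusableSink else { return false }
        target.handle(action)
        return true
    }
}

// Usage: the sink forwards while its backing object is alive and is a no-op afterwards.
var backing: ReusableSinkSketch<String>? = ReusableSinkSketch<String>()
let sink = SinkSketch(backing!)
let deliveredWhileAlive = sink.send("tap")
backing = nil  // runtime infrastructure torn down
let deliveredAfterTeardown = sink.send("tap again")
```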

Background

the current event handling system in Workflow tracks event processing state locally – each node has a 'subtree manager' which is responsible for orchestrating interactions with both child workflow outputs and events from the 'outside world' sent to the corresponding node. each SubtreeManager coordinates a collection of 'event pipes' which handle event propagation; the 'outside world' has (indirect) access to them via 'sinks', and child workflows communicate 'up' to their parents through EventPipe instances. these event pipes also encode the state machine transitions for event handling – if they're sent events in an invalid state they will generally halt the program.

the EventPipe state machine consists of the following states:

preparing: the pipe has been created, but this state indicates a call to the corresponding node's render() method is still ongoing.
pending: the corresponding node's render() method is complete, but the full 'render pass' over the entire tree may not yet be.
enabled: the tree's render pass is finished, and the event pipe is valid to use. this state has a corresponding event handler closure that can be invoked when handling an event.
invalid: the event pipe is no longer valid to use. a node's event pipes are invalidated as one of the first parts of a render pass.

the currently expected state transitions are:

-> preparing
- preparing is the initial state
preparing -> pending
- after a node finishes render()
pending -> enabled
- after the tree finishes a render pass
enabled -> invalid
- when a node is about to be re-rendered¹
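the transitions above can be modeled as a small state machine. this is a simplified sketch: `EventPipeSketch` and `PipeState` are hypothetical names, events are plain `String`s, and every transition not in the list traps (the real implementation only bans a subset of invalid transitions outright).

```swift
/// Hypothetical model of the EventPipe validation states. Only `enabled`
/// carries a handler, mirroring the description above.
enum PipeState {
    case preparing
    case pending
    case enabled(handler: (String) -> Void)
    case invalid
}

final class EventPipeSketch {
    // preparing is the initial state.
    private(set) var state: PipeState = .preparing

    /// preparing -> pending: the node finished render().
    func markPending() {
        guard case .preparing = state else { fatalError("invalid transition to pending") }
        state = .pending
    }

    /// pending -> enabled: the tree finished its render pass.
    func enable(handler: @escaping (String) -> Void) {
        guard case .pending = state else { fatalError("invalid transition to enabled") }
        state = .enabled(handler: handler)
    }

    /// enabled -> invalid: the node is about to be re-rendered.
    func invalidate() {
        guard case .enabled = state else { fatalError("invalid transition to invalid") }
        state = .invalid
    }

    /// Events are only deliverable while enabled; anything else halts the program.
    func handle(event: String) {
        guard case .enabled(let handler) = state else {
            fatalError("event sent to a pipe that is not enabled")
        }
        handler(event)
    }
}
```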

the way the current event pipe state machine works is:

during a render pass, every node gets new event pipes created, initially in the preparing state. any existing event pipes are set to the invalid state.
after a node is rendered, the event pipes that were created for that pass are moved to the pending state.
after a render pass is 'complete' and any Output and a new Rendering have been emitted, the nodes in the tree are walked and all event pipes are moved to the enabled state and hooked up with the right callbacks to invoke when they're sent events.
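the ordering of a render pass can be sketched as a tree walk. `NodeSketch`, `Pipe`, and `performRenderPass(root:emit:)` are hypothetical, and each node gets exactly one fresh pipe per pass for simplicity. the interesting window is the `emit` step: every pipe in the tree is still `pending` there, which is where reentrant sends cause trouble.

```swift
enum Stage: Equatable { case preparing, pending, enabled, invalid }

final class Pipe {
    var stage: Stage = .preparing
}

final class NodeSketch {
    var pipes: [Pipe] = []
    var children: [NodeSketch] = []

    /// Invalidates this node's old pipes, creates a fresh one (preparing while
    /// rendering, including while children render), then marks it pending.
    func render() {
        pipes.forEach { $0.stage = .invalid }
        let fresh = Pipe()
        children.forEach { $0.render() }
        fresh.stage = .pending  // this node's render() is done
        pipes = [fresh]
    }

    /// Enables every pipe in the subtree once the whole pass is complete.
    func enableAll() {
        pipes.forEach { $0.stage = .enabled }
        children.forEach { $0.enableAll() }
    }
}

/// One full render pass: render the tree, emit output/rendering, then enable pipes.
func performRenderPass(root: NodeSketch, emit: () -> Void) {
    root.render()
    emit()  // all pipes are still pending here
    root.enableAll()
}
```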

some additional notes about the current implementation:

a number of invalid state transitions are banned outright and will cause a runtime trap if attempted.
the ReusableSink type contains logic to attempt to detect certain forms of reentrancy. specifically it will check if the event pipe state is pending and if it is, will enqueue the forwarding of the event into the future.
there is some limited reentrancy detection implemented when event pipes in the invalid state are reentrantly messaged.

Issue

the existing implementation seems to have generally worked reasonably well in practice, but there are cases when it falls down. in particular, two problematic cases that have been seen 'in the wild' are:

reentrant action emissions from synchronous side effects. perhaps the 'canonical' example of this is when an action is being processed that, during processing, leads to a change in UIResponder state (e.g. resignFirstResponder() is called), and some other part of the Workflow tree responds to that change by emitting another action.
APIs that do manual RunLoop spinning, leading to reentrant action handling. one instance of this we've seen is when the UI layer responds to a new rendering by deriving an attributed string from HTML via some of the UIKit extension methods. these can end up spinning the main thread's run loop while waiting for WebKit to do HTML processing, and if other Workflow events have been enqueued, they will cause the runtime to attempt to reentrantly process an event (generally leading to a fatalError()).

as the existing implementation models the event handling state machine at the node level, it seems ill-equipped to deal with this problem holistically since the 'is the tree currently processing an event' bit of information is inherently a tree-level fact. we could try to augment the existing code with a state that represents 'the runtime is currently handling an event', but that seems somewhat awkward to fit into the existing model since a node would have to walk to the root and then update the whole tree with this new state².

Proposed Solution

in this PR, the approach taken to solve this is:

introduce a new SinkEventHandler type to track tree-level event processing state
plumb a new callback closure through the tree down to the ReusableSink instances (which are responsible for forwarding 'external' events into the EventPipe machinery)
- the new callback takes two parameters: a closure to be immediately invoked if there is no event currently being handled, and a closure to be enqueued and called in the future if there is.
the callback implementation checks the current state and either invokes the 'run now' closure (adjusting processing state as appropriate) or enqueues the 'run later' closure.
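a minimal sketch of the proposed callback, assuming the two states ('ready' and 'busy') mentioned under Updates. the names echo the PR (`SinkEventHandler`, `onSinkEvent`), but `SinkEventHandlerSketch` and its signature are guesses, and a plain array stands in for whatever scheduling the 'run later' closure actually does.

```swift
/// Hypothetical tree-level event handler with 'ready' and 'busy' states.
final class SinkEventHandlerSketch {
    enum State { case ready, busy }

    private(set) var state: State = .ready

    /// If no event is in flight, runs `perform` immediately with the tree marked busy.
    /// Otherwise calls `enqueue`, which is responsible for scheduling the event for later.
    func onSinkEvent(perform: () -> Void, enqueue: () -> Void) {
        switch state {
        case .ready:
            state = .busy
            defer { state = .ready }
            perform()
        case .busy:
            enqueue()
        }
    }
}

// Usage: handling the first event synchronously triggers a second one (think of a
// resignFirstResponder() side effect). The second event is deferred, not run reentrantly.
let handler = SinkEventHandlerSketch()
var log: [String] = []
var deferred: [() -> Void] = []

handler.onSinkEvent(
    perform: {
        log.append("first")
        handler.onSinkEvent(
            perform: { log.append("second (reentrant)") },
            enqueue: { deferred.append { log.append("second (deferred)") } }
        )
    },
    enqueue: { deferred.append { log.append("first (deferred)") } }
)

// Later, once the first event has finished, the deferred work runs.
deferred.forEach { $0() }
```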

Alternatives

i also drafted a more minimal change to address the new test case that i added which simulates the 'spinning the run loop during a render pass' problem in #379. instead of changing the plumbing and adding tree-level state tracking, it just changes how the enqueuing logic in the ReusableSink implementation works. previously the enqueuing logic would defer handling and unconditionally forward the event through once the async block ran. after the change, the method becomes fully recursive, so it will check whether the original action should be enqueued again. while this approach requires changing less code, it is also less of a 'real' fix, as it won't solve cases in which someone, say, emits a second sink event targeting an ancestor node in the tree while a first sink event is being handled.
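the difference between the two deferral strategies can be illustrated with a small model. `ReusableSinkModel` is hypothetical: a `Bool` stands in for the pending validation state, an array of closures stands in for `DispatchQueue.workflowExecution.async`, and `runDeferredOnce()` plays the role of a manual run loop turn. the only change between the two methods is what the deferred block calls.

```swift
/// Hypothetical model contrasting the original and recursive deferral strategies.
final class ReusableSinkModel {
    var isPending = false            // stands in for `.pending` validation state
    var deferred: [() -> Void] = []  // stands in for async dispatch
    private(set) var delivered: [String] = []
    private(set) var deliveredWhilePending = 0

    private func forwardToPipe(_ event: String) {
        if isPending { deliveredWhilePending += 1 }  // the real pipe would trap here
        delivered.append(event)
    }

    /// Original behavior: defer once, then forward unconditionally.
    func handleOriginal(_ event: String) {
        if isPending {
            deferred.append { [weak self] in self?.forwardToPipe(event) }
            return
        }
        forwardToPipe(event)
    }

    /// Alternative: the deferred block re-enters this method, re-checking the state.
    func handleRecursive(_ event: String) {
        if isPending {
            deferred.append { [weak self] in self?.handleRecursive(event) }
            return
        }
        forwardToPipe(event)
    }

    /// Runs whatever is currently deferred, like a manual run loop turn would.
    func runDeferredOnce() {
        let work = deferred
        deferred.removeAll()
        work.forEach { $0() }
    }
}

// The run loop is spun while the render pass is still in progress.
let original = ReusableSinkModel()
original.isPending = true
original.handleOriginal("a")
original.runDeferredOnce()   // forwarded while still pending

let recursive = ReusableSinkModel()
recursive.isPending = true
recursive.handleRecursive("a")
recursive.runDeferredOnce()  // re-checks, still pending, re-defers
recursive.isPending = false  // render pass completes
recursive.runDeferredOnce()  // now delivered safely
```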

Updates

after initial feedback, made the following changes:

made the new event handling behavior conditional and added a RuntimeConfig value to opt into it
added a queue precondition check into the new ReusableSink event handling logic
removed the initializing state from SinkEventHandler in favor of just 'busy' and 'ready'
added a mechanism to explicitly enter the busy state while invoking a block so that the WorkflowHost can update the root node and ensure no event handling can synchronously occur during that process
added several new test cases (and minor refactoring of existing test utilities)
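the 'explicitly enter the busy state while invoking a block' mechanism might look roughly like this. `withEventHandlingSuspended` is the name used later in the thread; its generic, `rethrows` signature here is an assumption, as is the `SuspendableEventHandler` type.

```swift
/// Hypothetical handler that can be marked busy for the duration of a block.
final class SuspendableEventHandler {
    enum State { case ready, busy }

    private(set) var state: State = .ready

    /// Runs `perform` now if ready; otherwise defers via `enqueue`.
    func onSinkEvent(perform: () -> Void, enqueue: () -> Void) {
        guard state == .ready else {
            enqueue()
            return
        }
        state = .busy
        defer { state = .ready }
        perform()
    }

    /// Marks the tree busy while `body` runs (e.g. a host updating the root node),
    /// so no sink event is handled synchronously, then restores the previous state.
    func withEventHandlingSuspended<T>(_ body: () throws -> T) rethrows -> T {
        let previous = state
        state = .busy
        defer { state = previous }
        return try body()
    }
}

let handler = SuspendableEventHandler()
var handledImmediately = 0
var enqueuedCount = 0

// A root update during which a side effect sends an event: it gets enqueued.
handler.withEventHandlingSuspended {
    handler.onSinkEvent(perform: { handledImmediately += 1 },
                        enqueue: { enqueuedCount += 1 })
}

// After the update, events are handled synchronously again.
handler.onSinkEvent(perform: { handledImmediately += 1 },
                    enqueue: { enqueuedCount += 1 })
```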

Footnotes

¹ as an aside, it's reasonable to wonder if we may be allowing nodes to 'linger on' without invalidating their event handling infrastructure appropriately. what happens if a node is rendered in one pass and not rendered in the next? i think in most cases it just deallocates and so implicitly ends up ignoring any subsequent events, but... would probably be good to verify that and formalize what should be happening...
² except maybe not the path from node to root? see, it seems kind of awkward...

jamieQ · 2025-09-23T18:49:25Z

Workflow/Sources/SubtreeManager.swift

-            if case .pending = eventPipe.validationState {
-                // Workflow is currently processing an `event`.
-                // Scheduling it to be processed after.
-                DispatchQueue.workflowExecution.async { [weak self] in
-                    self?.eventPipe.handle(event: output)
-                }
-                return


trying to think through if we need this check anymore... the pending state should only have been possible to be in after a node was rendered during a render pass, but before the rendering & output had finished being emitted. i guess it's maybe conceivable that a child node could do something that would trigger some side effect that sent a sink event to a parent node in the pending state... 🤔. i think the 'happy path' case is that it would just be enqueued b/c we'd be re-rendering due to handling an event (and so would hit the 'enqueue' case in the onSinkEvent callback), but there is maybe an edge case to consider where the workflow host updates the root node independently (no event being handled) so maybe we should be a bit more cautious here...

I think you've convinced me that it still makes sense to handle this possible edge case, what's the downside of leaving this? Does the dispatch approach not work with the new paradigm?

i agree, and think leaving something like this (at least for now¹) makes sense. i think we do still need to change the logic that gets enqueued – the way the existing code works fails in the 'someone spun the run loop before you finished a render pass' case, because those manual run loop turns can run that enqueued block, but it doesn't recurse into the ReusableSink method itself and re-check the validation state; it just unconditionally forwards through to the event pipe and we hope for the best (which seems to often crash in that edge case).

there's also an API design question here in my mind – who is responsible for making the check? i'm inclined to also move that logic out to the new SinkEventHandler type so the decision making is basically all in one place, but we'll need to pass the node-local state through to do that. it's a little awkward b/c the validation state enum is generic over various things, but we could just pass an isPending flag in the callback i suppose.
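one way the `isPending` idea could look, sketched with a hypothetical `CentralizedEventHandler`: the ReusableSink passes its node-local bit along with the two closures, and the tree-level handler makes the whole defer-or-run decision. the parameter shape is an assumption, not what the PR ended up shipping.

```swift
/// Hypothetical tree-level handler that also receives node-local 'pending' state.
final class CentralizedEventHandler {
    private var busy = false

    func onSinkEvent(isPending: Bool, perform: () -> Void, enqueue: () -> Void) {
        // Defer if the tree is handling an event or the target node is mid render pass.
        if busy || isPending {
            enqueue()
            return
        }
        busy = true
        defer { busy = false }
        perform()
    }
}

let handler = CentralizedEventHandler()
var outcome: [String] = []

// Target node's pipe is pending: defer even though no event is in flight.
handler.onSinkEvent(isPending: true,
                    perform: { outcome.append("run") },
                    enqueue: { outcome.append("enqueue") })

// Nothing in flight and the pipe is not pending: run immediately.
handler.onSinkEvent(isPending: false,
                    perform: { outcome.append("run") },
                    enqueue: { outcome.append("enqueue") })
```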

Footnotes

¹ it's conceivable to me that much of the node-local state could probably be moved out to the 'tree level', but i haven't thought of a compelling reason to work through that at the moment

i made a couple changes to address this:

added a new method withEventHandlingSuspended (better names welcome) that can be used by the workflow host to explicitly ensure no synchronous event handlers will be run when updating the root node of the tree

restored the original handling logic & made the new event handling code paths conditional on a runtime configuration, so they will be opt-in and we can enable it via a feature flag
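the opt-in gating could be as simple as a boolean on the runtime configuration. the flag name `useSinkEventHandler` comes from the Copilot summary on this PR; `RuntimeConfigSketch`, `DeliveryPath`, and `deliveryPath(for:)` are hypothetical illustrations of where the branch would happen.

```swift
/// Hypothetical configuration shape; only the flag name comes from the PR.
struct RuntimeConfigSketch {
    // Opt-in: the default keeps the original pending-check behavior.
    var useSinkEventHandler = false
}

enum DeliveryPath: Equatable {
    case legacyPendingCheck
    case sinkEventHandler
}

/// Chooses which code path a ReusableSink-like forwarder would take.
func deliveryPath(for config: RuntimeConfigSketch) -> DeliveryPath {
    config.useSinkEventHandler ? .sinkEventHandler : .legacyPendingCheck
}

var config = RuntimeConfigSketch()
let defaultPath = deliveryPath(for: config)
config.useSinkEventHandler = true  // e.g. flipped by a feature flag
let optedInPath = deliveryPath(for: config)
```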

mjohnson12

Great description and refresher on the types involved.
I did remember how event handling worked but this was great overview.
I like the new SinkEventHandler and OnSinkEvent.
This looks good to me.

kcsiegal

I am totally new to Workflow's event handling but this seems very reasonable to me, appreciate the detailed context in the PR description and comments.

kcsiegal · 2025-09-23T19:50:46Z

Workflow/Sources/SubtreeManager.swift


I think you've convinced me that it still makes sense to handle this possible edge case, what's the downside of leaving this? Does the dispatch approach not work with the new paradigm?

kcsiegal · 2025-09-23T20:01:01Z

Workflow/Sources/WorkflowHost.swift

+
+/// Handles events from 'Sinks' such that runtime-level event handling state is appropriately
+/// managed, and attempts to perform reentrant action handling can be detected and dealt with.
+final class SinkEventHandler {


I find this type's API and behavior very intuitive, I like it!


kcsiegal · 2025-09-23T20:04:05Z

Workflow/Tests/WorkflowHostTests.swift


        XCTAssertEqual(observedRenderCount, 1)
+
+        drainMainQueueBySpinningRunLoop()


nabs-m

I don't have much experience with Workflow, so some of these concepts are new to me, but thank you for the clear explanation in the description!


Copilot

Pull Request Overview

This PR implements host-level logic for reentrant event handling in the Workflow runtime by introducing a new SinkEventHandler type to track tree-level event processing state and properly handle reentrant action emissions.

Introduces SinkEventHandler to manage tree-level event processing state with ready and busy states
Adds runtime configuration flag useSinkEventHandler to conditionally enable the new behavior
Plumbs event handling callbacks through the runtime to ReusableSink instances for proper state management

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


File	Description
WorkflowHost.swift	Adds `SinkEventHandler` implementation and integrates it with the host initialization and update flow
SubtreeManager.swift	Updates sink creation to accept and use the new sink event callback for proper event handling
RuntimeConfiguration.swift	Adds configuration flag to enable the new sink event handler behavior
SinkEventHandlerTests.swift	Comprehensive test coverage for the new `SinkEventHandler` functionality
WorkflowHostTests.swift	Tests for reentrant event scenarios and integration with the new event handler
TestUtilities.swift	Moves shared test utilities and adds utility functions for async testing
WorkflowObserverTests.swift	Removes duplicate test observer implementation in favor of shared utility



[fix]: add host-level logic for reentrant event handling

208db31

jamieQ mentioned this pull request Sep 18, 2025

[DNM][fix]: refactor deferred action handling to sidestep reentrancy issue #379

Draft

jamieQ marked this pull request as ready for review September 23, 2025 18:28

jamieQ requested a review from a team as a code owner September 23, 2025 18:28

jamieQ commented Sep 23, 2025


mjohnson12 approved these changes Sep 23, 2025


kcsiegal approved these changes Sep 23, 2025


nabs-m approved these changes Sep 23, 2025


jamieQ added 2 commits September 24, 2025 17:27

[feedback]: renames, refactors, tests

5248408

[refactor]: put new event handling behind a runtime config

1c58ab6

jamieQ changed the title ~~[DNM][fix]: add host-level logic for reentrant event handling~~ [fix]: add host-level logic for reentrant event handling Sep 25, 2025

jamieQ requested a review from Copilot September 25, 2025 14:59

Copilot AI reviewed Sep 25, 2025

