Skip to content

Conversation

jamieQ
Copy link
Contributor

@jamieQ jamieQ commented Sep 18, 2025

Terminology

  1. Sink: public action handler type vended out to client code. these are the primary channel through which events from the 'outside world' are sent into a Workflow runtime. is a value type that wraps a (weak) reference to internal event handling infrastructure.
  2. ReusableSink: internal action handler type that receives actions forwarded through from Sinks. it is a reference type that is weakly referenced by Sinks.
  3. EventPipe: internal action handler type that implements an event handling state machine. it is a reference type that is owned by either a ReusableSink to propagate 'outside' events, or by a SubtreeManager to propagate outputs from child workflows.
  4. SubtreeManager: type that coordinates interactions between a workflow node, its children, and the 'outside world'.

Background

the current event handling system in Workflow tracks event processing state locally – each node has a 'subtree manager' which is responsible for orchestrating interactions with both child workflow outputs and events from the 'outside world' sent to the corresponding node. each SubtreeManager coordinates a collection of 'event pipes' which handle event propagation; the 'outside world' has (indirect) access to them via 'sinks', and child workflow's communicate 'up' to their parents through EventPipe instances. these event pipes also encode the state machine transitions for event handling – if they're sent events in an invalid state they will generally halt the program.

the EventPipe state machine consists of the following states:

  1. preparing: the pipe has been created, but this state indicates a call to the corresponding node's render() method is still ongoing.
  2. pending: the corresponding node's render() method is complete, but the full 'render pass' over the entire tree may not yet be.
  3. enabled: the tree's render pass is finished, and the event pipe is valid to use. this state has a corresponding event handler closure that can be invoked when handling an event.
  4. invalid: the event pipe is no longer valid to use. a node's event pipes are invalidated as one of the first parts of a render pass.

the currently expected state transitions are:

  1. -> preparing
    • preparing is the initial state
  2. preparing -> pending
    • after a node finishes render()
  3. pending -> enabled
    • after the tree finishes a render pass
  4. enabled -> invalid
    • when a node is about to be re-rendered1

the way current event pipe state machine works is:

  1. during a render pass, every node gets new event pipes created, initially in the preparing state. any existing event pipes are set to the invalid state.
  2. after a node is rendered, the event pipes that were created for that pass are moved to the pending state.
  3. after a render pass is 'complete' and any Output and a new Rendering have been emitted, the nodes in the tree are walked and all event pipes are moved to the enabled state and hooked up with the right callbacks to invoke when they're sent events.

some additional notes about the current implementation:

  1. a number of invalid state transitions are banned outright and will cause a runtime trap if attempted.
  2. the ReusableSink type contains logic to attempt to detect certain forms of reentrancy. specifically it will check if the event pipe state is pending and if it is, will enqueue the forwarding of the event into the future.
  3. there is some limited reentrancy detection implemented when event pipes in the invalid state are reentrantly messaged.

Issue

the existing implementation seems to have generally worked reasonably well in practice, but there are cases when it falls down. in particular, two problematic cases that have been seen 'in the wild' are:

  1. reentrant action emissions from synchronous side effects. perhaps the 'canonical' example of this is when an action is being processed that, during processing, leads to a change in UIResponder state (.e.g resignFirstResponder() is called), and some other part of the Workflow tree responds to that change by emitting another action.

  2. APIs that do manual RunLoop spinning, leading to reentrant action handling. one instance of this we've seen is when the UI layer responds to a new rendering by deriving an attributed string from HTML via some the UIKit extension methods it can end up spinning the main thread's run loop waiting for WebKit to do HTML processing, and if there are other Workflow events that have been enqueued they will cause the runtime to attempt to reentrantly process an event (generally leading to a fatalError()).

as the existing implementation models the event handling state machine at the node level, it seems ill-equipped to deal with this problem holistically since the 'is the tree currently processing an event' bit of information is inherently a tree-level fact. we could try to augment the existing code with a state that represents 'the runtime is currently handling an event', but that seems somewhat awkward to fit into the existing model since a node would have to walk to the root and then update the whole tree with this new state2.

Proposed Solution

in this PR, the approach taken to solve this is:

  • introduce a new SinkEventHandler type to track tree-level event processing state
  • plumb a new callback closure through the tree down to the ReusableSink instances (which are responsible to forwarding 'external' events into the EventPipe machinery)
    • the new callback takes two parameters: a closure to be immediately invoked if there is no event currently being handled, and a closure to be enqueued and called in the future if there is.
  • the callback implementation checks the current state and either invokes the 'run now' closure (adjusting processing state as appropriate) or enqueues the 'run later' closure.

Alternatives

i also drafted a more minimal change to address the new test case that i added which simulates the 'spinning the run loop during a render pass' problem in #379. instead of changing the plumbing and adding tree-level state tracking, it just changes how the enqueing logic in the ReusableSink implementation works. previously the enqueuing logic would defer handling and unconditionally forward the event through after the async block was processed. after the change, the method becomes fully recursive, so will check whether the original action should be enqueued again. while this approach requires changing less code, it is also less of a 'real' fix, as it won't solve cases in which someone, say, emits a second sink event targeting an ancestor node in the tree while a first sink event is being handled.


Updates

after initial feedback, made the following changes:

  • made the new event handling behavior conditional and added a RuntimeConfig value to opt into it
  • added a queue precondition check into the new ReusableSink event handling logic
  • removed the initializing state from SinkEventHandler in favor of just 'busy' and 'ready'
  • added a mechanism to explicitly enter the busy state while invoking a block so that the WorkflowHost can update the root node and ensure no event handling can synchronously occur during that process
  • added several new test cases (and minor refactoring of existing test utilities)

Footnotes

  1. as an aside, it's reasonable to wonder if we may be allowing nodes to 'linger on' without invalidating their event handling infrastructure appropriately. what happens if a node is rendered in one pass and not rendered in the next? i think in most cases it just deallocates and so implicitly ends up ignoring any subsequent events, but... would probably be good to verify that and formalize what should be happening...

  2. except maybe not the path from node to root? see, it seems kind of awkward...

@jamieQ jamieQ marked this pull request as ready for review September 23, 2025 18:28
@jamieQ jamieQ requested a review from a team as a code owner September 23, 2025 18:28
Comment on lines 363 to 369
if case .pending = eventPipe.validationState {
// Workflow is currently processing an `event`.
// Scheduling it to be processed after.
DispatchQueue.workflowExecution.async { [weak self] in
self?.eventPipe.handle(event: output)
}
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to think through if we need this check anymore... the pending state should only have been possible to be in after a node was rendered during a render pass, but before the rendering & output had finished being emitted. i guess it's maybe conceivable that a child node could do something that would trigger some side effect that sent a sink event to a parent node in the pending state... 🤔. i think the 'happy path' case is that it would just be enqueued b/c we'd be re-rendering due to handling an event (and so would hit the 'enqueue' case in the onSinkEvent callback), but there is maybe an edge case to consider where the workflow host updates the root node independently (no event being handled) so maybe we should be a bit more cautious here...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've convinced me that it still makes sense to handle this possible edge case, what's the downside of leaving this? Does the dispatch approach not work with the new paradigm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, and think leaving something like this (at least for now1) makes sense. i think we do still need to change the logic that gets enqueued – the way the existing code works fails in the 'someone spun the run loop before you finished a render pass' case, because those manual run loop turns can run that enqueued block, but it doesn't recurse into the ReusableSink method itself and re-check the validation state; it just unconditionally forwards through to the event pipe and we hope for the best (which seems to often crash in that edge case).

there's also an API design question here in my mind – who is responsible for making the check? i'm inclined to also move that logic out to the new SinkEventHandler type so the decision making is basically all in one place, but we'll need to pass the node-local state through to do that. it's a little awkward b/c the validation state enum is generic over various things, but we could just pass a isPending flag in the callback i suppose.

Footnotes

  1. it's conceivable to me that much of the node-local state could probably be moved out to the 'tree level', but haven't thought of a compelling reason to work though that at the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made a couple changes to address this:

  1. added a new method withEventHandlingSuspended (better names welcome) that can be used by the workflow host to explicitly ensure no synchronous event handlers will be run when updating the root node of the tree
  2. restored the original handling logic & made the new event handling code paths conditional on a runtime configuration, so they will be opt-in and we can enable it via a feature flag

Copy link
Collaborator

@mjohnson12 mjohnson12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great description and refresher on the types involved.
I did remember how event handling worked but this was great overview.
I like the new SinkEventHandler and OnSinkEvent.
This looks good to me.

Copy link

@kcsiegal kcsiegal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally new to Workflow's event handling but this seems very reasonable to me, appreciate the detailed context in the PR description and comments.

Comment on lines 363 to 369
if case .pending = eventPipe.validationState {
// Workflow is currently processing an `event`.
// Scheduling it to be processed after.
DispatchQueue.workflowExecution.async { [weak self] in
self?.eventPipe.handle(event: output)
}
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've convinced me that it still makes sense to handle this possible edge case, what's the downside of leaving this? Does the dispatch approach not work with the new paradigm?


/// Handles events from 'Sinks' such that runtime-level event handling state is appropriately
/// managed, and attempts to perform reentrant action handling can be detected and dealt with.
final class SinkEventHandler {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this type's API and behavior very intuitive, I like it!


XCTAssertEqual(observedRenderCount, 1)

drainMainQueueBySpinningRunLoop()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful!

Copy link
Contributor

@nabs-m nabs-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much experience with Workflow, so some of these concepts are new to me, but thank you for the clear explanation in the description!

@jamieQ jamieQ changed the title [DNM][fix]: add host-level logic for reentrant event handling [fix]: add host-level logic for reentrant event handling Sep 25, 2025
@jamieQ jamieQ requested a review from Copilot September 25, 2025 14:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements host-level logic for reentrant event handling in the Workflow runtime by introducing a new SinkEventHandler type to track tree-level event processing state and properly handle reentrant action emissions.

  • Introduces SinkEventHandler to manage tree-level event processing state with ready and busy states
  • Adds runtime configuration flag useSinkEventHandler to conditionally enable the new behavior
  • Plumbs event handling callbacks through the runtime to ReusableSink instances for proper state management

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
WorkflowHost.swift Adds SinkEventHandler implementation and integrates it with the host initialization and update flow
SubtreeManager.swift Updates sink creation to accept and use the new sink event callback for proper event handling
RuntimeConfiguration.swift Adds configuration flag to enable the new sink event handler behavior
SinkEventHandlerTests.swift Comprehensive test coverage for the new SinkEventHandler functionality
WorkflowHostTests.swift Tests for reentrant event scenarios and integration with the new event handler
TestUtilities.swift Moves shared test utilities and adds utility functions for async testing
WorkflowObserverTests.swift Removes duplicate test observer implementation in favor of shared utility

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants