Skip to content

Conversation

jamieQ
Copy link
Contributor

@jamieQ jamieQ commented Sep 16, 2025

this is a draft of small change we could make to automatically enqueue event handling code that is triggered while a node is already handling an event. i don't think this will handle all cases, but it is a pretty minor change that does seem to resolve at least one that appears to sometimes surface in client code. a more holistic approach is explored in #381

Copy link
Collaborator

@mjohnson12 mjohnson12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree this looks pretty safe to add

jamieQ added a commit that referenced this pull request Oct 6, 2025
#### Terminology

1. `Sink`: public action handler type vended out to client code. these
are the primary channel through which events from the 'outside world'
are sent into a Workflow runtime. is a value type that wraps a (weak)
reference to internal event handling infrastructure.
1. `ReusableSink`: internal action handler type that receives actions
forwarded through from `Sink`s. it is a reference type that is weakly
referenced by `Sink`s.
1. `EventPipe`: internal action handler type that implements an event
handling state machine. it is a reference type that is owned by either a
`ReusableSink` to propagate 'outside' events, or by a `SubtreeManager`
to propagate outputs from child workflows.
1. `SubtreeManager`: type that coordinates interactions between a
workflow node, its children, and the 'outside world'.

#### Background

the current event handling system in Workflow tracks event processing
state locally – each node has a 'subtree manager' which is responsible
for orchestrating interactions with both child workflow outputs and
events from the 'outside world' sent to the corresponding node. each
`SubtreeManager` coordinates a collection of 'event pipes' which handle
event propagation; the 'outside world' has (indirect) access to them via
'sinks', and child workflow's communicate 'up' to their parents through
`EventPipe` instances. these event pipes also encode the state machine
transitions for event handling – if they're sent events in an invalid
state they will generally halt the program.

the `EventPipe` state machine consists of the following states:

1. `preparing`: the pipe has been created, but this state indicates a
call to the corresponding node's `render()` method is still ongoing.
1. `pending`: the corresponding node's `render()` method is complete,
but the full 'render pass' over the entire tree may not yet be.
1. `enabled`: the tree's render pass is finished, and the event pipe is
valid to use. this state has a corresponding event handler closure that
can be invoked when handling an event.
1. `invalid`: the event pipe is no longer valid to use. a node's event
pipes are invalidated as one of the first parts of a render pass.

the currently expected state transitions are:

1. <none> -> `preparing`
    - `preparing` is the initial state
1. `preparing` -> `pending`
    - after a node finishes `render()`
1. `pending` -> `enabled`
    - after the tree finishes a render pass
1. `enabled` -> `invalid`
    - when a node is about to be re-rendered[^1]

[^1]: as an aside, it's reasonable to wonder if we may be allowing nodes
to 'linger on' without invalidating their event handling infrastructure
appropriately. what happens if a node is rendered in one pass and not
rendered in the next? i think in most cases it just deallocates and so
implicitly ends up ignoring any subsequent events, but... would probably
be good to verify that and formalize what should be happening...

the way current event pipe state machine works is:

1. during a render pass, every node gets new event pipes created,
initially in the `preparing` state. any existing event pipes are set to
the `invalid` state.
1. after a node is rendered, the event pipes that were created for that
pass are moved to the `pending` state.
1. after a render pass is 'complete' and any Output and a new Rendering
have been emitted, the nodes in the tree are walked and all event pipes
are moved to the `enabled` state and hooked up with the right callbacks
to invoke when they're sent events.

some additional notes about the current implementation:

1. a number of invalid state transitions are banned outright and will
cause a runtime trap if attempted.
1. the `ReusableSink` type contains logic to attempt to detect certain
forms of reentrancy. specifically it will check if the event pipe state
is `pending` and if it is, will enqueue the forwarding of the event into
the future.
1. there is some limited reentrancy detection implemented when event
pipes in the `invalid` state are reentrantly messaged.

#### Issue

the existing implementation seems to have generally worked reasonably
well in practice, but there are cases when it falls down. in particular,
two problematic cases that have been seen 'in the wild' are:

1. reentrant action emissions from synchronous side effects. perhaps the
'canonical' example of this is when an action is being processed that,
during processing, leads to a change in `UIResponder` state (.e.g
`resignFirstResponder()` is called), and some _other_ part of the
Workflow tree responds to that change by emitting another action.

1. APIs that do manual `RunLoop` spinning, leading to reentrant action
handling. one instance of this we've seen is when the UI layer responds
to a new rendering by deriving an attributed string from HTML via some
the `UIKit` extension methods it can end up spinning the main thread's
run loop waiting for `WebKit` to do HTML processing, and if there are
other Workflow events that have been enqueued they will cause the
runtime to attempt to reentrantly process an event (generally leading to
a `fatalError()`).

as the existing implementation models the event handling state machine
at the node level, it seems ill-equipped to deal with this problem
holistically since the 'is the tree currently processing an event' bit
of information is inherently a tree-level fact. we could try to augment
the existing code with a state that represents 'the runtime is currently
handling an event', but that seems somewhat awkward to fit into the
existing model since a node would have to walk to the root and then
update the whole tree with this new state[^2].

[^2]: except maybe _not_ the path from node to root? see, it seems kind
of awkward...

#### Proposed Solution

in this PR, the approach taken to solve this is:

- introduce a new `SinkEventHandler` type to track tree-level event
processing state
- plumb a new callback closure through the tree down to the
`ReusableSink` instances (which are responsible to forwarding 'external'
events into the `EventPipe` machinery)
- the new callback takes two parameters: a closure to be immediately
invoked if there is no event currently being handled, and a closure to
be enqueued and called in the future if there is.
- the callback implementation checks the current state and either
invokes the 'run now' closure (adjusting processing state as
appropriate) or enqueues the 'run later' closure.

#### Alternatives

i also drafted a more minimal change to address the new test case that i
added which simulates the 'spinning the run loop during a render pass'
problem in #379. instead of
changing the plumbing and adding tree-level state tracking, it just
changes how the enqueing logic in the `ReusableSink` implementation
works. previously the enqueuing logic would defer handling and
unconditionally forward the event through after the async block was
processed. after the change, the method becomes fully recursive, so will
check whether the original action should be enqueued again. while this
approach requires changing less code, it is also less of a 'real' fix,
as it won't solve cases in which someone, say, emits a second sink event
targeting an ancestor node in the tree while a first sink event is being
handled.

---

#### Updates

after initial feedback, made the following changes:

- made the new event handling behavior conditional and added a
`RuntimeConfig` value to opt into it
- added a queue precondition check into the new `ReusableSink` event
handling logic
- removed the `initializing` state from `SinkEventHandler` in favor of
just 'busy' and 'ready'
- added a mechanism to explicitly enter the `busy` state while invoking
a block so that the `WorkflowHost` can update the root node and ensure
no event handling can synchronously occur during that process
- added several new test cases (and minor refactoring of existing test
utilities)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants