Fix dispatch ordering of wait_until handlers #189

jeffschoner · 2022-08-15T02:50:56Z

@dwillett discovered a bug in the ordering of the evaluation of wait_until handlers that was introduced in #183. The short version is that sometimes workflows end up stuck because a wait_until condition evaluates to false before another handler, such as a signal handler, is executed that would change the condition to true. The wait_until handler should always be called after the handlers for a specific event have been called.

More details of the bug and its behavior can be found in dwillett#3.

Summary of change

wait_until handlers are now called after event-specific handlers, which undoes that part of #183. The behavior of signal handlers that do not specify a name, which also use dispatch target wildcards, has been preserved. Handlers for wait_until have been further separated from the other handlers to make this behavior difference clearer in code.
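To illustrate why the ordering matters, here is a minimal sketch. It is a toy model of a single dispatch pass, not the library's Dispatcher, and it assumes each wait_until handler checks its condition once per dispatched event. The names `run_handlers`, `signal_received`, and `unblocked` are hypothetical.

```ruby
# Toy model of one dispatch pass; not the temporal-ruby Dispatcher.
# Returns whether the waiting "workflow" was unblocked by this event.
def run_handlers(wait_until_first:)
  signal_received = false
  unblocked = false

  # Event-specific handler: a signal handler that flips the condition.
  signal_handler = -> { signal_received = true }
  # wait_until handler: checks the condition once for this event.
  wait_until_handler = -> { unblocked = true if signal_received }

  handlers = if wait_until_first
               [wait_until_handler, signal_handler]
             else
               [signal_handler, wait_until_handler]
             end

  handlers.each(&:call)
  unblocked
end

puts run_handlers(wait_until_first: true)   # prints false: condition checked too early
puts run_handlers(wait_until_first: false)  # prints true: condition checked after the signal
```

In the first call nothing re-checks the condition after the signal handler runs, which corresponds to the stuck workflow described above.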

@chuckremes2 Some specs referencing dispatch_name and accompanying comments on the Dispatcher have been removed while I was in here. As far as I can tell, this is dead code. It appears to be left over from some changes you made at @antstorm's request on #157.

Testing

The dispatcher unit specs have been updated:

bundle exec rspec ./spec/unit/lib/temporal/workflow/dispatcher_spec.rb

The StartWithSignalWorkflow sample has been modified to cover @dwillett's repro of the issue. The sleeps in this workflow have also been replaced with wait_until, which slightly improves execution performance. I've confirmed that this modified test fails without the ordering fix in this PR.

This workflow runs as part of this existing integration spec:

cd examples
bundle exec rspec spec/integration/signal_with_start_spec.rb

antstorm

Great find, thank you both @jeffschoner and @dwillett!

The solution makes sense, but I'm a bit worried about exposing the Dispatcher to some of these problems. I wonder if this is telling us that we got the boundaries of the Dispatcher a bit wrong… I would think of it as something that just dispatches events to handlers without caring about the types of the events or how it is used for workflow processing.

antstorm · 2022-08-18T18:02:30Z

lib/temporal/workflow/dispatcher.rb

+        RegistrationHandle.new(event_handlers[target], @next_id)
+      end
+
+      def register_wait_until_handler(&handler)


I think we should rename this method to something that isn't coupled to the wait_until name, since that might change and there might be other uses of this method in the future. Maybe something like register_unscoped_handler or register_wildcard_handler?

jeffschoner · 2022-08-20

There are two dimensions to each handler here: the order they run in, and which events (ID/target and type) cause them to run. The wildcard aspect applies to both old-style signal handlers and wait_until handlers: both match all event targets. What changes with this PR is when they run. Before, the order was determined entirely by registration order. Now, handlers reacting to a specific event must run first, and wait_until handlers run after them.

I ended up naming this after the wait_until handlers, because they need this very specific behavior of running after the handlers that target a specific event. I suppose I could introduce a more generic approach here where the caller specifies a "priority" for the handler. When two handlers have the same priority, we would still break ties by using the order they were registered in.

Personally, I find the event-specific and wait_until-specific naming clearer because the ordering behavior is specific to these use cases. Another option would be to lean into this and move the wait_until handlers out of the dispatcher entirely. The code would be similar to the dispatcher, but much simpler, since these handlers do not need to match specific event IDs or types. I think I'd prefer this over specifying a priority, because I suspect the code would be easier to follow. I'll add another commit that gives this a try, assuming that it looks good to you otherwise.
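For comparison, the priority idea mentioned above could look roughly like the following. This is a hypothetical sketch, not code from this PR: `PriorityRegistry`, its `register` and `dispatch` methods, and the convention that lower priority values run first are all assumptions made for illustration.

```ruby
# Hypothetical priority-based registry; not part of temporal-ruby.
class PriorityRegistry
  Entry = Struct.new(:priority, :id, :handler)

  def initialize
    @entries = []
    @next_id = 0
  end

  # Lower priority values run first (an assumption of this sketch).
  def register(priority:, &handler)
    @next_id += 1
    @entries << Entry.new(priority, @next_id, handler)
    @next_id
  end

  # Sort by priority, then by registration id to break ties.
  def dispatch(*args)
    @entries.sort_by { |e| [e.priority, e.id] }.each { |e| e.handler.call(*args) }
  end
end

calls = []
registry = PriorityRegistry.new
registry.register(priority: 1) { calls << :wait_until }
registry.register(priority: 0) { calls << :signal_a }
registry.register(priority: 0) { calls << :signal_b }
registry.dispatch
p calls # prints [:signal_a, :signal_b, :wait_until]
```

The wait_until handler runs last even though it was registered first, and the two equal-priority handlers keep their registration order.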

jeffschoner · 2022-09-05T21:04:15Z

@antstorm I tried a few different approaches to improving this based on your feedback. I think this one works the best. The dispatcher no longer needs to know about wait_until, but is still able to support running wait_until handlers after the others. It also eliminates the assumption that target-wildcard handlers must always be called at the end. These dimensions can be set independently when a handler is registered.

I used a default method parameter to minimize the number of places I'd need to update the code, but I'm happy to go back and change all the call sites if you'd prefer AT_BEGINNING and AT_END to be explicit.
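As a rough illustration of the shape described here (not the actual diff), the sketch below keeps target matching and run order as independent settings, with the order supplied through a default parameter. Only the names AT_BEGINNING and AT_END come from this thread. `SketchDispatcher`, `Registration`, the `Order` module, the method signatures, and the event names are assumptions.

```ruby
# Illustrative only; class and method names are assumptions, not the library's code.
class SketchDispatcher
  WILDCARD = '*'.freeze

  module Order
    AT_BEGINNING = 1
    AT_END = 2
  end

  Registration = Struct.new(:id, :target, :event_name, :order, :handler)

  def initialize
    @registrations = []
    @next_id = 0
  end

  # `order` defaults to AT_BEGINNING so existing call sites need no change.
  def register_handler(target, event_name, order = Order::AT_BEGINNING, &handler)
    @next_id += 1
    @registrations << Registration.new(@next_id, target, event_name, order, handler)
    @next_id
  end

  def dispatch(target, event_name, args = [])
    matching = @registrations.select do |r|
      (r.target == WILDCARD || r.target == target) &&
        (r.event_name == WILDCARD || r.event_name == event_name)
    end
    # AT_BEGINNING handlers first, then AT_END; registration order within each.
    matching.sort_by { |r| [r.order, r.id] }.each { |r| r.handler.call(*args) }
  end
end

log = []
d = SketchDispatcher.new
# A wait_until-style handler: matches everything, runs at the end.
d.register_handler(SketchDispatcher::WILDCARD, SketchDispatcher::WILDCARD,
                   SketchDispatcher::Order::AT_END) { log << :wait_until }
# A target-wildcard signal handler that still runs at the beginning.
d.register_handler(SketchDispatcher::WILDCARD, 'signaled') { log << :any_signal }
d.register_handler('signal-1', 'signaled') { log << :named_signal }
d.dispatch('signal-1', 'signaled')
p log # prints [:any_signal, :named_signal, :wait_until]
```

Here the dispatcher knows nothing about wait_until itself. The caller decides both what a handler matches and whether it runs at the beginning or the end.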

antstorm

@jeffschoner this works great, thank you, really appreciate the extra effort! 🙌

* Remove dead code from previous messy merge
* Separate wait_until handlers, execute at end
* Modify signal_with_start_workflow to cover dwillett's repro
* Decouple register_handler from wait_until

jeffschoner added 3 commits August 14, 2022 19:08

Remove dead code from previous messy merge

a3f086b

Separate wait_until handlers, execute at end

453621b

Modify signal_with_start_workflow to cover dwillett's repro

37c452b

antstorm reviewed Aug 18, 2022

Decouple register_handler from wait_until

bb8bb4e

jeffschoner requested a review from antstorm September 5, 2022 21:04

antstorm approved these changes Sep 6, 2022

DeRauk merged commit bb3f330 into coinbase:master Sep 6, 2022

jeffschoner deleted the dispatch-ordering branch October 5, 2022 16:18


jeffschoner Aug 20, 2022
