-
Notifications
You must be signed in to change notification settings - Fork 107
Fix dispatch ordering of wait_until handlers #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
antstorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find, thank you both @jeffschoner and @dwillett!
The solution makes sense, however I'm a bit worried about exposing the Dispatcher to some of these problems. I wonder if this is telling us that we got the boundaries of the Dispatcher a bit wrong… I would think of it as something that just dispatches the events to handlers without caring about the types of the events or how it is used for the workflow processing
lib/temporal/workflow/dispatcher.rb
Outdated
| RegistrationHandle.new(event_handlers[target], @next_id) | ||
| end | ||
|
|
||
| def register_wait_until_handler(&handler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rename this method into something that isn't coupled to the wait_until name since that might change and also there might be other uses of this method in the future. Maybe something like a register_unscoped_handler or register_wildcard_handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two dimensions to each handler here: the order they run in and which events (ID/target and type) cause them to run. The wildcard aspect applies to both old-style signal handlers and wait_until handlers: they both match to all event targets. When they run is changing with this PR though. Before, the order was prescribed entirely by the order registered. Handlers reacting to a specific event must run first, then wait_until handlers need to run next.
I ended up naming this after the wait_until handlers, because they need this very specific behavior of running after the handlers that target a specific event. I suppose I could introduce a more generic approach here where the caller specifies a "priority" for the handler. When two handlers have the same priority, we would still break ties by using the order they were registered in.
Personally, I find the event-specific and wait_until-specific naming clearer because the ordering behavior is specific to these use cases. Another approach would be to lean into this approach, and move the wait_until handlers out of the dispatcher entirely. The code would be similar to the dispatcher, but much simpler since these do not need to match specific event IDs or types. I think I'd prefer this approach over specifying a priority, because I suspect the code would be easier to follow. I'll add another commit that gives this a try, assuming that it looks good to me otherwise.
|
@antstorm I tried a few different approaches to improving this based on your feedback. I think this one works the best. The dispatcher no longer needs to know about I used a default method parameter to minimize the number of places I'd need to update the code, but happy to go back and change all the call sites if you'd prefer the |
antstorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeffschoner this works great, thank you, really appreciate the extra effort! 🙌
* Remove dead code from previous messy merge * Separate wait_until handlers, execute at end * Modify signal_with_start_workflow to cover dwillett's repro * Decouple register_handler from wait_until
@dwillett discovered a bug in the ordering of the evaluation of
wait_untilhandlers that was introduced in #183. The short version is that sometimes workflows end up stuck because await_untilcondition evaluates to false before another handler, such as a signal handler, is executed that would change the condition to true. Thewait_untilhandler should always be called after the handlers for a specific event have been called.More details of the bug and its behavior can be found in dwillett#3.
Summary of change
wait_untilhandlers are now called after event-specific handlers, undoing this part of the earlier PR. The behavior of signal handlers that do not specify a name, which also use dispatch target wildcards, has been preserved. Handlers forwait_untils have further separated from the other handlers to make this behavior difference clearer in code.@chuckremes2 Some specs referencing
dispatch_nameand accompanying comments on theDispatcherhave been removed while I was in here. As far as I can tell, this is dead code. It appears to be left over from some changes you made at @antstorm's request on #157.Testing
The dispatcher unit specs have been updated:
The
StartWithSignalWorkflowsample has been modified to cover @dwillett's repro of the issue. The sleeps in this workflow have also been replaced withwait_untilwhich minorly improves execution performance. I've confirmed this modified test does indeed fail without the ordering fix in this PR.This workflow runs as part of this existing integration spec: