Automatically find failed connection manager workflows and restart them #14043
Comments
If we go with the Micronaut approach, we probably want to create a new service (i.e. a new kube pod), so that this ticket does not also include the work of converting an existing service to Micronaut.
Moving this one back to the backlog, since I have pulled some of the work here out into the other tickets linked in the description, e.g. this one, which should be completed before this ticket is taken on: #14970
This story is pending the Micronaut scheduler for the "automatic" part.
What
With the plan to automatically set connection manager workflows to failed when they encounter a NonDeterministicException (as described in the parent epic and in this issue), we need something that will automatically repair these failed workflows.
The goal of this issue is to implement a background process that runs as part of the platform, which finds failed connection manager workflows and terminates + restarts them automatically.
How
This can be implemented as a cron job in a standalone pod, or as a background process/thread in an existing pod. Micronaut's scheduled tasks may be a good candidate here, so that we do not need to manage the scheduling ourselves.
This implementation should utilize the TemporalClient method introduced in #14970 to perform the find+restart logic.
This process should emit a metric whenever it restarts a workflow, similar to those established in #13773.
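To make the shape of this process concrete, here is a minimal sketch in plain Java. It is not the actual implementation: `WorkflowRepairClient` and `RestartMetric` are hypothetical stand-ins for the TemporalClient method from #14970 and the metric client from #13773, whose real names and signatures are not given in this issue. It also uses a `ScheduledExecutorService` in place of Micronaut's scheduled tasks, only so that the example is self-contained.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the TemporalClient method introduced in #14970.
// Assumption: it terminates + restarts all failed connection manager
// workflows and returns how many it restarted.
interface WorkflowRepairClient {
  int restartFailedConnectionManagerWorkflows();
}

// Hypothetical stand-in for the metric client used in #13773.
interface RestartMetric {
  void count(int restartedWorkflows);
}

final class FailedWorkflowRestarter implements Runnable {
  private final WorkflowRepairClient client;
  private final RestartMetric metric;

  FailedWorkflowRestarter(WorkflowRepairClient client, RestartMetric metric) {
    this.client = client;
    this.metric = metric;
  }

  @Override
  public void run() {
    try {
      int restarted = client.restartFailedConnectionManagerWorkflows();
      // Only emit the metric when at least one workflow was restarted.
      if (restarted > 0) {
        metric.count(restarted);
      }
    } catch (RuntimeException e) {
      // An exception escaping a fixed-delay task suppresses all later runs,
      // so log it and let the next scheduled run try again.
      System.err.println("Failed to restart workflows: " + e.getMessage());
    }
  }

  // Runs the task immediately, then again `delay` after each run completes.
  static ScheduledExecutorService schedule(FailedWorkflowRestarter task, long delay, TimeUnit unit) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(task, 0, delay, unit);
    return executor;
  }
}
```

A fixed delay (rather than a fixed rate) is used here so that a slow run cannot overlap with the next one. With Micronaut, the `schedule` helper would presumably be replaced by a scheduled-task annotation on `run`, leaving the rest unchanged.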