Controller not updating workflow for 10+ mins #1416
Comments
I believe this bug report concerns the following time frame:
Based on the logs, the controller is not seeing any events from the watch. When the controller receives no events, it does not process the workflow and instead falls back to the resync interval (currently hard-wired to 20 minutes) to process workflows. I suspect that the controller lost its watch on the K8s API server during that time frame. It's currently not possible to adjust the resync period of the workflow controller, but we could expose it as a CLI flag so that, if this happens frequently in your environment, you can reduce it and have the controller reprocess all workflows more often to handle situations like yours.
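To illustrate why a lost watch shows up as a delay of up to one resync interval, here is a toy model of the behavior described above. It is only a sketch, not the controller's actual code: `run_controller`, `resync_period`, and the tick-based clock are hypothetical names and simplifications chosen for the example, with one tick standing in for one minute.

```python
def run_controller(events, workflows, resync_period, ticks):
    """Toy model of an event-driven controller with a periodic resync.

    events:        dict mapping tick -> list of workflow names delivered by the watch
    workflows:     all workflow names the controller knows about
    resync_period: every this many ticks, reprocess every workflow
    ticks:         how many ticks to simulate
    Returns a list of (tick, workflow, reason) tuples.
    """
    processed = []
    for tick in range(1, ticks + 1):
        # Normal path: a watch event triggers processing right away.
        for name in events.get(tick, []):
            processed.append((tick, name, "event"))
        # Fallback path: the periodic resync reprocesses everything.
        if tick % resync_period == 0:
            for name in workflows:
                processed.append((tick, name, "resync"))
    return processed


# Watch lost: no events arrive, so the workflow waits for the resync.
lost_watch = run_controller(events={}, workflows=["wf-a"], resync_period=20, ticks=25)

# Same situation with a shorter (hypothetical) resync period.
short_resync = run_controller(events={}, workflows=["wf-a"], resync_period=5, ticks=25)

# Healthy watch: the pod-completion event at tick 3 is handled immediately.
healthy = run_controller(events={3: ["wf-a"]}, workflows=["wf-a"], resync_period=20, ticks=25)
```

In this model, a workflow whose event was missed is not touched until the next multiple of `resync_period`, which is why a shorter period would bound the delay more tightly, at the cost of reprocessing all workflows more often.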
Thanks for looking into it. That explanation makes sense given our usage. This is a self-managed cluster that we've been scaling up and down pretty aggressively, so if an overloaded control plane would cause Argo to experience this, that fits. That flag could be useful in the future, but I'll keep trying with our control plane instances, which we have since scaled up, and if that doesn't fix it, I'll reopen this.
Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT
What happened:
My argo workflow is "Running" but the pod it's waiting on has "Completed". From looking at the logs below, it seems like the controller is not updating the workflow for more than 10 minutes.
What you expected to happen:
That the Argo workflow proceeds past the pod/step it's waiting for soon after that pod/step has completed.
How to reproduce it (as minimally and precisely as possible):
I haven't found a minimal reproducible example yet, unfortunately, but I can try things if you have ideas.
Anything else we need to know?:
Environment:
Other debugging information (if applicable):