Mutex not being released on step completion #4832
Comments
I will look |
@sarabala1979 Something I didn't highlight in my initial report was that one/some of the steps are daemon'd steps, I have a feeling that this could be a good line of investigation of this issue. (I haven't confirmed this) |
Possibly related to #4835? |
Let me verify that. I am able to reproduce locally. I am investigating |
@davidcollom I was able to reproduce it on my old codebase. I found a fix to do |
@sarabala1979 We upgraded to We are in the process of investigating if we can remove the Something that we've noticed and put down to the daemon step/template was that the duration of the workflow continues to increase within the UI, however this issue has been more of a problem than that (will raise as separate issue if we find more info) |
@davidcollom I have a fix, but I couldn't reproduce the issue locally. I built a Docker image |
I can take a look first thing Monday and see if we find / have this issue and report back after a few days? Thanks for taking a look, it is greatly appreciated! |
@sarabala1979 I've deployed this release to the workflow-controller only in our cluster, will see how things get on over the next few days and report back. |
@sarabala1979 We had another instance of the locks not being released this morning, I'm afraid. Unfortunately, I'm not able to provide examples without spending considerable time redacting the output. I'll spend some time this morning seeing whether the example can reproduce the same issue. |
If it's of any use, the following was present in the status:
with the workflow being Workflow controller logs referencing this mutex:
The second line is from after restarting the workflow controller. |
Is this instance using my eng build? Can you provide the workflow controller lock for this workflow? |
Yes, we're using your eng build. |
@davidcollom ok. can I get the entire workflow controller log for this workflow? |
@sarabala1979 Logs have been updated in my gist @ https://gist.github.com/davidcollom/9c6f7d9d1819fe922d21b3a69e561754#file-workflow-controller-logs-12-01-2021-txt |
@davidcollom I have added the extra log statement below to the eng build. Can you include it in your log?
|
@sarabala1979 Is this in a new release/image? I ask because I'll need to restart the pod. I've not removed any lines from the previous log, only redacted information. |
@davidcollom It is only in my eng build. I just want to debug who is waiting and who is holding the lock. |
Yes, confirmed we're using your eng build. |
@sarabala1979 As discussed in Slack, the latest eng build shows some good signs of this being resolved. The example workflow provided initially (which became stuck) completed from end to end without intervention from myself or the team. We will continue running this build for a few days to ensure our production workflows don't show the previous symptoms or any other issues, and report back (Monday?) with any findings. |
@sarabala1979 Sorry for not getting back to you yesterday. We've had no issues over the last few days and not had to restart the |
@davidcollom Thanks for your update and help. The PR is already in review. It will be included in the next release. |
Summary
When using a variable within the mutex name, mutexes aren't released until the workflow-controller is restarted.
In the example below, the workflow never completes and stays pending on the final unlock.
Additionally, no other workflows are able to continue past the initial mutex lock (from gen-number-list).
Restarting the workflow-controller brings the workflow to life and all locks are released.
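For readers who don't want to open the gist, a mutex whose name contains a variable might look roughly like the sketch below. This is not the workflow from the report: the template names, the `id` parameter, the `lock-` prefix and the image are all hypothetical, chosen only to illustrate the pattern of a templated mutex name.

```yaml
# Hypothetical sketch; NOT the workflow from the linked gist.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: mutex-variable-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: locked-step
            template: hold-lock
            arguments:
              parameters:
                - name: id
                  value: "{{item}}"
            withItems: [1, 2]
    - name: hold-lock
      inputs:
        parameters:
          - name: id
      synchronization:
        mutex:
          # Mutex name built from a variable, as described in the summary.
          name: "lock-{{inputs.parameters.id}}"
      container:
        image: alpine:3.12
        command: [sh, -c]
        args: ["echo holding lock {{inputs.parameters.id}}; sleep 5"]
```

Each expansion of locked-step should take and then release its own named lock; the symptom reported here is that the release never happens until the controller restarts.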
Diagnostics
GKE: v1.17.12-gke.1504
Argo version: v2.12.3
Workflow: https://gist.github.com/davidcollom/9c6f7d9d1819fe922d21b3a69e561754#file-workflow-yaml
Logs: https://gist.github.com/davidcollom/9c6f7d9d1819fe922d21b3a69e561754#file-logs
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.