Deadlock on operator update using helm #920
Comments
I was able to recreate this without using Helm, just by modifying and re-applying. It looks like the issue is that the pod doesn't become ready until it has become the leader. When the deployment update is rolling out, it waits until the new pods are ready before terminating the old ones. Since the old pod still holds the lock, the new pod can't become the leader and therefore never becomes ready -- hence the deadlock. @mhrivnak Is there a reason the call to
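For context, here is a minimal sketch of the startup pattern being described, assuming the SDK's `leader.Become` helper as it existed around v0.3.0; `markReady` and the file path are illustrative stand-ins for whatever the readiness probe in operator.yaml actually checks:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/operator-framework/operator-sdk/pkg/leader"
)

// markReady is a hypothetical helper: it touches the file that the
// readinessProbe in operator.yaml execs against.
func markReady() error {
	f, err := os.Create("/tmp/operator-sdk-ready")
	if err != nil {
		return err
	}
	return f.Close()
}

func main() {
	ctx := context.TODO()

	// Blocks until this pod holds the leader lock. During a rolling
	// update the old pod still holds it, so the new pod waits here...
	if err := leader.Become(ctx, "my-operator-lock"); err != nil {
		log.Fatal(err)
	}

	// ...and only signals readiness afterwards. The Deployment in turn
	// won't terminate the old pod until the new one is ready: deadlock.
	if err := markReady(); err != nil {
		log.Fatal(err)
	}

	// start controllers / run the manager here
}
```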
We only added the readiness check as a way to communicate that the pod has become the leader, primarily because it was convenient for automatically introspecting who the leader is in automated tests. But it sounds like that's causing this problem, so I would just remove the readiness probe entirely from operator.yaml and stop calling
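With that change, the startup path would simply not signal readiness at all (a sketch only, under the same assumptions as above), and the corresponding `readinessProbe` block would be dropped from operator.yaml:

```go
package main

import (
	"context"
	"log"

	"github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
	ctx := context.TODO()

	// Still blocks until the old pod exits and releases the lock, but
	// since nothing gates readiness on leadership any more, the new pod
	// is already ready and the rolling update can terminate the old pod.
	if err := leader.Become(ctx, "my-operator-lock"); err != nil {
		log.Fatal(err)
	}

	// start controllers / run the manager here -- no readiness marker.
}
```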
Does the deadlock happen with a liveness probe as well? I tested and it appears to work fine with a liveness probe, but I just wanted to confirm.
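That matches how the two probe types interact with a rollout: readiness feeds the pod's Ready condition, which is what the Deployment waits on before terminating old pods, whereas a failing liveness probe only makes the kubelet restart the container, which is presumably why the rollout was able to proceed in that test. Purely as an illustration, a liveness endpoint can be wired up with no dependency on leadership at all:

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// A liveness endpoint that only reports "the process is up",
	// independent of whether this pod holds the leader lock. In a real
	// operator this would run in a goroutine next to the manager.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```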
Bug Report
What did you do?
helm update
What did you expect to see?
The old pod releases the lock and the updated pod acquires it.
What did you see instead? Under which circumstances?
It results in a deadlock: the old pod is still running, and the new pod tries to start but cannot acquire the lock, which is still held by the old pod.
Environment
operator-sdk version: 0.3.0
Kubernetes version information: 1.11
Kubernetes cluster kind: Amazon EKS
Are you writing your operator in ansible, helm, or go?
Go