Zookeeper Operator upgrade fails because of readiness check in deploy file #101

Closed
pbelgundi opened this issue Nov 20, 2019 · 1 comment · Fixed by #152
Labels
area/upgrade zookeeper operator upgrade or zookeeper upgrade Priority-P1

Comments

@pbelgundi
Contributor

Tried upgrading zookeeper-operator from version 0.2.4 to 0.2.5-rc0, and the upgrade got stuck when this readiness check was specified in the operator manifest:

    readinessProbe:
      exec:
        command:
          - stat
          - /tmp/operator-sdk-ready
      initialDelaySeconds: 4
      periodSeconds: 10
      failureThreshold: 1

After removing this check, both the initial deployment and the upgrade completed successfully.
This check needs to be removed or corrected in the operator deployment manifest.

@RaulGracia RaulGracia changed the title Zookeeper Operator upgrade fails becuase of readiness check in deploy file Zookeeper Operator upgrade fails because of readiness check in deploy file Nov 20, 2019
@pbelgundi pbelgundi assigned anishakj and unassigned Prabhaker24 Mar 26, 2020
@pbelgundi pbelgundi added area/upgrade zookeeper operator upgrade or zookeeper upgrade Priority-P1 labels Mar 26, 2020
@anishakj
Contributor

During a rolling update of zookeeper-operator, a deadlock occurs that prevents the new pod from becoming the leader.
The rolling update proceeds as follows:

a. A new Pod is created.
b. The new Pod tries to become the leader by calling the leader.Become function.
c. The new Pod keeps waiting, because the old Pod is still the leader.
d. While the new Pod is not the leader, the file /tmp/operator-sdk-ready is not created.
e. The readinessProbe cannot succeed without the file /tmp/operator-sdk-ready.
f. Because the new Pod never becomes ready, the old Pod is never terminated, which creates a deadlock and stalls the rolling update.

This can be fixed by removing the readiness probe check from the operator. More details about this can be found at operator-framework/operator-sdk#932
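
The deadlock in steps a to f can be illustrated with a small toy simulation. This is only a sketch and does not use the real operator-sdk or Kubernetes APIs. The function name `simulate_rolling_update`, the tick loop, and the pod names are hypothetical. It also assumes that the leader lock is released only once the old Pod is terminated, and that the rollout terminates the old Pod only after the new Pod reports ready.

```python
READY_FILE = "/tmp/operator-sdk-ready"


def simulate_rolling_update(use_readiness_probe, max_ticks=5):
    """Toy model of the rollout; returns True if the new pod takes over."""
    lock_holder = "old-pod"   # the old pod is the current leader
    old_pod_running = True
    new_pod_files = set()     # files that exist inside the new pod

    for _ in range(max_ticks):
        # Steps b-d: the new pod can only become leader once the lock is
        # free, and only then does it create the ready file.
        if lock_holder is None:
            lock_holder = "new-pod"
            new_pod_files.add(READY_FILE)

        # Step e: with the probe, readiness depends on the ready file.
        # Without a probe, the pod counts as ready once it is running.
        new_pod_ready = (not use_readiness_probe) or (READY_FILE in new_pod_files)

        # Assumption: the old pod is removed only after the new pod is
        # ready, and removing it releases the leader lock.
        if new_pod_ready and old_pod_running:
            old_pod_running = False
            lock_holder = None

        if lock_holder == "new-pod" and not old_pod_running:
            return True

    # Step f: the new pod never became leader within max_ticks.
    return False


if __name__ == "__main__":
    print("with readiness probe:   ", simulate_rolling_update(True))
    print("without readiness probe:", simulate_rolling_update(False))
```

With the probe enabled, the model never leaves the waiting state, which matches the stuck upgrade reported above. Without it, the old Pod is removed, the lock is freed, and the new Pod becomes the leader.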
