Unknown pod do not release the configmap lock #3339

Closed
matthewygf opened this issue Jul 3, 2020 · 1 comment

@matthewygf

Bug Report

This is similar to the earlier issue #1874, where an evicted pod that was not deleted left a newly started pod waiting for the lock.
We hit a case where a pod's state becomes Unknown because its node crashed, and the newly started pod cannot obtain the lock.
While we agree that the node failure should be handled by a system administrator, the new operator pod should still be able to become the leader.

What did you do?

A crashed node caused the operator pod's state to become Unknown.
A new pod was started.
What did you expect to see?

The new pod should become the leader.
What did you see instead? Under which circumstances?
The new pod cannot acquire the lock.

Environment

  • operator-sdk version:
    v0.11.0
  • go version:
    1.13
  • Kubernetes version information:
    16
  • Kubernetes cluster kind:

  • Are you writing your operator in ansible, helm, or go?

Possible Solution

Add an additional check to the leader package for the case where the lock-holding pod's state is Unknown and its last transition time is more than X ago, where X is a configurable time window.


@camilamacedo86
Contributor

Hi @matthewygf,

You are using a very old version. The fix for the issue you pointed to, #1874, was released in 0.13 (https://github.com/operator-framework/operator-sdk/blob/master/CHANGELOG.md#v0130), and you are using 0.11. So the solution here is to upgrade your project to a newer release that includes the fixes, which also addresses the technical debt.

Could you please follow the migration guide and keep your project up to date with the latest version of the SDK, which is 0.18.2?

Closing this as a duplicate of #1874, which has been resolved.

@camilamacedo86 camilamacedo86 self-assigned this Jul 3, 2020
@camilamacedo86 camilamacedo86 added the triage/support Indicates an issue that is a support question. label Jul 3, 2020