bad interaction between pod eviction and leader lock #265

Closed
joel-bluedata opened this issue Feb 7, 2020 · 5 comments

Comments

@joel-bluedata
Member

operator-framework/operator-sdk#1305
operator-framework/operator-sdk#1874

We've hit this.

Fixed by operator-framework/operator-sdk#2210

We won't get that fix until we move to operator SDK v0.13. Until then, I think we'll just have to comment out the call to the leader mechanism in main.go.

@joel-bluedata
Member Author

Alternatively, the patch is simple enough that perhaps we could backport it into a fork of "our" version of the SDK.

But I'm not sure how much it buys us. The leader lock itself is just operating via K8s mechanisms and state (a configmap), so I don't think it helps in the case where there's a single deployment. It does protect against multiple deployments, which is nice but maybe not super compelling.
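To make the failure mode concrete, here is a minimal in-memory sketch of a lock that is tied to the lifetime of the pod that owns it. It assumes the lock is only released when the owning pod object is deleted. The `lockStore` type and the pod names `op-a` and `op-b` are hypothetical stand-ins, not SDK code.

```go
package main

import (
	"errors"
	"fmt"
)

// lockStore is a hypothetical stand-in for cluster state: the pod objects
// that exist plus a single lock record (playing the role of the configmap).
type lockStore struct {
	pods  map[string]bool // pod name -> pod object still exists
	owner string          // pod that owns the lock; "" means no lock
}

var errLocked = errors.New("lock is held by another pod")

// tryBecomeLeader mimics "create the lock, fail if it already exists".
func (s *lockStore) tryBecomeLeader(pod string) error {
	if s.owner != "" && s.owner != pod {
		return fmt.Errorf("%w: %s", errLocked, s.owner)
	}
	s.owner = pod
	return nil
}

// deletePod removes the pod object. In this model that is the only event
// that releases the lock the pod owns.
func (s *lockStore) deletePod(pod string) {
	delete(s.pods, pod)
	if s.owner == pod {
		s.owner = ""
	}
}

func main() {
	s := &lockStore{pods: map[string]bool{"op-a": true, "op-b": true}}
	fmt.Println("op-a:", s.tryBecomeLeader("op-a"))

	// op-a's node goes away, but the pod object is never deleted,
	// so the replacement pod stays blocked.
	fmt.Println("op-b:", s.tryBecomeLeader("op-b"))

	// Only once the old pod object is deleted can op-b take over.
	s.deletePod("op-a")
	fmt.Println("op-b:", s.tryBecomeLeader("op-b"))
}
```

In this model nothing short of deleting the old pod object frees the lock, which is why a pod that is never evicted can block its replacement indefinitely.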

@joel-bluedata
Member Author

Note that they also previously had a deadlock between leader election and a readiness probe: operator-framework/operator-sdk#920

We propose a liveness probe and not a readiness probe, so that could be fine, but certainly something to test (if we're not going to just comment out the leader election).

joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue Feb 11, 2020
@joel-bluedata
Member Author

For the 0.4.0 release we are just not doing leader election.

Removing the milestone on this one for now. We'll naturally get a fix if/when we update the SDK to v0.13, but maybe we'll also be compelled to try some fix for it sooner.

@joel-bluedata joel-bluedata removed this from the 0.4.0 milestone Feb 12, 2020
@joel-bluedata
Member Author

This is still problematic even with the new operator code. The pod never gets evicted when the node is turned off, undoubtedly because my test cluster is too small (smaller than large-cluster-size-threshold).
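For reference, a rough sketch of how the node controller's eviction rate depends on cluster size, as I understand the Kubernetes documentation. This is a simplified model rather than the controller's actual code: it treats the cluster as a single zone, and the values in `defaultConfig` are the documented kube-controller-manager defaults as I recall them, so check them against your version.

```go
package main

import "fmt"

// evictionConfig mirrors the kube-controller-manager flags involved.
type evictionConfig struct {
	NodeEvictionRate          float64 // --node-eviction-rate
	SecondaryNodeEvictionRate float64 // --secondary-node-eviction-rate
	UnhealthyZoneThreshold    float64 // --unhealthy-zone-threshold
	LargeClusterSizeThreshold int     // --large-cluster-size-threshold
}

// defaultConfig returns the documented defaults (assumed, verify locally).
func defaultConfig() evictionConfig {
	return evictionConfig{
		NodeEvictionRate:          0.1,
		SecondaryNodeEvictionRate: 0.01,
		UnhealthyZoneThreshold:    0.55,
		LargeClusterSizeThreshold: 50,
	}
}

// effectiveEvictionRate is a simplified single-zone model: once the
// unhealthy fraction reaches the threshold, small clusters stop evicting
// and large clusters fall back to the secondary rate.
func effectiveEvictionRate(cfg evictionConfig, totalNodes, notReadyNodes int) float64 {
	if totalNodes == 0 {
		return 0
	}
	unhealthy := float64(notReadyNodes) / float64(totalNodes)
	if unhealthy < cfg.UnhealthyZoneThreshold {
		return cfg.NodeEvictionRate
	}
	if totalNodes <= cfg.LargeClusterSizeThreshold {
		return 0 // evictions are stopped in small clusters
	}
	return cfg.SecondaryNodeEvictionRate
}

func main() {
	cfg := defaultConfig()
	fmt.Println(effectiveEvictionRate(cfg, 3, 2))    // small cluster, mostly down
	fmt.Println(effectiveEvictionRate(cfg, 100, 60)) // large cluster, mostly down
	fmt.Println(effectiveEvictionRate(cfg, 3, 1))    // small cluster, mostly healthy
}
```

Under this model a rate of 0 means pods on the dead node are never evicted, so a lock that is only released on pod deletion would stay held.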

So hmm. I'll experiment with the leader-with-lease approach.

joel-bluedata added a commit to joel-bluedata/kubedirector that referenced this issue Mar 10, 2020
@joel-bluedata
Member Author

Leader-with-lease looks good. Will close this once #282 is merged.
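For anyone comparing the two approaches, a minimal sketch of why a lease avoids the stuck-lock problem: the holder has to keep renewing, so a candidate can take over once the lease expires, without the old pod ever being deleted. The `lease` type, the helpers, and the 15-second duration are illustrative assumptions. In a real operator this is normally handled by the leader election support in controller-runtime (enabled through the manager's `LeaderElection` and `LeaderElectionID` options), not hand-written.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical, simplified lock record.
type lease struct {
	Holder    string
	RenewedAt time.Time
	Duration  time.Duration
}

// newLease builds an unheld lease lasting the given number of seconds.
func newLease(seconds int) *lease {
	return &lease{Duration: time.Duration(seconds) * time.Second}
}

// at returns a fixed reference time plus the given offset in seconds,
// so the example does not depend on the wall clock.
func at(seconds int) time.Time {
	base := time.Date(2020, 3, 10, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(seconds) * time.Second)
}

// expired reports whether nobody holds the lease or the holder has
// failed to renew within the lease duration.
func (l *lease) expired(now time.Time) bool {
	return l.Holder == "" || now.Sub(l.RenewedAt) > l.Duration
}

// tryAcquireOrRenew reports whether candidate holds the lease after the call.
func (l *lease) tryAcquireOrRenew(candidate string, now time.Time) bool {
	if l.Holder != candidate && !l.expired(now) {
		return false
	}
	l.Holder = candidate
	l.RenewedAt = now
	return true
}

func main() {
	l := newLease(15)
	fmt.Println(l.tryAcquireOrRenew("op-a", at(0)))  // true: lease was free
	fmt.Println(l.tryAcquireOrRenew("op-b", at(5)))  // false: op-a's lease is still valid
	fmt.Println(l.tryAcquireOrRenew("op-b", at(20))) // true: op-a stopped renewing
}
```

The trade-off is that a lease relies on clocks and timely renewals, so it gives a weaker guarantee against two leaders briefly overlapping than a lock tied to the pod's lifetime.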
