[Kubernetes] Release leader lock on shutdown for faster failover #21952

Closed
ChrsMark opened this issue Oct 19, 2020 · 4 comments · Fixed by #22919
Assignees
Labels
bug Team:Platforms Label for the Integrations - Platforms team [zube]: Inbox

Comments

@ChrsMark
Member

ChrsMark commented Oct 19, 2020

Follow-up issue to keep track of a known issue we hit when trying to release the leader lock on shutdown:

Quoting @jsoriano, who observed the issue:

Killing the pod with the leadership seems to hang for some time, and it logs:

E1015 17:07:38.138511       1 leaderelection.go:296] Failed to release lock: Lease.coordination.k8s.io "metricbeat-cluster-leader" is invalid: spec.leaseDurationSeconds: Invalid value: 0: must be greater than 0
It doesn't seem serious because another pod correctly takes the leadership after some seconds.

I1015 17:07:54.577547       1 leaderelection.go:252] successfully acquired lease default/metricbeat-cluster-leader
But I wonder if the failover could be faster if the original leader succeeded in releasing the lock.

Commented at #21896 (comment)
Related to kubernetes/client-go#762 kubernetes/client-go#754 kubernetes/kubernetes#85474

Update: should be fixed by kubernetes/kubernetes#80954

@ChrsMark ChrsMark added bug [zube]: Inbox Team:Platforms Label for the Integrations - Platforms team labels Oct 19, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@jsoriano
Member

Options I see to solve this:

  • Fork and contribute a fix in client-go. I think the problem is in the structure used to encode the JSON for the lock release request: LeaseDurationSeconds is an int there, but it should be a pointer to int so it can be omitted when not needed. On the server side this value is optional.
  • Try to send the request to release the lock using the REST API directly. This would avoid the need to maintain a fork. We could consider contributing a fix in any case.
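
To illustrate the first option, here is a minimal sketch using only the Go standard library. The struct names `recordWithInt` and `recordWithPointer` and the helper `encode` are hypothetical and are not the actual client-go types; they only show how a plain int field is always serialized as 0 when unset, while a pointer with `omitempty` is left out of the JSON entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recordWithInt is a hypothetical stand-in for a record whose lease
// duration is a plain int: the zero value is always encoded.
type recordWithInt struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds int    `json:"leaseDurationSeconds"`
}

// recordWithPointer is a hypothetical alternative: a nil pointer combined
// with omitempty drops the field from the encoded JSON.
type recordWithPointer struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds *int   `json:"leaseDurationSeconds,omitempty"`
}

// encode marshals v to a JSON string and panics on error (sketch only).
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// An "empty" record, as one might send when releasing the lock.
	fmt.Println(encode(recordWithInt{}))     // leaseDurationSeconds is sent as 0
	fmt.Println(encode(recordWithPointer{})) // leaseDurationSeconds is omitted
}
```

With the int field, the request carries `leaseDurationSeconds: 0`, which matches the value rejected in the log above; with the pointer, the field is absent.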

@ChrsMark ChrsMark self-assigned this Oct 26, 2020
@ChrsMark
Member Author

Should be fixed in upstream in 1.20 after kubernetes/kubernetes#80954.

@jsoriano
Member

Should be fixed in upstream in 1.20 after kubernetes/kubernetes#80954.

Great! Let's update client-go when a new version is available.
