[Kubernetes] Release leader lock on shutdown for faster failover #21952

ChrsMark · 2020-10-19T08:09:19Z

Follow-up issue to keep track of a known issue we hit when trying to release the leader lock on shutdown.

Quoting @jsoriano, who observed the issue:

Killing the pod with the leadership seems to hang for some time, and it logs:

E1015 17:07:38.138511       1 leaderelection.go:296] Failed to release lock: Lease.coordination.k8s.io "metricbeat-cluster-leader" is invalid: spec.leaseDurationSeconds: Invalid value: 0: must be greater than 0
It doesn't seem serious because another pod correctly takes the leadership after some seconds.

I1015 17:07:54.577547       1 leaderelection.go:252] successfully acquired lease default/metricbeat-cluster-leader
But I wonder if the failover could be faster if the original leader succeeded in releasing the lock.

Commented at #21896 (comment)
Related to kubernetes/client-go#762 kubernetes/client-go#754 kubernetes/kubernetes#85474

Update: should be fixed by kubernetes/kubernetes#80954


elasticmachine · 2020-10-19T08:09:23Z

Pinging @elastic/integrations-platforms (Team:Platforms)

jsoriano · 2020-10-19T11:59:36Z

Options I see to solve this:

Fork client-go and contribute a fix. I think the problem is in the structure used to encode the JSON for the lock release request: LeaseDurationSeconds is an int there, but it should be a pointer to int so that it can be omitted when not needed. On the server side this value is optional.
Try to send the request to release the lock using the REST API directly. This would avoid the need to maintain a fork. We could consider contributing a fix upstream in any case.
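
To illustrate the first option, here is a minimal sketch of how an int field and a pointer field differ when encoded as JSON. The type names `recordWithValue` and `recordWithPointer` and the helper `encode` are hypothetical and are not the actual client-go types; the sketch only assumes standard `encoding/json` behavior and does not claim to be the fix that was applied upstream.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recordWithValue is a hypothetical stand-in for a lock record whose lease
// duration is a plain int. The zero value is always serialized.
type recordWithValue struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds int    `json:"leaseDurationSeconds"`
}

// recordWithPointer is a hypothetical variant where the duration is a
// pointer with omitempty, so a nil value is left out of the JSON entirely.
type recordWithPointer struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds *int   `json:"leaseDurationSeconds,omitempty"`
}

// encode marshals v to a JSON string and panics on failure, which keeps
// the example short.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// An empty "release" record with an int field still carries a zero
	// duration, the kind of value the server rejected in the log above.
	fmt.Println(encode(recordWithValue{}))
	// With a nil pointer the field is omitted from the request body.
	fmt.Println(encode(recordWithPointer{}))
}
```

The first line printed contains `"leaseDurationSeconds":0`, while the second omits the field, which is the behavior the pointer-based option relies on.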

ChrsMark · 2020-10-27T08:04:11Z

Should be fixed in upstream in 1.20 after kubernetes/kubernetes#80954.

jsoriano · 2020-10-27T08:42:50Z

> Should be fixed in upstream in 1.20 after kubernetes/kubernetes#80954.

Great! Let's update client-go when a new version is available.

ChrsMark added the bug, [zube]: Inbox, and Team:Platforms labels Oct 19, 2020

ChrsMark mentioned this issue Oct 19, 2020

Kubernetes leaderelection improvements #21896

Merged

ChrsMark self-assigned this Oct 26, 2020

ChrsMark mentioned this issue Dec 4, 2020

Update k8s client and release k8s lock gracefully #22919

Merged

ChrsMark closed this as completed in #22919 Dec 9, 2020

ChrsMark mentioned this issue Dec 9, 2020

Cherry-pick #22919 to 7.x: Update k8s client and release k8s lock gracefully #23013

Merged
