Fix leader election request timeout #3027

Closed
@serathius

Description

controller-runtime configures leader election with the resourcelock.New function. This function is known to leave the request timeout misconfigured: the timeout ends up equal to the leader election deadline, so a single timed-out request is enough to trigger a change in leadership.

Source:

    return resourcelock.New(options.LeaderElectionResourceLock,
        options.LeaderElectionNamespace,
        options.LeaderElectionID,
        corev1Client,
        coordinationClient,
        resourcelock.ResourceLockConfig{
            Identity:      id,
            EventRecorder: recorderProvider.GetEventRecorderFor(id),
        })

Impact:

This issue causes unnecessary leader changes, which can lead to:

  • Lower availability: a new leader may need to reinitialize its informers, which can take tens of seconds in large clusters.
  • Wasted resources: concurrent re-initializations increase API server load, potentially triggering a KCP scale-up.

Fix:

Update controller-runtime to use resourcelock.NewFromKubeconfig for leader election. This ensures that the request timeout is configured correctly and prevents unnecessary leadership changes caused by transient network issues or brief API server unavailability. The change should take approximately 10 lines of code.

Example:

kubernetes/kubernetes#98059
