Description
controller-runtime uses the resourcelock.New function to configure leader election. This function is known to leave the request timeout misconfigured: the timeout is set equal to the leader election deadline, so a single request timeout is enough to trigger a change in leadership.
Source:
controller-runtime/pkg/leaderelection/leader_election.go, lines 101 to 109 at 8e44a43
Impact:
Unnecessary leader changes can lead to:
- Lower availability: a new leader might need to reinitialize its informers, which can take tens of seconds in large clusters.
- Wasted resources: concurrent re-initializations increase API server load and can trigger a KCP scale-up.
Fix:
Update controller-runtime to use resourcelock.NewFromKubeconfig for leader election. This ensures that the request timeout is configured correctly and prevents unnecessary leadership changes caused by transient network issues or API server unavailability. The change should take approximately 10 lines of code.
Example: