📖 Add a design for supporting warm replicas. #3121
base: main
Conversation
behavior when manager is not in leader mode. This interface should be as follows:

```go
type WarmupRunnable interface {
	NeedWarmup() bool
}
```
Why add a boolean marker on top of the actual interface?
```go
// GetWarmupRunnable implements WarmupRunnable
func (c *Controller[request]) GetWarmupRunnable() Runnable {
```
Why return a runnable rather than just have a synchronous `Warmup()`? `Runnable` is expected to keep running, while this is expected to terminate. There is also a timeout for this in the controller which we should re-use here.
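For illustration, the alternative suggested here could look roughly like this — a minimal sketch of the reviewer's suggestion rather than the interface from the design doc; the context parameter and error return are assumptions:

```go
// WarmupRunnable without the boolean marker: a single synchronous call that is
// expected to terminate, so the controller's existing startup timeout can bound it.
type WarmupRunnable interface {
	// Warmup starts the controller's sources and blocks until their caches
	// have synced or ctx is cancelled.
	Warmup(ctx context.Context) error
}
```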
## Concerns/Questions
1. Controllers opted into this feature will break the workqueue.depth metric as the controller will
   have a pre-filled queue before it starts processing items.
As discussed offline, one way to avoid this is to use a metrics wrapper that suppresses them until the leader election is won. But I'm not sure if it's worth bothering.
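For what it's worth, such a wrapper could look roughly like this — a minimal sketch assuming the `Inc`/`Dec` shape of `workqueue.GaugeMetric` from k8s.io/client-go/util/workqueue; the type name and the `enable` hook are made up for illustration:

```go
// gatedGauge suppresses depth updates until leader election is won, then
// replays the accumulated delta so the metric catches up to the real queue depth.
type gatedGauge struct {
	mu      sync.Mutex
	enabled bool
	delta   int
	inner   workqueue.GaugeMetric // the real depth gauge
}

func (g *gatedGauge) Inc() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.enabled {
		g.delta++
		return
	}
	g.inner.Inc()
}

func (g *gatedGauge) Dec() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.enabled {
		g.delta--
		return
	}
	g.inner.Dec()
}

// enable is called once this replica becomes the leader; it applies the
// accumulated delta to the real gauge.
func (g *gatedGauge) enable() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.enabled = true
	for ; g.delta > 0; g.delta-- {
		g.inner.Inc()
	}
	for ; g.delta < 0; g.delta++ {
		g.inner.Dec()
	}
}
```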
Does this actually break the metric? Sounds like the metric will just show the reality
It might break alerts that assume the queue length should be pretty low, but that's an issue of the alerts.
> It might break alerts that assume the queue length should be pretty low, but that's an issue of the alerts.

Not sure I quite agree with that. The alerts work today; if we change the behavior here, we break them. To my knowledge there also isn't a metric that indicates if a given replica is the leader, so I don't even see a good way to unbreak them.
Yeah, but the current definition of the metric is that it should show the length of the queue
> To my knowledge there also isn't a metric that indicates if a given replica is the leader

That could be solved, I hope :)
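For example, something along these lines — a minimal sketch assuming the controller-runtime metrics registry in sigs.k8s.io/controller-runtime/pkg/metrics; the metric name and the callback hooks are made up:

```go
// leaderGauge is 1 while this replica holds the leader lease, 0 otherwise.
var leaderGauge = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "controller_runtime_leader_election_status", // hypothetical name
	Help: "1 if this replica currently holds the leader lease, 0 otherwise.",
})

func init() {
	// Registry is controller-runtime's global Prometheus registry.
	metrics.Registry.MustRegister(leaderGauge)
}

// Wire these into the leader-election callbacks (hypothetical hook points).
func onStartedLeading() { leaderGauge.Set(1) }
func onStoppedLeading() { leaderGauge.Set(0) }
```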
> Yeah, but the current definition of the metric is that it should show the length of the queue

Be that as it may, the reality is that there are a lot of people that use controller-runtime, so Hyrum's Law applies: that we always have an empty workqueue when not leader is observable behavior, and changing that will break people.
Then maybe we shouldn't store the items in the queue at this time, because that's observable behavior as well (not only through the metric), rather than just making it look like the queue is empty through the metric.
> Then maybe we shouldn't store the items in the queue at this time, because that's observable behavior as well (not only through the metric), rather than just making it look like the queue is empty through the metric.

How would that be observable, except through the metric, if we don't start the controller?
- logs via logState
- other workqueue metrics: adds_total, queue_duration_seconds
- Although I guess we can also fake these. What would happen when the controller starts? I assume we would set the length metric immediately to its correct value. Similar for adds_total and probably also queue_duration_seconds.

I also think folks have programmatic access to the queue (at least by instantiating the queue in controller.Options.NewQueue).
So we don't know what kind of things folks are doing with the queue, e.g. accessing queue length or taking items out of the queue even if the leader election controllers are not running.
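For reference, this is roughly how a user can already keep a handle on the queue — a minimal sketch; the `NewQueue` signature shown is the typed variant from recent controller-runtime releases and has changed between versions, so treat the exact types as an assumption:

```go
var queue workqueue.TypedRateLimitingInterface[reconcile.Request]

// Capture the queue so it can be inspected (or drained) even while this
// replica is not the leader.
_, err := controller.New("example", mgr, controller.Options{
	Reconciler: r,
	NewQueue: func(name string, rl workqueue.TypedRateLimiter[reconcile.Request]) workqueue.TypedRateLimitingInterface[reconcile.Request] {
		queue = workqueue.NewTypedRateLimitingQueueWithConfig(rl,
			workqueue.TypedRateLimitingQueueConfig[reconcile.Request]{Name: name})
		return queue
	},
})
```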
2. Ideally, non-leader runnables should block readyz and healthz checks until they are in sync. I am
   not sure what the best way of implementing this is, because we would have to add a healthz check
   that blocks on WaitForSync for all the sources started as part of the non-leader runnables.
This will break conversion webhooks. I don't know if there is a good way to figure out if the binary contains a conversion webhook, but if in doubt we have to retain the current behavior
3. An alternative way of implementing the above is to move the source starting / management code
That is kind of already the case: the source uses the cache (which is a runnable) to get the actual informer, and the cache is shared and started before anything else except conversion webhooks (as conversion webhooks might be needed to start it). The problem is just that the cache does not actually cache anything for resources for which no informer was requested, and that in turn only happens after the source is started, which happens post controller start and thus post leader election.
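On the readyz point in item 2 above, one possible shape — a minimal sketch using the existing `Manager.AddReadyzCheck` and `cache.WaitForCacheSync`; the check name and timeout are made up, and per the comment above this only covers informers that have already been requested:

```go
// Report not-ready until the shared cache has synced, using a short bounded
// wait so the readiness endpoint itself stays responsive.
err := mgr.AddReadyzCheck("informer-sync", func(req *http.Request) error {
	ctx, cancel := context.WithTimeout(req.Context(), time.Second)
	defer cancel()
	if !mgr.GetCache().WaitForCacheSync(ctx) {
		return errors.New("informer caches have not synced yet")
	}
	return nil
})
```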
## Motivation
Controllers reconcile all objects during startup / leader election failover to account for changes
in the reconciliation logic. For certain sources, the time to serve the initial list can be
Just curious. Is it literally the list call that takes minutes or the subsequent reconciliation?
Does this change when the new ListWatch feature is used?
This is purely about the time it takes to start reconciling after a replica went down because of a rollout or an unforeseen event. Right now, that means we first acquire the leader lease, then sync all caches, then start reconciling. The goal of this doc is to do the second step before we even try to acquire the leader lease, as that takes the time it takes to sync the caches out of the transition time.
Agree the description could be a bit clearer.
Yeah that's fine. I was just curious about the list call :) (aka the cache sync)
downtime as even after leader election, the controller has to wait for the initial list to be served
before it can start reconciling.
## Proposal |
What if there's a lot of churn and the controller doesn't become leader, maybe not at all or maybe only after a few days?
The queue length will increase while there is nothing that takes items out of the queue.
I know the queue doesn't require significant memory to store an item, but is there something we should be concerned about if the queue has e.g. millions of items (let's say we watch Pods and we don't become leader for a month)?
Worst case it at some point gets OOM-killed, restarts, and does everything again. I don't think this is likely to become an actual issue.
Yeah, definitely not a big problem. Maybe just good to know.
```go
func (cm *controllerManager) Start(ctx context.Context) (err error) {
	// ...

	// Start the warmup runnables
```
At which exact place in the current Start func would this go? (what do we do directly before/after)
Initially I thought before we start the caches, but that would mean we cannot respect the startup timeout in the controller, so I guess it has to come after.
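Roughly, the ordering being discussed — simplified pseudocode of the manager's Start flow; the `Warmup` runnable group is hypothetical, the other groups mirror the existing structure:

```go
func (cm *controllerManager) Start(ctx context.Context) (err error) {
	// ...

	// Start the cache-backed runnables first so warmup sources have informers
	// to sync against and the controller's startup timeout still applies.
	if err := cm.runnables.Caches.Start(ctx); err != nil {
		return err
	}

	// Start the warmup runnables (hypothetical group) before attempting to
	// acquire the leader lease.
	if err := cm.runnables.Warmup.Start(ctx); err != nil {
		return err
	}

	// Only then block on leader election and start the leader-election runnables.
	// ...
}
```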
This change describes the motivation and implementation details for supporting warm replicas in controller-runtime. I have floated this idea offline with @alvaroaleman to address some really slow sources that we work with that take tens of minutes to serve the initial list. There is no open issue discussing it. Let me know if that is preferred and I can open one.
Previously discussed in #2005 and #2600