📖 Add designs/multi-cluster.md #2746

sttts · 2024-03-31T13:38:14Z

Controller-runtime today only allows writing controllers against a single cluster.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.

The proposed changes are prototyped in #3019.

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

alvaroaleman · 2024-03-31T14:22:31Z

designs/multi-cluster.md

+}
+
+// pkg/handler
+type DeepCopyableEventHandler interface {


The eventhandlers are stateless, why do we need the deepcopy for them?

Looking at the prototype, I think this is because EventHandler would then store the Cluster (it uses that info to set the ClusterName field in the request).

This is gone now in #2726.

Will update the design here.

With the BYO request/eventhandler changes in #3019, I brought this back after remembering it was mentioned in the proposal. The previous version of my prototype had a weird second layer of event handler that was wrapping the actual event wrapper and was using the event object to communicate the cluster name in. That felt all kinds of weird.

Because we now have BYO EventHandlers, it's possible that they are not entirely stateless (as @sbueringer pointed out, some event handlers might have to store the cluster name in the absence of any information on the event object itself). So I think this approach is the cleanest, to be honest. It's entirely optional in #3019; existing EventHandlers don't need to be changed.

alvaroaleman · 2024-03-31T14:31:07Z

designs/multi-cluster.md

+	Disengage(context.Context, Cluster) error
+}
+```
+In particular, controllers implement the `AwareRunnable` interface. They react


Rather than changing the controller type directly and requiring all its dependencies to know how to deepcopy themselves, how about having something like a controllerconstructor (name tbd) in between that is filled with a []watchConstructor{source func(Cluster) source.Source, handler func(Cluster) handler.Handler, predicate func(cluster) []predicate.Predicate}?

I think this would require more invasive changes to our public API (the Controller interface)

No, you can call Watch on an existing controller. The idea is to not let the Controller or its dependencies have any knowledge about this but instead have a thing on top of the Controller that is configured with constructors that take a cluster.Cluster and return a source/predicate/handler and then uses those to call Watch when a new cluster appears.

When one disappears, it would cancel the context on the Source.

The idea really is the opposite, I do not want the Controller to know how to extend itself like this, IMHO this is a higher-level abstraction.

Compare #2726 after latest push. I have implemented @alvaroaleman's idea via a MultiClusterController wrapper implementing cluster.AwareRunnable and just calling Watch on the actual controller. All the deepcopy'ing is gone 🎉 Much nicer IMO. @alvaroaleman great intuition!
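
A minimal sketch of the wrapper idea discussed in this thread, under stated assumptions: `Cluster`, `Source`, `Controller`, `WatchConstructor` and `MultiClusterController` below are simplified stand-ins written for illustration, not the controller-runtime types or the code in #2726. In particular, the stand-in `Watch` takes a context so that cancellation can be shown in a few lines.

```go
package main

import (
	"context"
	"fmt"
)

// Cluster, Source and Controller are simplified stand-ins, not the
// controller-runtime types.
type Cluster struct{ Name string }

type Source interface {
	Start(ctx context.Context) error
}

type Controller interface {
	// Watch takes a context here only so the sketch can show cancellation.
	Watch(ctx context.Context, src Source) error
}

// WatchConstructor builds a source for one specific cluster.
type WatchConstructor func(Cluster) Source

// MultiClusterController sits on top of a Controller. The wrapped controller
// never learns about clusters; it only sees additional Watch calls.
// It is not safe for concurrent use.
type MultiClusterController struct {
	Controller
	Constructors []WatchConstructor
	cancels      map[string]context.CancelFunc
}

// Engage calls Watch once per constructor for the newly appeared cluster.
func (m *MultiClusterController) Engage(ctx context.Context, cl Cluster) error {
	ctx, cancel := context.WithCancel(ctx)
	for _, construct := range m.Constructors {
		if err := m.Watch(ctx, construct(cl)); err != nil {
			cancel()
			return fmt.Errorf("engage cluster %q: %w", cl.Name, err)
		}
	}
	if m.cancels == nil {
		m.cancels = map[string]context.CancelFunc{}
	}
	m.cancels[cl.Name] = cancel
	return nil
}

// Disengage cancels the context handed to the cluster's sources.
func (m *MultiClusterController) Disengage(cl Cluster) {
	if cancel, ok := m.cancels[cl.Name]; ok {
		cancel()
		delete(m.cancels, cl.Name)
	}
}

// recorder is a fake controller that starts every source it is given.
type recorder struct{ started []string }

func (r *recorder) Watch(ctx context.Context, src Source) error { return src.Start(ctx) }

// namedSource records its name on the recorder when started.
type namedSource struct {
	name string
	rec  *recorder
}

func (s namedSource) Start(context.Context) error {
	s.rec.started = append(s.rec.started, s.name)
	return nil
}

// demo engages two clusters and returns the sources that were started.
func demo() []string {
	rec := &recorder{}
	mc := &MultiClusterController{
		Controller: rec,
		Constructors: []WatchConstructor{
			func(cl Cluster) Source { return namedSource{name: cl.Name + "/pods", rec: rec} },
		},
	}
	for _, name := range []string{"east", "west"} {
		if err := mc.Engage(context.Background(), Cluster{Name: name}); err != nil {
			panic(err)
		}
	}
	mc.Disengage(Cluster{Name: "east"})
	return rec.started
}

func main() { fmt.Println(demo()) }
```

The point of the layering is that nothing in the wrapped controller, its sources, or its handlers has to know how to copy itself; only the constructors close over the cluster.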

alvaroaleman · 2024-03-31T14:33:58Z

designs/multi-cluster.md

+// pkg/cluster
+type Provider interface {
+   Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error)
+   List(ctx context.Context) ([]string, error)


Why return []string here rather than []Cluster?

+1 Would be good for consistency with the Get func

There is a misunderstanding of the interface. The getter is actually the constructor. The life-cycle of the returned clusters is owned by the manager (they are added as runnables). Hence, the List returns names, not clusters. We should rather rename Get to Create or Connect.

alvaroaleman · 2024-03-31T14:37:16Z

designs/multi-cluster.md

+}
+```
+
+The `ctrl.Manager` will use the provider to watch clusters coming and going, and


I'll have to think about if and how this is doable, but ideally the "thing that comes and goes" wouldn't be typed to cluster.Cluster but can be anything, so this mechanism can also be used if folks have sources that are not kube watches

Would this be mostly about a more generic name? (can't think of much that would work, maybe something like scope)

designs/multi-cluster.md

elmiko

i think this is an interesting idea and i could see using it, i just have a question about some of the mechanics.

for context, i am investigating a cluster-api provider for karpenter and it would be nice to have the controllers discriminate between objects in the management cluster and objects in the workload clusters.

elmiko · 2024-04-08T17:29:42Z

designs/multi-cluster.md

+### Examples
+
+- Run a controller-runtime controller against a kubeconfig with arbitrary many contexts, all being reconciled.
+- Run a controller-runtime controller against cluster-managers like kind, Cluster-API, Open-Cluster-Manager or Hypershift.


given the cluster-api example here, is the intention that controllers will be able to reconcile CRDs in clusters that they know about that may only exist in a subset of clusters (e.g. Machine objects in the management cluster but not in the workload cluster) ?

Good point. I think that has to be possible. Otherwise we need all resources that we watch in all clusters

(especially good point because today a controller crashes if a resource doesn't exist)

EDIT: Further down:

For example, it can well be that every cluster has different REST mapping because installed CRDs are different. Without a context, we cannot return the right REST mapper.

Good point. Question is whether one would rather group them in managers such that every manager has a uniform set of clusters.

See my updated PR #2726. You can now opt into provider and/or the default cluster per controller via options:

// EngageWithDefaultCluster indicates whether the controller should engage
// with the default cluster of a manager. This defaults to false through the
// global controller options of the manager if a cluster provider is set,
// and to true otherwise. Here it can be overridden.
EngageWithDefaultCluster *bool

// EngageWithProviderClusters indicates whether the controller should engage
// with the provided clusters of a manager. This defaults to true through the
// global controller options of the manager if a cluster provider is set,
// and to false otherwise. Here it can be overridden.
EngageWithProviderClusters *bool

There is no logic yet for a controller to decide whether to engage with a provider cluster or not. For now it engages with all of them. If the setup is more diverse, we might want such functionality, e.g. some kind of pre-check: `ctrl.WantsToEngage(ctx, cluster) bool`.

i'm still understanding the changes in #2726, but i think what you are saying here makes sense to me and would solve the issue.

some kind of pre-check: `ctrl.WantsToEngage(ctx, cluster) bool`.

+1, i think we definitely need some way for the client user to specify when it should check a specific cluster for a resource.

I somehow think it should be the author's and manager's responsibility (for now) to group clusters into sets that work with this pattern. At this point, we don't know what we don't know. Once this is released, we can gather feedback on edge cases and take it from there. I suspect the majority of use cases will still be single-cluster reconcile loops.

Maybe document this edge case and mark this feature overall as experimental? That way we are not committing to full production-level stability, and it allows us to gather more feedback.
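
One way the `WantsToEngage` pre-check floated above could look, as a rough sketch only. The `Cluster` stand-in with a `Kinds` set, the `RequireKinds` helper and the `engaged` function are hypothetical names introduced for illustration; neither PR defines them.

```go
package main

import (
	"context"
	"fmt"
)

// Cluster is a simplified stand-in; Kinds lists the resource kinds it serves.
type Cluster struct {
	Name  string
	Kinds map[string]bool
}

// WantsToEngage is the hypothetical per-controller pre-check.
type WantsToEngage func(ctx context.Context, cl Cluster) bool

// RequireKinds engages only with clusters that serve every listed kind.
func RequireKinds(kinds ...string) WantsToEngage {
	return func(_ context.Context, cl Cluster) bool {
		for _, k := range kinds {
			if !cl.Kinds[k] {
				return false
			}
		}
		return true
	}
}

// engaged returns the names of the clusters that pass the pre-check.
func engaged(clusters []Cluster, wants WantsToEngage) []string {
	var names []string
	for _, cl := range clusters {
		if wants(context.Background(), cl) {
			names = append(names, cl.Name)
		}
	}
	return names
}

func main() {
	clusters := []Cluster{
		{Name: "management", Kinds: map[string]bool{"Machine": true, "Pod": true}},
		{Name: "workload-1", Kinds: map[string]bool{"Pod": true}},
	}
	// A Machine controller would only engage with the management cluster.
	fmt.Println(engaged(clusters, RequireKinds("Machine")))
	// A Pod controller would engage with both.
	fmt.Println(engaged(clusters, RequireKinds("Pod")))
}
```

With a check like this, a controller watching `Machine` objects would skip workload clusters instead of failing on a missing resource.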

designs/multi-cluster.md

sttts · 2024-05-28T13:05:45Z

For those reading, this is currently a little outdated. #2726 has a changed design proposed by @alvaroaleman. Will come back soon to both PRs.

k8s-triage-robot · 2024-08-26T13:36:34Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-09-25T14:27:24Z


/lifecycle rotten

k8s-triage-robot · 2024-10-25T15:12:48Z


/close

k8s-ci-robot · 2024-10-25T15:12:54Z

@k8s-triage-robot: Closed this PR.


embik · 2024-10-28T11:20:28Z

/reopen

We'd like to continue working on this, time is simply a bit scarce at the moment.

k8s-ci-robot · 2024-10-28T11:20:34Z

@embik: Reopened this PR.


Gomaya · 2024-11-13T06:13:58Z

Hi, could you please share the future plans for this feature? Thank you!
@embik

embik · 2024-11-13T08:16:28Z

@Gomaya I'm working on a prototype that attempts to address the review comments in #2726. Once everyone is back from KubeCon, I plan to run this by everyone involved and try to move the feature forward.

embik · 2024-12-03T11:42:59Z

/remove-lifecycle rotten

Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

📖 Update multi-cluster proposal with new implementation details

k8s-ci-robot · 2025-01-07T14:22:02Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sttts
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

…iform Add notes about uniform controllers

embik · 2025-01-07T14:53:09Z

Hi @alvaroaleman @sbueringer, we (@sttts and I) finally got around to refreshing this document. Could you please take a look and let us know what you think? The (new) implementation PR at #3019 is functional if you want to look at implementation details, but it's of course not finalised yet (e.g. missing tests).

yastij · 2025-01-30T07:50:57Z

designs/multi-cluster.md

+// pkg/cluster
+type Aware interface {
+	// Engage gets called when the component should start operations for the given Cluster.
+	// The given context is tied to the Cluster's lifecycle and will be cancelled when the


how would a context be selected from the kubeconfig passed to the controller?

Just to avoid any confusion, this description is talking about a context.Context, not a kubeconfig context.

But in general: This would be something to implement in a Provider, a kubeconfig provider could be a very simple implementation (although the focus is on dynamic providers, and a kubeconfig would probably be fairly static).

How that provider translates the clusterName parameter in the Get method (see the Provider interface above) to a kubeconfig context would be up to the implementation, but I could see the context name being the identifier here (since Get returns a cluster.Cluster, we need credentials for the embedded client, so a context makes a lot of sense here).

Does that make sense? 🤔
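
To make the answer above concrete, here is a rough sketch of a provider that uses kubeconfig context names as cluster names. Everything in it is an assumption for illustration: `KubeconfigProvider`, its constructor and the `Cluster` stand-in are hypothetical, no kubeconfig is actually parsed, and no client is built.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// Cluster is a simplified stand-in for cluster.Cluster.
type Cluster struct {
	Name    string
	Context string // kubeconfig context the (imaginary) client would use
}

// KubeconfigProvider is a hypothetical provider that treats kubeconfig
// context names as cluster names. It is not safe for concurrent use.
type KubeconfigProvider struct {
	contexts map[string]bool
	clusters map[string]*Cluster
}

func NewKubeconfigProvider(contextNames ...string) *KubeconfigProvider {
	p := &KubeconfigProvider{contexts: map[string]bool{}, clusters: map[string]*Cluster{}}
	for _, n := range contextNames {
		p.contexts[n] = true
	}
	return p
}

// Get returns the cluster for clusterName, constructing it on first use and
// returning the existing one afterwards.
func (p *KubeconfigProvider) Get(_ context.Context, clusterName string) (*Cluster, error) {
	if cl, ok := p.clusters[clusterName]; ok {
		return cl, nil
	}
	if !p.contexts[clusterName] {
		return nil, fmt.Errorf("no kubeconfig context named %q", clusterName)
	}
	cl := &Cluster{Name: clusterName, Context: clusterName}
	p.clusters[clusterName] = cl
	return cl, nil
}

// List returns cluster names, not clusters, in sorted order.
func (p *KubeconfigProvider) List(_ context.Context) ([]string, error) {
	names := make([]string, 0, len(p.contexts))
	for n := range p.contexts {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, nil
}

// demo reports whether two Gets return the same cluster, the listed names,
// and the error for an unknown name.
func demo() (same bool, names []string, missingErr error) {
	ctx := context.Background()
	p := NewKubeconfigProvider("staging", "prod")
	a, _ := p.Get(ctx, "prod")
	b, _ := p.Get(ctx, "prod")
	names, _ = p.List(ctx)
	_, missingErr = p.Get(ctx, "dev")
	return a == b, names, missingErr
}

func main() {
	same, names, err := demo()
	fmt.Println(same, names, err)
}
```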

sttts · 2025-02-03T09:39:53Z

Linked the updated implementation #3019.

maximilianbraun · 2025-02-03T09:55:42Z

+1 highly appreciated in the name of @SAP & is needed in context of our open source efforts with our european partners.

mirzakopic · 2025-02-04T11:02:06Z

+1 We are very excited about this and are looking forward to using it in our open source efforts with https://apeirora.eu/ @SAP

alvaroaleman

There are actually two subcases in here that we don't seem to be differentiating well:
a) Controller whose request will always contain the cluster the event originated from
b) Controller whose request may contain a cluster that is not the one the event originated from

I think you are mostly thinking about a) with this proposal. You write:

We could deepcopy the builder instead of the sources and handlers. This would
lead to one controller and one workqueue per cluster. For the reason outlined
in the previous alternative, this is not desirable.

I am having trouble finding the "reasons outlined" and I do think instantiating a controller per cluster rather than adding/removing eventsources is preferable whenever possible, because:

Otherwise workqueue metrics get diluted, this is a big issue for ops, workqueue_depth is usually one of the main metrics people alert on
It requires zero changes in existing reconcilers

I'd love to hear your thoughts on this and I do think we should clearly differentiate the two cases in this doc. What also seems to get lost a bit is that the main issue today is with doing all of this dynamically - pkg/cluster already exists, if the cluster set is static, this design adds very little.

alvaroaleman · 2025-02-08T21:13:34Z

designs/multi-cluster.md

+  Consequently, there is no multi-cluster controller ecosystem, but could and
+  should be.
+- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
+  because adding support on-top leads to an inefficient controller design, and


leads to an inefficient controller design

Could you elaborate on this point?

We explicitly don't want one controller (with its own workqueue) per cluster.

Example: forget about workspaces. Imagine controller-runtime only supported controllers per one (!) namespace, i.e. another controller with another namespace for every namespace you want to serve. Same argument here, just a level higher.

And independently you could imagine cases where the same is true e.g. for cluster-api cases where the workqueue should be shared. That's what this enhancement is enabling.

This is a decision that shouldn't be forced onto customers. I can see the case where a workqueue per cluster is desired as it provides some built in fairness

We explicitly don't want one controller (with its own workqueue) per cluster.

Others might disagree. This question needs a proper evaluation with pro/cons of both approaches rather than jumping to conclusions.

I agree that both topologies can have their place. Am not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.

Copying my comment from here since this is the bigger thread:

Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers with a config switch [in the manager] or similar?

I agree that both topologies can have their place. Am not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.

A stated goal of this doc is to avoid divergence in the ecosystem, writing that down while at the same time handwaving away comments about this not actually being a good approach and saying we shouldn't be opinionated is not particularly convincing.

Our goal is to make the majority use case simple and other use-cases possible. This is not possible if we refuse to even look into the question what the majority use-case is and default to assuming that the use-case of the author of a design must be the majority use-case.

Fair enough. I didn't mean we shouldn't think about which workqueue topology is useful when. I meant that there are good reasons for a joint workqueue in some situations (like when I want to throttle all reconciles in a process because that throughput is limited), and independent ones in other situations (like when e.g. writes to a cluster are the limiting factor).

I played with a TypedFair queue:

// Fair is a queue that ensures items are dequeued fairly across different
// fairness keys while maintaining FIFO order within each key.
type Fair TypedFair[any]

// FairnessKeyFunc is a function that returns a string key for a given item.
// Items with different keys are dequeued fairly.
type FairnessKeyFunc[T comparable] func(T) string

// NewFair creates a new Fair instance.
func NewFair(keyFunc FairnessKeyFunc[any]) *Fair {
	return (*Fair)(NewTypedFair[any](keyFunc))
}

that could be plugged in here, wrapped by throttling and delays.
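
For readers unfamiliar with the idea, a fair queue of this kind can be approximated with per-key FIFO lists served round-robin. The sketch below is an independent illustration, not the `TypedFair` implementation mentioned above: `FairQueue`, `Request` and `drain` are made-up names, and it has no locking, throttling, or delays.

```go
package main

import "fmt"

// FairQueue serves fairness keys round-robin and keeps FIFO order within
// each key. It is not safe for concurrent use.
type FairQueue[T any] struct {
	keyFunc func(T) string
	items   map[string][]T
	order   []string // keys with pending items, in round-robin order
}

func NewFairQueue[T any](keyFunc func(T) string) *FairQueue[T] {
	return &FairQueue[T]{keyFunc: keyFunc, items: map[string][]T{}}
}

// Add appends the item to its key's FIFO list and schedules the key if it
// had no pending items.
func (q *FairQueue[T]) Add(item T) {
	k := q.keyFunc(item)
	if len(q.items[k]) == 0 {
		q.order = append(q.order, k)
	}
	q.items[k] = append(q.items[k], item)
}

// Get pops the oldest item of the next key in line and moves that key to the
// back of the line if it still has pending items.
func (q *FairQueue[T]) Get() (T, bool) {
	var zero T
	if len(q.order) == 0 {
		return zero, false
	}
	k := q.order[0]
	q.order = q.order[1:]
	item := q.items[k][0]
	q.items[k] = q.items[k][1:]
	if len(q.items[k]) > 0 {
		q.order = append(q.order, k)
	} else {
		delete(q.items, k)
	}
	return item, true
}

// Request is an illustrative item type keyed by cluster name.
type Request struct{ Cluster, Name string }

// drain empties the queue and returns the items in dequeue order.
func drain(q *FairQueue[Request]) []string {
	var out []string
	for {
		r, ok := q.Get()
		if !ok {
			return out
		}
		out = append(out, r.Cluster+"/"+r.Name)
	}
}

func main() {
	q := NewFairQueue(func(r Request) string { return r.Cluster })
	for _, r := range []Request{{"east", "a"}, {"east", "b"}, {"east", "c"}, {"west", "x"}} {
		q.Add(r)
	}
	fmt.Println(drain(q))
}
```

Here the single `west` item is dequeued second, ahead of the remaining `east` backlog, which is the fairness property a shared workqueue would otherwise lack when one cluster is noisy.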

alvaroaleman · 2025-02-08T21:14:30Z

designs/multi-cluster.md

+  object to these kind of changes.
+
+Here we call a controller to be multi-cluster-compatible if the reconcilers get
+reconcile requests in cluster `X` and do all reconciliation in cluster `X`. This 


So "Start/Stop a controller for each cluster" is out of scope, this is purely about "Add/Remove sources to/from controller on cluster arrival/departure"?

Related to "one workqueue" I guess. Start/Stop means another workqueue, which we don't want.

alvaroaleman · 2025-02-08T21:20:34Z

designs/multi-cluster.md

+```
+
+`ctrl.Manager` will implement `cluster.Aware`. As specified in the `Provider` interface,
+it is the cluster provider's responsibility to call `Engage` and `Disengage` on a `ctrl.Manager`


Doesn't that mean the current interface specification for cluster.Provider is insufficient, as this entails that the cluster.Provider needs a reference to the manager?

That's out of scope of the goal of the interface. Wiring the manager in will happen when starting the provider:

prov := NewProvider()
mgr := NewManager(..., Options: {provider: prov})
go mgr.Start(ctx)
go prov.Start(ctx, mgr)

alvaroaleman · 2025-02-08T21:23:08Z

designs/multi-cluster.md

+	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
+	// Cluster is removed or an error occurs.
+	//
+	// Implementers should return an error if they cannot start operations for the given Cluster,


If it's non-blocking, how is the controller supposed to surface errors here?

The idea is that anything that needs to be done for starting operations on a cluster is "blocking" (and thus would return an error), but the operations on the engaged cluster themselves are not blocking.

Can you give an example?

I have some trouble understanding the difference between "anything that needs to be done for starting operations on a cluster" vs "operations on the engaged cluster" that both seem to be done or asynchronously triggered (maybe?) in the Engage func

One of the things that happen in the prototype implementation (#3019) for the typedMultiClusterController is starting the new watches on the newly engaged cluster. So if that fails, Engage returns an error, but it's not blocking for processing items.
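
The distinction can be illustrated with a small sketch. It assumes a made-up `Controller` whose `Engage` performs setup synchronously (here reduced to a reachability flag) and then hands event processing to a goroutine bound to the passed context. None of these names come from #3019.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Cluster is a simplified stand-in; Reachable simulates whether watches can
// be started.
type Cluster struct {
	Name      string
	Reachable bool
}

type Controller struct {
	wg        sync.WaitGroup
	mu        sync.Mutex
	processed []string
}

// Engage runs setup synchronously and reports setup failures as an error.
// Event processing then continues in the background until ctx is cancelled
// or the event channel is closed.
func (c *Controller) Engage(ctx context.Context, cl Cluster, events <-chan string) error {
	// "Blocking" part: anything needed to start operations on the cluster.
	if !cl.Reachable {
		return fmt.Errorf("cluster %q: cannot start watches", cl.Name)
	}
	// Non-blocking part: operations on the engaged cluster.
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			case ev, ok := <-events:
				if !ok {
					return
				}
				c.mu.Lock()
				c.processed = append(c.processed, cl.Name+"/"+ev)
				c.mu.Unlock()
			}
		}
	}()
	return nil
}

// demo engages one unreachable and one reachable cluster.
func demo() (processed []string, engageErr error) {
	c := &Controller{}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Setup fails, so the error surfaces directly from Engage.
	engageErr = c.Engage(ctx, Cluster{Name: "broken"}, nil)

	events := make(chan string)
	if err := c.Engage(ctx, Cluster{Name: "east", Reachable: true}, events); err != nil {
		panic(err)
	}
	events <- "pod-a"
	events <- "pod-b"
	close(events)
	c.wg.Wait()
	return c.processed, engageErr
}

func main() {
	processed, err := demo()
	fmt.Println(processed, err)
}
```

Because the background work is bound to the context, cancelling it at the end of the cluster's lifecycle stops processing without a separate `Disengage` call, at least in this simplified model.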

alvaroaleman · 2025-02-08T21:26:29Z

designs/multi-cluster.md

+
+The multi-cluster controller implementation reacts to engaged clusters by starting
+a new `TypedSyncingSource` that also wraps the context passed down from the call to `Engage`,
+which _MUST_ be canceled by the cluster provider at the end of a cluster's lifecycle.


So you are saying the ctx passed by cluster.Provider when calling Engage on the manager needs to be stored by the manager and re-used when calling Engage on any multi-cluster runnable which in turn needs to use it to control the lifecycle of the source? What is the point of having Disengage then?

I think you are right, and we can do without Disengage. I will try to change the prototype implementation to eliminate it.

This seems to imply that the context used to call Start() on a Source will be used to stop the Source?

I think stopping the Source is not possible by just canceling this context for our current source implementations (or at least some of them)

IIRC the only similar functionality that we have today is that cancelling the context passed into Cache.Start() will stop all informers.

I also wonder how this works when multiple controllers are sharing the same underlying informer. I think if a controller doesn't exclusively own an informer it also shouldn't just shut it down. Or is my assumption wrong that they usually would share informers? (like it works today when multiple controllers are sharing the same cache & underlying informers).

For additional context. If I apply what we currently do in Cluster API to this proposal, it would be the cluster provider that shuts down the cache and all underlying informers.

I think stopping the Source is not possible by just canceling this context for our current source implementations (or at least some of them)

Am curious which you have in mind.

At least source.Kind. The ctx passed into Start can be used to cancel the start process (e.g. WaitForCacheSync) but not the informer

alvaroaleman · 2025-02-08T21:33:13Z

designs/multi-cluster.md

+type TypedDeepCopyableEventHandler[object any, request comparable] interface {
+	TypedEventHandler[object, request]
+	DeepCopyFor(c cluster.Cluster) TypedDeepCopyableEventHandler[object, request]


Why teach handlers to copy themselves rather than just layering this:

type HandlerConstructor[object any, request comparable] func(cluster.Cluster) TypedHandler[object, request]

?
(name likely needs improvement but you get the idea)

The reason for this was the attempt to keep existing function signatures stable while enabling them to be multi-cluster aware. A HandlerConstructor would probably need a new, separate builder function to be passed to as an argument, e.g. something like Watches vs. ClusterAwareWatches (or whatever).

I'm totally open to changing this if you prefer it.
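
A small sketch of what the constructor-based layering could look like, with simplified types. `Handler` is reduced to a function from object to request, and `Pod`, `Request` and `EnqueueForCluster` are illustrative names, not controller-runtime API.

```go
package main

import "fmt"

// Cluster is a simplified stand-in for cluster.Cluster.
type Cluster struct{ Name string }

// Handler is reduced to a mapping from an object to a request.
type Handler[O any, R comparable] func(obj O) R

// HandlerConstructor builds a handler bound to one cluster, so the handler
// itself needs no DeepCopyFor method.
type HandlerConstructor[O any, R comparable] func(cl Cluster) Handler[O, R]

type Pod struct{ Namespace, Name string }

type Request struct{ ClusterName, Namespace, Name string }

// EnqueueForCluster closes over the cluster and stamps its name on every
// request it produces.
func EnqueueForCluster(cl Cluster) Handler[Pod, Request] {
	return func(p Pod) Request {
		return Request{ClusterName: cl.Name, Namespace: p.Namespace, Name: p.Name}
	}
}

// Compile-time check that EnqueueForCluster fits the constructor type.
var _ HandlerConstructor[Pod, Request] = EnqueueForCluster

func main() {
	var construct HandlerConstructor[Pod, Request] = EnqueueForCluster
	east := construct(Cluster{Name: "east"})
	west := construct(Cluster{Name: "west"})
	pod := Pod{Namespace: "default", Name: "web"}
	fmt.Printf("%+v\n%+v\n", east(pod), west(pod))
}
```

The trade-off named in the reply still applies: a builder would need an entry point that accepts constructors rather than ready-made handlers.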

alvaroaleman · 2025-02-08T21:48:33Z

designs/multi-cluster.md

+The builder will chose the correct `EventHandler` implementation for both `For` and `Owns`
+depending on the `request` type used.
+
+With the described changes (use `GetCluster(ctx, req.ClusterName)`, making `reconciler`
+a `TypedFunc[reconcile.ClusterAwareRequest]`) an existing controller will automatically act as
+*uniform multi-cluster controller* if a cluster provider is configured.
+It will reconcile resources from cluster `X` in cluster `X`.


It is worth pointing out that this can also be achieved by instantiating a controller per target cluster rather than adding/removing sources to/from an existing controller.

IMHO if you ever actually want to operate the resulting component, you likely want "create/remove controller" rather than "create/remove source", because otherwise a single problematic cluster can completely mess up the workqueue metrics and on-calls can't tell if one cluster has an issue or all, which is going to be a big difference in terms of severity.

Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers with a config switch or similar?

alvaroaleman · 2025-02-08T21:52:34Z

designs/multi-cluster.md

+	Owns(&v1.ReplicaSet{}).
+    Complete(reconciler)
+```
+


This doc is completely missing info on:

The actual implementation of a multi-cluster controller (i.e. Engage/Disengage in the Controller) - We are not expecting users to do that, right?

The same for source.Source, but arguably a subtask of the above

See implementation proposal #3019.

alvaroaleman · 2025-02-08T21:56:45Z

designs/multi-cluster.md

+        EngageWithDefaultCluster: ptr.To(true),
+        EngageWithProviderClusters: ptr.To(false),


This double binary distinction is pretty much guaranteed to be too little. It would be better if we somehow tell the builder if this is a multi-cluster controller or not and then the cluster.Provider calls Engage for all clusters that should be engaged and it's up to the implementor of the provider if they want to include the cluster the manager has a kubeconfig for or not.

If this is insufficient, we need a way for the cluster.Provider to decide if a given Aware runnable should be engaged or not.

it's up to the implementor of the provider

Hm not sure if it should be up to the provider. Let's say I implement a cluster provider for Cluster API. I think it shouldn't be my call if all or none of the controllers that are used with this provider also watch the hub cluster or not.

I could make this a configuration option of the cluster provider, but this doesn't work because we only have one cluster provider per manager, and I think it's valid that only some controllers of a manager watch the hub cluster and others do not.

So I think it should be a per-controller decision.

Wondering if it would make sense to just always call Engage and then the Engage function always can just do nothing if it doesn't want to engage a Cluster. This seems the most flexible option.

(If necessary Engage could have a bool return parameter signalling if the controller actually engaged a cluster or not)

alvaroaleman · 2025-02-08T21:58:15Z

designs/multi-cluster.md

+## Alternatives
+
+- Multi-cluster support could be built outside of core controller-runtime. This would
+  lead likely to a design with one manager per cluster. This has a number of problems:


No - This is precisely why pkg/cluster exists

I'm curious about details why it is not possible to implement this outside of CR. (I got the points around adoption across the ecosystem, just wondering about the technical reasons)

We have implemented a component (called ClusterCache) in Cluster API that seems to come very close to what this design is trying to achieve (apart from that it is Cluster API specific of course). Especially since the generic support was added to CR.

Basically ClusterCache in CAPI:

discovers Clusters

maintains a Cache per Cluster

allows retrieving Clients for a Cluster

allows adding Watches (kind Sources) for a Cluster

this also allows mapping events that we get from these sources back to the one controller with the one shared work queue

xref: https://github.com/kubernetes-sigs/cluster-api/tree/main/controllers/clustercache

P.S. we are not creating multiple Cluster objects, instead we have our own simplified version that only contains what we need (https://github.com/kubernetes-sigs/cluster-api/blob/main/controllers/clustercache/cluster_accessor.go#L85-L89)

P.S.2. I don't understand the last two points in this list

The main "blocker" for this is that the builder is pretty common in 3rdparty controller code. If we do all of this outside of CR, this will very likely mean a fork of pkg/builder. I think everything else in the implementation PR could be done outside of CR.

vincepri · 2025-02-10T16:08:34Z

designs/multi-cluster.md

+  Consequently, there is no multi-cluster controller ecosystem, but could and
+  should be.
+- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
+  because adding support on-top leads to an inefficient controller design, and


This is a decision that shouldn't be forced onto customers. I can see the case where a workqueue per cluster is desired as it provides some built in fairness

vincepri · 2025-02-10T16:09:44Z

designs/multi-cluster.md

+- writing controllers for upper systems in a **portable** way is hard today.
+  Consequently, there is no multi-cluster controller ecosystem, but could and
+  should be.
+- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)


On a general note, for the purpose of this proposal we should focus on general controller-runtime users, while we can keep kcp as a reference along with other implementations. I'd rephrase the motivation at a high level: "setup controllers and watches across multiple Kubernetes clusters in a transparent way"

I think we tried to cover the high-level part with the first bullet point. The kcp controller-runtime fork is just mentioned to give personal motivation, but I don't think we have to mention it here if that is preferred.

vincepri · 2025-02-10T16:14:21Z

designs/multi-cluster.md

+- Run a controller-runtime controller against a kubeconfig with arbitrary many contexts, all being reconciled.
+- Run a controller-runtime controller against cluster managers like kind, Cluster API, Open-Cluster-Manager or Hypershift.
+- Run a controller-runtime controller against a kcp shard with a wildcard watch.


Would it be possible to focus the first iteration of this proposal to how Kubernetes works today? Adding uncommon use cases at this point in time increase overall complexity of the implementation. Other use cases should be pluggable imo

We agree things should be highly pluggable, that's why this is just an (incomplete) list of things that you could eventually plug in. Agreed that kcp is an uncommon use case, but so far we've made (recent) design decisions with Kubernetes clusters in general in mind. The "kcp" provider we'd like to build is itself far from ready yet.

That wouldn't be helpful for us though. I don't think the design now is influenced much by the kcp requirements, maybe with exception of the shared workqueue. Other than that the fleet-namespace example (which kind of reflects the kcp requirements) shows that the kcp use-case can be covered by a pretty generic design.

vincepri · 2025-02-10T16:14:59Z

designs/multi-cluster.md

+  out-of-tree subprojects that can individually evolve and vendor'ed by controller authors.
+- Make controller-runtime controllers "binary pluggable".
+- Manage one manager per cluster.
+- Manage one controller per cluster with dedicated workqueues.


Suggested change

- Manage one controller per cluster with dedicated workqueues.

This should be a goal

vincepri · 2025-02-10T16:19:08Z

designs/multi-cluster.md

+type Provider interface {
+	// Get returns a cluster for the given identifying cluster name. Get
+	// returns an existing cluster if it has been created before.
+	Get(ctx context.Context, clusterName string) (Cluster, error)


The second argument should probably be a typed reference, like we have for ObjectReference, even if it contains a single Name field, it would help with expanding it later, wdyt?

Are you thinking of logical.Name here or more a struct?
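
To make the typed-reference suggestion concrete, here is a minimal sketch in plain Go. `Reference`, `Cluster`, `mapProvider`, and `lookup` are hypothetical stand-ins and not controller-runtime API; the sketch only illustrates that a struct-typed argument can gain fields later without changing the signature of `Get`.

```go
package main

import (
	"context"
	"fmt"
)

// Reference is a hypothetical typed cluster reference. It holds only a name
// today; further fields could be added without touching Provider.Get.
type Reference struct{ Name string }

// Cluster is a stand-in for cluster.Cluster, reduced to a name.
type Cluster struct{ Name string }

// Provider mirrors the quoted interface, with a typed reference instead of
// a plain string as the second argument (an assumption of this sketch).
type Provider interface {
	Get(ctx context.Context, ref Reference) (*Cluster, error)
}

// mapProvider is a toy provider backed by an in-memory map.
type mapProvider struct{ clusters map[Reference]*Cluster }

func newMapProvider(names ...string) *mapProvider {
	p := &mapProvider{clusters: map[Reference]*Cluster{}}
	for _, n := range names {
		p.clusters[Reference{Name: n}] = &Cluster{Name: n}
	}
	return p
}

// Get returns the existing cluster for ref, or an error if it is unknown.
func (p *mapProvider) Get(_ context.Context, ref Reference) (*Cluster, error) {
	cl, ok := p.clusters[ref]
	if !ok {
		return nil, fmt.Errorf("cluster %q not found", ref.Name)
	}
	return cl, nil
}

// lookup is a small helper that resolves a name through any Provider.
func lookup(p Provider, name string) (string, error) {
	cl, err := p.Get(context.Background(), Reference{Name: name})
	if err != nil {
		return "", err
	}
	return cl.Name, nil
}

func main() {
	p := newMapProvider("east", "west")
	for _, name := range []string{"east", "missing"} {
		got, err := lookup(p, name)
		fmt.Printf("lookup(%q) = %q, err = %v\n", name, got, err)
	}
}
```

Because `Reference` is a comparable struct, it can also serve directly as a map key, as the toy provider does.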

vincepri · 2025-02-10T16:24:19Z

designs/multi-cluster.md

+The embedded `cluster.Cluster` corresponds to `GetCluster(ctx, "")`. We call the
+clusters with non-empty name "provider clusters" or "enganged clusters", while
+the embedded cluster of the manager is called the "default cluster" or "hub 
+cluster".


Let's be explicit here: one of the main issues was that an empty string is also the default (zero) value. We can set a cluster.Reference to a very specific value, which in turn is used across the entire codebase.
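
One possible reading of this suggestion, sketched in plain Go: the zero value is rejected, and the default cluster has to be named through an explicit sentinel. `Reference`, `DefaultCluster` (including its concrete value), and `Resolve` are assumptions made up for this sketch; nothing here is taken from the proposal or from controller-runtime.

```go
package main

import (
	"errors"
	"fmt"
)

// Reference is a hypothetical typed cluster reference.
type Reference struct{ Name string }

// DefaultCluster is an assumed explicit sentinel for the manager's own
// cluster. The concrete value is invented for this sketch.
var DefaultCluster = Reference{Name: "default-cluster.sketch.invalid"}

// ErrEmptyReference signals that a caller passed the zero value.
var ErrEmptyReference = errors.New("empty cluster reference, use DefaultCluster explicitly")

// Resolve describes which cluster a reference points at. It refuses the zero
// value so that a forgotten field is not silently treated as "default".
func Resolve(ref Reference) (string, error) {
	if ref == (Reference{}) {
		return "", ErrEmptyReference
	}
	if ref == DefaultCluster {
		return "default cluster", nil
	}
	return "provider cluster " + ref.Name, nil
}

func main() {
	for _, ref := range []Reference{{}, DefaultCluster, {Name: "east"}} {
		target, err := Resolve(ref)
		fmt.Printf("%+v -> %q, err = %v\n", ref, target, err)
	}
}
```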

vincepri · 2025-02-10T16:24:59Z

designs/multi-cluster.md

+// pkg/reconcile
+type ClusterAwareRequest struct {
+	Request
+	ClusterName string


Suggested change

-	ClusterName string
+	Cluster cluster.Reference

vincepri · 2025-02-10T16:26:56Z

designs/multi-cluster.md

+can be used as `request` type even for controllers that do not have an active cluster provider.
+The cluster name will simply be an empty string, which is compatible with calls to `mgr.GetCluster`.
+
+**Note:** controller-runtime must provide this cluster-aware request type to


Nit: we're saying "must" here, but "SHOULD" in the text right underneath the Cluster-Aware Request title.

vincepri · 2025-02-10T16:29:37Z

designs/multi-cluster.md

+	Complete(reconciler)
+
+// new
+builder.TypedControllerManagedBy[reconcile.ClusterAwareRequest](mgr).


This is a bit too easy to mess up. What's stopping a reconcile.ClusterAwareRequest being used in the wrong place or vice-versa?

In general: reconcile.ClusterAwareRequest could totally be used in a non-multi-cluster setup and it wouldn't change a thing since it embeds a reconcile.Request. If the ClusterName field is empty, req.ClusterName would imply the "default cluster", which is the single-cluster use-case today (e.g. we changed the fleet example in #3019 to have a flag that lets you toggle multi-cluster vs single-cluster usage).

If you end up using reconcile.Request you would quickly notice that you don't have the cluster name to pass to mgr.GetCluster.
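
The following sketch illustrates why the embedded request also works in a single-cluster setup. `NamespacedName` and `Request` are simplified stand-ins for the real controller-runtime types, and `clusterFor` is a hypothetical helper playing the role of `mgr.GetCluster`; only the shape of `ClusterAwareRequest` is taken from the quoted design.

```go
package main

import "fmt"

// NamespacedName and Request are simplified stand-ins for
// types.NamespacedName and reconcile.Request.
type NamespacedName struct{ Namespace, Name string }

type Request struct{ NamespacedName }

// ClusterAwareRequest has the shape quoted from the design document.
type ClusterAwareRequest struct {
	Request
	ClusterName string
}

// clusterFor is a hypothetical stand-in for mgr.GetCluster: an empty name
// selects the default cluster, anything else a provider cluster.
func clusterFor(req ClusterAwareRequest) string {
	if req.ClusterName == "" {
		return "default"
	}
	return req.ClusterName
}

// describe shows that the fields of the embedded Request stay directly
// accessible, so existing reconcile logic keeps working.
func describe(req ClusterAwareRequest) string {
	return fmt.Sprintf("%s/%s in cluster %q", req.Namespace, req.Name, clusterFor(req))
}

func main() {
	single := ClusterAwareRequest{Request: Request{NamespacedName{"ns", "a"}}}
	multi := ClusterAwareRequest{Request: Request{NamespacedName{"ns", "a"}}, ClusterName: "east"}
	fmt.Println(describe(single))
	fmt.Println(describe(multi))
}
```

A reconciler written against plain `Request` has no `ClusterName` to pass on, which is the compile-time hint mentioned above.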

vincepri · 2025-02-10T16:30:55Z

designs/multi-cluster.md

+For a manager with `cluster.Provider`, the builder _SHOULD_ create a controller
+that sources events **ONLY** from the provider clusters that got engaged with
+the controller.


Does this mean that we won't get events for the "default" cluster?

It can be configured per controller. We default to writing a uniform multi-cluster controller, i.e. one that only reacts to provider clusters. It is not common, afaik, to have the same semantics for both a local cluster (usually the hub) and provider clusters.
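
A minimal sketch of how such per-controller defaulting could look. The option names `EngageWithDefaultCluster` and `EngageWithProviderClusters` appear in a snippet elsewhere in this thread; the `Options` struct, `ptrTo`, `resolve`, and the behaviour without a provider (engage only the default cluster) are assumptions of this sketch.

```go
package main

import "fmt"

// Options carries the two per-controller switches. A nil pointer means
// "not set, use the default".
type Options struct {
	EngageWithDefaultCluster   *bool
	EngageWithProviderClusters *bool
}

// ptrTo is a local helper that returns a pointer to v.
func ptrTo[T any](v T) *T { return &v }

// resolve applies the defaults described above: with a provider, only
// provider clusters are engaged; without one (assumed), only the default
// cluster is. Explicitly set options always win.
func resolve(o Options, hasProvider bool) (defaultCluster, providerClusters bool) {
	defaultCluster = !hasProvider
	providerClusters = hasProvider
	if o.EngageWithDefaultCluster != nil {
		defaultCluster = *o.EngageWithDefaultCluster
	}
	if o.EngageWithProviderClusters != nil {
		providerClusters = *o.EngageWithProviderClusters
	}
	return defaultCluster, providerClusters
}

func main() {
	// Uniform multi-cluster controller: nothing set, provider present.
	d, p := resolve(Options{}, true)
	fmt.Println("uniform:", d, p)

	// Controller that should only watch the hub, even with a provider.
	d, p = resolve(Options{
		EngageWithDefaultCluster:   ptrTo(true),
		EngageWithProviderClusters: ptrTo(false),
	}, true)
	fmt.Println("hub only:", d, p)
}
```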

sbueringer · 2025-02-11T10:12:35Z

designs/multi-cluster.md

+
+- Ship integration for different multi-cluster setups. This should become 
+  out-of-tree subprojects that can individually evolve and vendor'ed by controller authors.
+- Make controller-runtime controllers "binary pluggable".


What does "binary pluggable" mean in this context?

Something like https://pkg.go.dev/plugin to dynamically load providers.

sbueringer · 2025-02-12T12:24:00Z

designs/multi-cluster.md

+### Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups
+
+- Replace `mgr.GetClient()` and `mgr.GetCache` with `mgr.GetCluster(req.ClusterName).GetClient()` and `mgr.GetCluster(req.ClusterName).GetCache()`.
+- Make manager and controller plumbing vendor'able to allow plugging in multi-cluster provider and BYO request type.


Is the idea that something like cert-manager would decide on startup which cluster provider should be used and can then only work with one cluster provider at a time?

Phrased differently. Do we also want to support using multiple cluster providers at the same time?

One provider per manager at a time.

Low friction to make reconcilers uniform-multi-cluster capable (this basically means using the cluster-enabled request and calling mgr.GetCluster(name) instead of accessing cluster methods directly).

If a controller project wants to add support for a number of providers in their repository, this is fine, but not necessarily the goal.

Instead it should be easy to instantiate the controllers from an alternative main.go with a provider of your choice.
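
As a rough illustration of the "alternative main.go" idea: the controller project exports a setup function, and whoever writes `main` decides which single provider (if any) the manager gets. `Provider`, `Manager`, `staticProvider`, and `SetupControllers` are hypothetical, heavily reduced stand-ins and do not reflect the real controller-runtime types.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a stand-in for cluster.Provider, reduced to listing names.
type Provider interface{ ClusterNames() []string }

// staticProvider is a toy provider with a fixed set of clusters.
type staticProvider []string

func (p staticProvider) ClusterNames() []string { return p }

// Manager is a stand-in that holds at most one provider, matching
// "one provider per manager at a time".
type Manager struct{ provider Provider }

func NewManager(p Provider) *Manager { return &Manager{provider: p} }

// SetupControllers is what a controller project might export. It returns the
// cluster names its controllers would reconcile. The empty name stands for
// the default cluster, as in the design.
func SetupControllers(mgr *Manager) []string {
	if mgr.provider == nil {
		return []string{""}
	}
	names := append([]string(nil), mgr.provider.ClusterNames()...)
	sort.Strings(names)
	return names
}

func main() {
	// Upstream main.go: no provider, single-cluster behaviour.
	fmt.Printf("%q\n", SetupControllers(NewManager(nil)))

	// Alternative main.go: same controllers, provider chosen by the integrator.
	fmt.Printf("%q\n", SetupControllers(NewManager(staticProvider{"west", "east"})))
}
```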

sttts · 2025-02-16T17:55:49Z

I already discussed this with @sbueringer and others:

I went ahead earlier this week to implement the design outside of controller-runtime: https://github.com/multicluster-runtime/multicluster-runtime.

As written in the comment above, everything but the builder is easy to implement by wrapping controller-runtime concepts. Thanks to the extensive work to get Go generics into CR, those wrappers are easy and natural.

The builder consists of roughly 500 lines of code with mostly mechanical changes, which should be feasible to regularly rebase onto the latest state in controller-runtime.

Our plan for the moment is to use https://github.com/multicluster-runtime/multicluster-runtime to prove the design with non-trivial projects. Last but not least, it will help to iterate fast and learn on the way. It is not ruled out that parts of it should flow back into controller-runtime at some point. Definitely it can and should influence the design.

Add designs/multi-cluster.md

eb207cb

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

k8s-ci-robot added the cncf-cla: yes label Mar 31, 2024

k8s-ci-robot requested review from varshaprasad96 and vincepri March 31, 2024 13:38

k8s-ci-robot added the size/L label Mar 31, 2024

alvaroaleman reviewed Mar 31, 2024

embik reviewed Apr 4, 2024

sbueringer mentioned this pull request Apr 4, 2024

Multi Cluster Example / Pattern #2755

Closed

elmiko reviewed Apr 8, 2024

sbueringer reviewed Apr 11, 2024

k8s-ci-robot added the lifecycle/stale label Aug 26, 2024

k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Sep 25, 2024

k8s-ci-robot closed this Oct 25, 2024

k8s-ci-robot reopened this Oct 28, 2024

embik mentioned this pull request Nov 22, 2024

✨ WIP: Cluster provider and cluster-aware controllers #3019

Closed

k8s-ci-robot removed the lifecycle/rotten label Dec 3, 2024

embik added 3 commits December 4, 2024 15:08

Update multi-cluster proposal with new implementation details

120cef5

Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

Update with feedback

799a911

Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

Adjust for reconcile.ClusterAwareRequest

3f2fa4f

Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

Merge pull request #3 from embik/embik-sttts-cluster-support-enhancement

e4cac69

📖 Update multi-cluster proposal with new implementation details

sttts added 2 commits January 7, 2025 15:35

Add notes about uniform controllers

7469ce9

Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>

Merge pull request #4 from sttts/sttts-cluster-support-enhancement-uniform

34a44f9

Add notes about uniform controllers

yastij reviewed Jan 30, 2025

sttts requested review from sbueringer and alvaroaleman February 6, 2025 19:19

alvaroaleman reviewed Feb 8, 2025

vincepri requested changes Feb 11, 2025

k8s-ci-robot assigned vincepri Feb 11, 2025

sbueringer reviewed Feb 12, 2025

		EngageWithDefaultCluster: ptr.To(true),
		EngageWithProviderClusters: ptr.To(false),
