EVEREST-1654 Use controller-runtime client instead of Rest #1132

Draft · wants to merge 1 commit into main
Conversation

@maxkondr (Contributor) commented Feb 22, 2025

EVEREST-1654

This is a PoC.

The idea is the following:

  • in the Everest server, our own REST client (pkg/kubernetes/client/ and pkg/kubernetes/client/customresources) can be removed. Instead, the logic can live in pkg/kubernetes.Kubernetes.
  • a controller-runtime.Client backed by a cache is used in pkg/kubernetes.Kubernetes, in the same way it is used in everest-operator (a wiring sketch follows below).
  • the cache allows us to use a FieldIndexer to index fields in our CRs, so that we can later look up our CRs by those fields.
    -- this allows us to remove the logic in everest-operator that copies values from .spec fields to metadata.labels for our CRs.
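
For illustration, a minimal wiring sketch of the proposed setup (this is not the PR code; the everest-operator API import path and the constructor shape are assumptions):

package kubernetes

import (
	"context"
	"errors"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1" // assumed import path
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// Kubernetes wraps a cache-backed controller-runtime client.
type Kubernetes struct {
	k8sClient ctrlclient.Client
	k8sCache  cache.Cache
}

// New builds the scheme, the cache and the client the same way a controller-runtime manager would.
func New(ctx context.Context, cfg *rest.Config) (*Kubernetes, error) {
	scheme := runtime.NewScheme()
	if err := clientgoscheme.AddToScheme(scheme); err != nil {
		return nil, err
	}
	if err := everestv1alpha1.AddToScheme(scheme); err != nil {
		return nil, err
	}

	// The cache starts informers and keeps watched objects in memory.
	k8sCache, err := cache.New(cfg, cache.Options{Scheme: scheme})
	if err != nil {
		return nil, err
	}
	go func() {
		// Start blocks until the context is cancelled; the error is ignored in this sketch.
		_ = k8sCache.Start(ctx)
	}()
	if !k8sCache.WaitForCacheSync(ctx) {
		return nil, errors.New("cache did not sync")
	}

	// Reads go through the cache, writes go directly to the API server.
	k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{
		Scheme: scheme,
		Cache:  &ctrlclient.CacheOptions{Reader: k8sCache},
	})
	if err != nil {
		return nil, err
	}
	return &Kubernetes{k8sClient: k8sClient, k8sCache: k8sCache}, nil
}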

@maxkondr maxkondr self-assigned this Feb 22, 2025
@@ -82,34 +83,49 @@ func (k *Kubernetes) IsBackupStorageUsed(ctx context.Context, namespace, name st
 	}

 	// Check if it is in use by clusters?
-	clusters, err := k.client.ListDatabaseClusters(ctx, namespace, metav1.ListOptions{})
-	if err != nil {
+	listOpts := &ctrlclient.ListOptions{
maxkondr (author) commented:

this is an example of FieldSelector usage over .spec.backup.schedules.backupStorageName in the DatabaseCluster CR
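
For context, a hedged sketch of what such a lookup could look like inside IsBackupStorageUsed (the variable names and return shape are assumptions; imports as in the wiring sketch above):

// List only the DatabaseClusters whose schedules reference this backup storage,
// using the field index on ".spec.backup.schedules.backupStorageName".
clusters := &everestv1alpha1.DatabaseClusterList{}
err := k.k8sClient.List(ctx, clusters,
	ctrlclient.InNamespace(namespace),
	ctrlclient.MatchingFields{".spec.backup.schedules.backupStorageName": name},
)
if err != nil {
	return false, err
}
if len(clusters.Items) > 0 {
	return true, nil // still referenced by at least one cluster
}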

maxkondr (author) commented on two of the removed files:

these files are not needed anymore since all the logic is placed in pkg/kubernetes/database_cluster.go

maxkondr (author) commented:

Once all the logic is moved to pkg/kubernetes/, this file can be removed as well.

@@ -29,27 +30,39 @@ import (

 // ListDatabaseClusters returns list of managed database clusters.
 func (k *Kubernetes) ListDatabaseClusters(ctx context.Context, namespace string) (*everestv1alpha1.DatabaseClusterList, error) {
-	return k.client.ListDatabaseClusters(ctx, namespace, metav1.ListOptions{})
+	result := &everestv1alpha1.DatabaseClusterList{}
maxkondr (author) commented:

this is an example of the logic that replaces the pkg/kubernetes/client/ + pkg/kubernetes/client/customresources layers

 }

 // GetDatabaseCluster returns database clusters by provided name.
 func (k *Kubernetes) GetDatabaseCluster(ctx context.Context, namespace, name string) (*everestv1alpha1.DatabaseCluster, error) {
-	return k.client.GetDatabaseCluster(ctx, namespace, name)
+	result := &everestv1alpha1.DatabaseCluster{}
+	err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, result)
maxkondr (author) commented:

the k.k8sClient.Get and k.k8sClient.List operations use the cache, which is updated in the background whenever a resource changes in K8S

@@ -90,7 +93,10 @@ var ErrEmptyVersionTag = errors.New("got an empty version tag from Github")

 // Kubernetes is a client for Kubernetes.
 type Kubernetes struct {
-	client client.KubeClientConnector
+	// client client.KubeClientConnector
+	client *client.Client
maxkondr (author) commented:

this client will be removed once all the logic from the pkg/kubernetes/client/ and pkg/kubernetes/client/customresources layers is moved to the pkg/kubernetes/ layer

// initIndexers initializes the field indexers for the DatabaseCluster.
func (k *Kubernetes) initIndexers(ctx context.Context) error {
	if err := k.k8sCache.IndexField(
		ctx, &everestv1alpha1.DatabaseCluster{}, ".spec.backup.schedules.backupStorageName",
maxkondr (author) commented:

this is an example of indexing a .spec field of our DatabaseCluster CR. The index is later used to look up DBs by that field.
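
A hedged sketch of the full registration, including the IndexerFunc that the hunk above cuts off (the exact spec field names are assumptions based on the DatabaseCluster API):

// initIndexers initializes the field indexers for the DatabaseCluster.
func (k *Kubernetes) initIndexers(ctx context.Context) error {
	return k.k8sCache.IndexField(
		ctx, &everestv1alpha1.DatabaseCluster{}, ".spec.backup.schedules.backupStorageName",
		func(o ctrlclient.Object) []string {
			db, ok := o.(*everestv1alpha1.DatabaseCluster)
			if !ok {
				return nil
			}
			// Index every backup storage referenced by the cluster's schedules.
			names := make([]string, 0, len(db.Spec.Backup.Schedules))
			for _, schedule := range db.Spec.Backup.Schedules {
				names = append(names, schedule.BackupStorageName)
			}
			return names
		},
	)
}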

@recharte (Collaborator) commented:

Hi @maxkondr, in general I like this idea, but I also have a few questions/concerns.

First of all, besides the FieldIndexer benefit you already identified in this PR's description, I see the following pros:

  1. Gets rid of the kubernetes/client abstraction. This sounds good because, currently, we are not taking advantage of the KubeClientConnector interface. Unless I'm missing something, it just leads to boilerplate code.
  2. (this can also be a con see point 2. below) Using the controller-runtime's client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This will allow us to get rid of this mess 🙈

Here are the concerns that came to my mind that might merit some discussion:

  1. Do the controller-runtime client and cache have any limitations when running outside the k8s cluster? This will be important in the future when we add support for connecting multiple k8s clusters to a single everest instance.
    From my initial assessment, I couldn't find any relevant limitations. In the end, it seems like both clients communicate similarly with the k8s API server; they just provide different interfaces for users.
  2. (this can also be a con, see point 2. below) Using the controller-runtime client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This means that we'll lose access to the (arguably) useful abstractions the respective client sets introduce. Being used to the generic interface provided by the controller-runtime client, I personally don't feel like these client set interfaces are that useful, but some might disagree.
  3. Using the controller-runtime client forces us to maintain a runtime Scheme (see the sketch after this list). The DB operators can be installed/uninstalled during the lifetime of our API server, meaning that the k8s API's registered CRDs can change. This means that if we ever need to interface with such resources, we'll need to mutate the scheme accordingly. However, since our API server doesn't interface directly with the DB operators' CRs, we should probably be fine and can avoid this complexity.
  4. Introducing caching will definitely increase the memory footprint. The question is: by how much? Our API server queries a multitude of resources, not just our CRs, e.g. we fetch pods and secrets. These resources can be plentiful in a large cluster. I believe that this can potentially have a big impact on memory consumption.
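
Regarding point 3, a small hedged sketch of what maintaining the Scheme amounts to in practice (import paths assumed, as in the wiring sketch above):

// Register only the API groups the server actually touches.
scheme := runtime.NewScheme()
if err := clientgoscheme.AddToScheme(scheme); err != nil { // built-in kinds (Pods, Secrets, ...)
	return nil, err
}
if err := everestv1alpha1.AddToScheme(scheme); err != nil { // Everest CRs
	return nil, err
}
// Only if we ever read the DB operators' CRs directly would their types need to be
// registered too, e.g. a hypothetical pxcv1.AddToScheme(scheme).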

Here's an interesting read that, among other things, discusses the differences between the client-go and controller-runtime clients and cache mechanisms. Even though the author approaches it from the point of view of a controller/operator, we can still use this info to make a more informed decision about what to use in our API server.

@maxkondr (Contributor, Author) commented Feb 26, 2025

Here are the concerns that came to my mind that might merit some discussion:

  1. Do the controller-runtime client and cache have any limitations when running outside the k8s cluster? This will be important in the future when we add support for connecting multiple k8s clusters to a single everest instance.

The controller-runtime client and cache need a kubeconfig to be initialised outside of K8S. Once we decide to implement multi-K8S support, we will need kubeconfigs to access external clusters anyway, so there are no issues with this.

  2. (this can also be a pro, see point 2. above) Using the controller-runtime client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This means that we'll lose access to the (arguably) useful abstractions the respective client sets introduce. Being used to the generic interface provided by the controller-runtime client, I personally don't feel like these client set interfaces are that useful, but some might disagree.

Well, first of all, most of the currently used client sets are generated and we consume them as external dependencies, so we are statically tied to them; they can be replaced by the generic client.Object-based approach that controller-runtime uses.
The rest of the client sets (for example the dynamic clientset) are not used in our code (i.e. the only call is here, and that is "dead" code).
And even if we need something special from a particular clientset, we can still keep that one and use it in such situations while removing the rest of the clientsets.

The controller-runtime client does not rely on generated clientsets. Internally, it still relies on client-go primitives, but we don't need to set up code generation in order to use it for custom resources. Instead, a single controller-runtime client can be used for managing all Kubernetes resources (built-in and extended) homogeneously.
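
As an illustration (a sketch, not code from this PR; namespace, podName and dbName are made-up variables; additionally assumes corev1 "k8s.io/api/core/v1" and "k8s.io/apimachinery/pkg/types"), the same client instance can read a built-in resource and an Everest CR:

// One client, one interface, regardless of API group/version.
pod := &corev1.Pod{}
if err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: podName}, pod); err != nil {
	return err
}
db := &everestv1alpha1.DatabaseCluster{}
if err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: dbName}, db); err != nil {
	return err
}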

  4. Introducing caching will definitely increase the memory footprint. The question is: by how much? Our API server queries a multitude of resources, not just our CRs, e.g. we fetch pods and secrets. These resources can be plentiful in a large cluster. I believe that this can potentially have a big impact on memory consumption.

But at the same time, our current server implementation creates load on the K8S API.

Regarding caching resources:

  • we can configure the cache to avoid caching some kinds of resources: Secrets, ConfigMaps, Pods, etc.
  • and even for Secrets, ConfigMaps and Pods, we can still configure the cache to fetch only objects matching our labels.
  • we can configure the client to fetch only metadata for a particular resource (for cases when we just need to check the existence of a particular resource)
  • we can configure the cache to store objects for particular namespaces only (a configuration sketch follows below)
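
A hedged configuration sketch of those knobs (the namespace name, label and exempted kinds are made-up values; in addition to the cache/client packages above it assumes corev1 "k8s.io/api/core/v1", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1", "k8s.io/apimachinery/pkg/labels" and "k8s.io/apimachinery/pkg/types"):

// Cache tuning: selected namespaces and per-object label selectors.
cacheOpts := cache.Options{
	Scheme:            scheme,
	DefaultNamespaces: map[string]cache.Config{"everest": {}}, // cache selected namespaces only
	ByObject: map[ctrlclient.Object]cache.ByObject{
		// for Secrets, watch/cache only objects carrying our label
		&corev1.Secret{}: {Label: labels.SelectorFromSet(labels.Set{"app.kubernetes.io/managed-by": "everest"})},
	},
}
k8sCache, err := cache.New(cfg, cacheOpts)
if err != nil {
	return nil, err
}

// Client tuning: some kinds bypass the cache entirely and are read live from the API server.
k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{
	Scheme: scheme,
	Cache: &ctrlclient.CacheOptions{
		Reader:     k8sCache,
		DisableFor: []ctrlclient.Object{&corev1.Pod{}, &corev1.ConfigMap{}},
	},
})
if err != nil {
	return nil, err
}

// Metadata-only fetch, e.g. when we only need to check that an object exists.
objMeta := &metav1.PartialObjectMetadata{}
objMeta.SetGroupVersionKind(corev1.SchemeGroupVersion.WithKind("Secret"))
err = k8sClient.Get(ctx, types.NamespacedName{Namespace: "everest", Name: "some-secret"}, objMeta)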

One more point worth outlining: the cache may contain a slightly out-of-date state of an object (due to the lag between an object being updated in the K8S API and the cache catching up). So there is some chance that we present to the user on the UI/API not "the latest" state of an object. But we have remediation mechanisms for this:

  • first of all, do not query an object right after its creation. Instead, we should return to the API user the object returned by the controller-runtime client's Create() or Update() calls (see the sketch below).
  • our UI has a polling mechanism: even if the server returns an outdated object on the first fetch, on the next try there is a very high chance that the server returns the up-to-date state.
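
A sketch of the first remediation (the method shape is an assumption): controller-runtime's Create() populates the passed object with the API server's response, so we can hand that back to the API user instead of immediately re-reading from a possibly stale cache:

func (k *Kubernetes) CreateDatabaseCluster(ctx context.Context, db *everestv1alpha1.DatabaseCluster) (*everestv1alpha1.DatabaseCluster, error) {
	if err := k.k8sClient.Create(ctx, db); err != nil {
		return nil, err
	}
	// db now carries the server-assigned fields (UID, resourceVersion, defaults);
	// return it directly rather than doing a Get that may hit a stale cache.
	return db, nil
}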

I think we should start by caching only our CRs (DatabaseCluster, DatabaseClusterBackup, etc.), since there won't be a huge number of them in K8S (compared to Secrets, Pods, etc.).

So it provides us with a certain flexibility at the cost of additional memory consumption.

Here's an interesting read

Yep, good article, thanks

@recharte (Collaborator) commented:

Thanks @maxkondr for the follow-up on this discussion. I agree with you and I think it's probably worth going ahead with implementing your proposal, but before you jump to a full implementation, let's hear from @mayankshah1607.

@mayankshah1607 (Member) commented Feb 28, 2025

In general, I also like the idea of using the controller-runtime client, mainly because we can get rid of the KubeClientConnector, etc. mess and use the simpler interface it provides (Get, Update, Create, etc.).

But I have concerns about the caching:

  1. Cache updates are not always instant, and these delays become more evident in larger clusters. Before performing any action (such as checking if BackupStorage is in use before deleting it), we conduct checks to verify the state of relevant objects. If the cache is inconsistent, these checks may give incorrect results. Relying on eventual consistency is not a viable solution here, as the required action might already have been performed by the time the cache is updated. Our code was originally written under the assumption that it would receive consistent results, but, since this assumption no longer holds, we must exercise additional caution to ensure correctness.
  2. As pointed out above, memory footprint is also another issue. I'm not sure it's enough to just skip ConfigMaps and Secrets (we have certain OLM objects as well that are huge). We need to be very thoughtful about which resources we are requesting, otherwise we can overwhelm the cache.
  3. When using a cached client, we will start observing problems when reading an object immediately after writing it. This is related to problem (1) -- the cache is used only for reading, writes are performed on the Kube API directly and later reconciled using the informers. This means that reading immediately after writing an object will likely give us an outdated result. I don't know for sure if we have such a scenario in the code already, but even if not, we need to be careful in the future, otherwise this can be hard to troubleshoot.
  4. How will this caching work in HA mode? Will we have duplicate caches for each replica? In controller-runtime based operators, they follow master-follower architecture, will we follow the same? Won't this just add more complexity?

tl;dr - controller-runtime.Client is good, but I am not convinced we need caching. Yes, our current implementation makes direct API calls, but I would not try to optimise without knowing how much to optimise for, especially considering the added complexity. Maybe we should at least provide an option to disable the client cache.

@maxkondr (Contributor, Author) commented:

@mayankshah1607 thanks for your comments.

In such a case, I would suggest the following:

  • we switch from REST to controller-runtime.Client -> we get rid of the KubeClientConnector mess
  • we disable caching entirely for now -> we achieve the same results (i.e. we call the K8S API intensively and get a consistent state) but with less code and mess (see the sketch below)
  • later we come back to the caching topic and move forward step by step
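
For reference, a sketch of that interim setup (same assumed imports as above): a plain, uncached controller-runtime client whose reads always go to the API server, so the current consistency guarantees are kept:

// No cache wired in: Get/List are served directly by the Kubernetes API server.
k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{Scheme: scheme})
if err != nil {
	return nil, err
}
k := &Kubernetes{k8sClient: k8sClient}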

@mayankshah1607, @recharte - comments/objections?

@mayankshah1607 (Member) commented:

Yes, that sounds good. 👍🏻
