EVEREST-1654 Use controller-runtime client instead of Rest #1132

Draft · wants to merge 1 commit into main
Conversation

@maxkondr (Contributor) commented Feb 22, 2025

EVEREST-1654

This is a PoC.

The idea is the following:

  • in the Everest server, our own REST client (pkg/kubernetes/client/ and pkg/kubernetes/client/customresources) can be removed. Instead, the logic can live in pkg/kubernetes.Kubernetes.
  • a controller-runtime.Client backed by a cache is used in pkg/kubernetes.Kubernetes, in the same way it is used in everest-operator (a wiring sketch follows below).
  • the cache allows us to use a FieldIndexer to index fields in our CRs, so that we can later look up our CRs by those fields.
    -- this allows us to remove the logic in everest-operator that copies values from .spec fields to metadata.labels for our CRs.
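
For illustration, a minimal wiring sketch of the proposed setup (this is not the PR code; the everest-operator API import path and the constructor shape are assumptions):

package kubernetes

import (
	"context"
	"errors"

	everestv1alpha1 "github.com/percona/everest-operator/api/v1alpha1" // assumed import path
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// Kubernetes wraps a cache-backed controller-runtime client.
type Kubernetes struct {
	k8sClient ctrlclient.Client
	k8sCache  cache.Cache
}

// New builds the scheme, the cache and the client the same way a controller-runtime manager would.
func New(ctx context.Context, cfg *rest.Config) (*Kubernetes, error) {
	scheme := runtime.NewScheme()
	if err := clientgoscheme.AddToScheme(scheme); err != nil {
		return nil, err
	}
	if err := everestv1alpha1.AddToScheme(scheme); err != nil {
		return nil, err
	}

	// The cache starts informers and keeps watched objects in memory.
	k8sCache, err := cache.New(cfg, cache.Options{Scheme: scheme})
	if err != nil {
		return nil, err
	}
	go func() {
		// Start blocks until the context is cancelled; the error is ignored in this sketch.
		_ = k8sCache.Start(ctx)
	}()
	if !k8sCache.WaitForCacheSync(ctx) {
		return nil, errors.New("cache did not sync")
	}

	// Reads go through the cache, writes go directly to the API server.
	k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{
		Scheme: scheme,
		Cache:  &ctrlclient.CacheOptions{Reader: k8sCache},
	})
	if err != nil {
		return nil, err
	}
	return &Kubernetes{k8sClient: k8sClient, k8sCache: k8sCache}, nil
}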

@maxkondr maxkondr self-assigned this Feb 22, 2025
@@ -82,34 +83,49 @@ func (k *Kubernetes) IsBackupStorageUsed(ctx context.Context, namespace, name st
 	}

 	// Check if it is in use by clusters?
-	clusters, err := k.client.ListDatabaseClusters(ctx, namespace, metav1.ListOptions{})
-	if err != nil {
+	listOpts := &ctrlclient.ListOptions{
maxkondr (author) commented:

this is an example of FieldSelector usage over .spec.backup.schedules.backupStorageName in the DatabaseCluster CR
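
For context, a hedged sketch of what such a lookup could look like inside IsBackupStorageUsed (the variable names and return shape are assumptions; imports as in the wiring sketch above):

// List only the DatabaseClusters whose schedules reference this backup storage,
// using the field index on ".spec.backup.schedules.backupStorageName".
clusters := &everestv1alpha1.DatabaseClusterList{}
err := k.k8sClient.List(ctx, clusters,
	ctrlclient.InNamespace(namespace),
	ctrlclient.MatchingFields{".spec.backup.schedules.backupStorageName": name},
)
if err != nil {
	return false, err
}
if len(clusters.Items) > 0 {
	return true, nil // still referenced by at least one cluster
}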

maxkondr (author) commented on two of the removed files:

these files are not needed anymore since all the logic is placed in pkg/kubernetes/database_cluster.go

maxkondr (author) commented:

Once all the logic is moved to pkg/kubernetes/, this file can be removed as well.

@@ -29,27 +30,39 @@ import (

 // ListDatabaseClusters returns list of managed database clusters.
 func (k *Kubernetes) ListDatabaseClusters(ctx context.Context, namespace string) (*everestv1alpha1.DatabaseClusterList, error) {
-	return k.client.ListDatabaseClusters(ctx, namespace, metav1.ListOptions{})
+	result := &everestv1alpha1.DatabaseClusterList{}
maxkondr (author) commented:

this is an example of the logic that replaces the pkg/kubernetes/client/ + pkg/kubernetes/client/customresources layers

 }

 // GetDatabaseCluster returns database clusters by provided name.
 func (k *Kubernetes) GetDatabaseCluster(ctx context.Context, namespace, name string) (*everestv1alpha1.DatabaseCluster, error) {
-	return k.client.GetDatabaseCluster(ctx, namespace, name)
+	result := &everestv1alpha1.DatabaseCluster{}
+	err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, result)
maxkondr (author) commented:

the k.k8sClient.Get and k.k8sClient.List operations use the cache, which is updated in the background whenever a resource changes in K8S

@@ -90,7 +93,10 @@ var ErrEmptyVersionTag = errors.New("got an empty version tag from Github")

 // Kubernetes is a client for Kubernetes.
 type Kubernetes struct {
-	client client.KubeClientConnector
+	// client client.KubeClientConnector
+	client *client.Client
maxkondr (author) commented:

this client will be removed once all the logic from the pkg/kubernetes/client/ and pkg/kubernetes/client/customresources layers is moved to the pkg/kubernetes/ layer

// initIndexers initializes the field indexers for the DatabaseCluster.
func (k *Kubernetes) initIndexers(ctx context.Context) error {
	if err := k.k8sCache.IndexField(
		ctx, &everestv1alpha1.DatabaseCluster{}, ".spec.backup.schedules.backupStorageName",
maxkondr (author) commented:

this is an example of indexing a .spec field of our DatabaseCluster CR. The index is later used to look up DBs by that field.
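
A hedged sketch of the full registration, including the IndexerFunc that the hunk above cuts off (the exact spec field names are assumptions based on the DatabaseCluster API):

// initIndexers initializes the field indexers for the DatabaseCluster.
func (k *Kubernetes) initIndexers(ctx context.Context) error {
	return k.k8sCache.IndexField(
		ctx, &everestv1alpha1.DatabaseCluster{}, ".spec.backup.schedules.backupStorageName",
		func(o ctrlclient.Object) []string {
			db, ok := o.(*everestv1alpha1.DatabaseCluster)
			if !ok {
				return nil
			}
			// Index every backup storage referenced by the cluster's schedules.
			names := make([]string, 0, len(db.Spec.Backup.Schedules))
			for _, schedule := range db.Spec.Backup.Schedules {
				names = append(names, schedule.BackupStorageName)
			}
			return names
		},
	)
}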

@recharte (Collaborator) commented:

Hi @maxkondr, in general I like this idea, but I also have a few questions/concerns.

First of all, besides the FieldIndexer benefit you already identified in this PR's description, I see the following pros:

  1. Gets rid of the kubernetes/client abstraction. This sounds good because, currently, we are not taking advantage of the KubeClientConnector interface. Unless I'm missing something, it just leads to boilerplate code.
  2. (this can also be a con see point 2. below) Using the controller-runtime's client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This will allow us to get rid of this mess 🙈

Here are the concerns that came to my mind that might merit some discussion:

  1. Do the controller-runtime client and cache have any limitations when running outside the k8s cluster? This will be important in the future when we add support for connecting multiple k8s clusters to a single everest instance.
    From my initial assessment, I couldn't find any relevant limitations. In the end, it seems like both clients communicate similarly with the k8s API server; they just provide different interfaces for users.
  2. (this can also be a con, see point 2. below) Using the controller-runtime client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This means that we'll lose access to the (arguably) useful abstractions the respective client sets introduce. Being used to the generic interface provided by the controller-runtime client, I personally don't feel like these client set interfaces are that useful, but some might disagree.
  3. Using the controller-runtime client forces us to maintain a runtime Scheme (see the sketch after this list). The DB operators can be installed/uninstalled during the lifetime of our API server, meaning that the k8s API's registered CRDs can change. This means that if we ever need to interface with such resources, we'll need to mutate the scheme accordingly. However, since our API server doesn't interface directly with the DB operators' CRs, we should probably be fine and can avoid this complexity.
  4. Introducing caching will definitely increase the memory footprint. The question is: by how much? Our API server queries a multitude of resources, not just our CRs, e.g. we fetch pods and secrets. These resources can be plentiful in a large cluster. I believe that this can potentially have a big impact on memory consumption.
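
Regarding point 3, a small hedged sketch of what maintaining the Scheme amounts to in practice (import paths assumed, as in the wiring sketch above):

// Register only the API groups the server actually touches.
scheme := runtime.NewScheme()
if err := clientgoscheme.AddToScheme(scheme); err != nil { // built-in kinds (Pods, Secrets, ...)
	return nil, err
}
if err := everestv1alpha1.AddToScheme(scheme); err != nil { // Everest CRs
	return nil, err
}
// Only if we ever read the DB operators' CRs directly would their types need to be
// registered too, e.g. a hypothetical pxcv1.AddToScheme(scheme).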

Here's an interesting read that, among other things, discusses the differences between the client-go and controller-runtime clients and cache mechanisms. Even though the author approaches it from the point of view of a controller/operator, we can still use this info to make a more informed decision about what to use in our API server.

@maxkondr (Contributor, Author) commented Feb 26, 2025

Here are the concerns that came to my mind that might merit some discussion:

  1. Do the controller-runtime client and cache have any limitations when running outside the k8s cluster? This will be important in the future when we add support for connecting multiple k8s clusters to a single everest instance.

The controller-runtime client and cache need a kubeconfig to be initialised outside of K8S. Once we decide to implement multi-K8S support, we will need kubeconfigs to access external clusters anyway, so there are no issues with this.

  2. (this can also be a pro, see point 2. above) Using the controller-runtime client forces us to use a single interface for managing all k8s resources regardless of their API groups and versions. This means that we'll lose access to the (arguably) useful abstractions the respective client sets introduce. Being used to the generic interface provided by the controller-runtime client, I personally don't feel like these client set interfaces are that useful, but some might disagree.

Well, first of all, most of the currently used client sets are generated and we consume them as external dependencies, so we are statically tied to them; they can be replaced by the generic client.Object-based approach that controller-runtime uses.
The rest of the client sets (for example the dynamic clientset) are not used in our code (i.e. the only call is here, and that is "dead" code).
And even if we need something special from a particular clientset, we can still keep that one and use it in such situations while removing the rest of the clientsets.

The controller-runtime client does not rely on generated clientsets. Internally, it still relies on client-go primitives, but we don't need to set up code generation in order to use it for custom resources. Instead, a single controller-runtime client can be used for managing all Kubernetes resources (built-in and extended) homogeneously.
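
As an illustration (a sketch, not code from this PR; namespace, podName and dbName are made-up variables; additionally assumes corev1 "k8s.io/api/core/v1" and "k8s.io/apimachinery/pkg/types"), the same client instance can read a built-in resource and an Everest CR:

// One client, one interface, regardless of API group/version.
pod := &corev1.Pod{}
if err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: podName}, pod); err != nil {
	return err
}
db := &everestv1alpha1.DatabaseCluster{}
if err := k.k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: dbName}, db); err != nil {
	return err
}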

  4. Introducing caching will definitely increase the memory footprint. The question is: by how much? Our API server queries a multitude of resources, not just our CRs, e.g. we fetch pods and secrets. These resources can be plentiful in a large cluster. I believe that this can potentially have a big impact on memory consumption.

But at the same time, our current server implementation creates load on the K8S API.

Regarding caching resources:

  • we can configure the cache to avoid caching some kinds of resources: Secrets, ConfigMaps, Pods, etc.
  • and even for Secrets, ConfigMaps and Pods, we can still configure the cache to fetch only objects matching our labels.
  • we can configure the client to fetch only metadata for a particular resource (for cases when we just need to check the existence of a particular resource)
  • we can configure the cache to store objects for particular namespaces only (a configuration sketch follows below)
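
A hedged configuration sketch of those knobs (the namespace name, label and exempted kinds are made-up values; in addition to the cache/client packages above it assumes corev1 "k8s.io/api/core/v1", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1", "k8s.io/apimachinery/pkg/labels" and "k8s.io/apimachinery/pkg/types"):

// Cache tuning: selected namespaces and per-object label selectors.
cacheOpts := cache.Options{
	Scheme:            scheme,
	DefaultNamespaces: map[string]cache.Config{"everest": {}}, // cache selected namespaces only
	ByObject: map[ctrlclient.Object]cache.ByObject{
		// for Secrets, watch/cache only objects carrying our label
		&corev1.Secret{}: {Label: labels.SelectorFromSet(labels.Set{"app.kubernetes.io/managed-by": "everest"})},
	},
}
k8sCache, err := cache.New(cfg, cacheOpts)
if err != nil {
	return nil, err
}

// Client tuning: some kinds bypass the cache entirely and are read live from the API server.
k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{
	Scheme: scheme,
	Cache: &ctrlclient.CacheOptions{
		Reader:     k8sCache,
		DisableFor: []ctrlclient.Object{&corev1.Pod{}, &corev1.ConfigMap{}},
	},
})
if err != nil {
	return nil, err
}

// Metadata-only fetch, e.g. when we only need to check that an object exists.
objMeta := &metav1.PartialObjectMetadata{}
objMeta.SetGroupVersionKind(corev1.SchemeGroupVersion.WithKind("Secret"))
err = k8sClient.Get(ctx, types.NamespacedName{Namespace: "everest", Name: "some-secret"}, objMeta)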

One more point worth outlining: the cache may contain a slightly out-of-date state of an object (due to the lag between an object being updated in the K8S API and the cache catching up). So there is some chance that we present to the user on the UI/API not "the latest" state of an object. But we have remediation mechanisms for this:

  • first of all, do not query an object right after its creation. Instead, we should return to the API user the object returned by the controller-runtime client's Create() or Update() calls (see the sketch below).
  • our UI has a polling mechanism: even if the server returns an outdated object on the first fetch, on the next try there is a very high chance that the server returns the up-to-date state.
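
A sketch of the first remediation (the method shape is an assumption): controller-runtime's Create() populates the passed object with the API server's response, so we can hand that back to the API user instead of immediately re-reading from a possibly stale cache:

func (k *Kubernetes) CreateDatabaseCluster(ctx context.Context, db *everestv1alpha1.DatabaseCluster) (*everestv1alpha1.DatabaseCluster, error) {
	if err := k.k8sClient.Create(ctx, db); err != nil {
		return nil, err
	}
	// db now carries the server-assigned fields (UID, resourceVersion, defaults);
	// return it directly rather than doing a Get that may hit a stale cache.
	return db, nil
}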

I think we should start by caching only our CRs (DatabaseCluster, DatabaseClusterBackup, etc.), since there won't be a huge number of them in K8S (compared to Secrets, Pods, etc.).

So it provides us with a certain flexibility at the cost of additional memory consumption.

Here's an interesting read

Yep, good article, thanks

@recharte (Collaborator) commented:

Thanks @maxkondr for the follow-up on this discussion. I agree with you and I think it's probably worth going ahead with implementing your proposal, but before you jump to a full implementation, let's hear from @mayankshah1607.

@mayankshah1607 (Member) commented Feb 28, 2025

In general, I also like the idea of using the controller-runtime client, mainly because we can get rid of the KubeClientConnector, etc. mess and use the simpler interface it provides (Get, Update, Create, etc.).

But I have concerns about the caching:

  1. Cache updates are not always instant, and these delays become more evident in larger clusters. Before performing any action (such as checking if BackupStorage is in use before deleting it), we conduct checks to verify the state of relevant objects. If the cache is inconsistent, these checks may give incorrect results. Relying on eventual consistency is not a viable solution here, as the required action might already have been performed by the time the cache is updated. Our code was originally written under the assumption that it would receive consistent results, but, since this assumption no longer holds, we must exercise additional caution to ensure correctness.
  2. As pointed out above, memory footprint is also another issue. I'm not sure it's enough to just skip ConfigMaps and Secrets (we have certain OLM objects as well that are huge). We need to be very thoughtful about which resources we are requesting, otherwise we can overwhelm the cache.
  3. When using a cached client, we will start observing problems when reading an object immediately after writing it. This is related to problem (1) -- the cache is used only for reading, writes are performed on the Kube API directly and later reconciled using the informers. This means that reading immediately after writing an object will likely give us an outdated result. I don't know for sure if we have such a scenario in the code already, but even if not, we need to be careful in the future, otherwise this can be hard to troubleshoot.
  4. How will this caching work in HA mode? Will we have duplicate caches for each replica? In controller-runtime based operators, they follow master-follower architecture, will we follow the same? Won't this just add more complexity?

tl;dr - controller-runtime.Client is good, but I am not convinced we need caching. Yes, our current implementation makes direct API calls, but I would not try to optimise without knowing how much to optimise for, especially considering the added complexity. Maybe we should at least provide an option to disable the client cache.

@maxkondr (Contributor, Author) commented:

@mayankshah1607 thanks for your comments.

In such a case, I would suggest the following:

  • we switch from REST to controller-runtime.Client -> we get rid of the KubeClientConnector mess
  • we disable caching entirely for now -> we achieve the same results (i.e. we call the K8S API intensively and get a consistent state) but with less code and mess (see the sketch below)
  • later we come back to the caching topic and move forward step by step
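
For reference, a sketch of that interim setup (same assumed imports as above): a plain, uncached controller-runtime client whose reads always go to the API server, so the current consistency guarantees are kept:

// No cache wired in: Get/List are served directly by the Kubernetes API server.
k8sClient, err := ctrlclient.New(cfg, ctrlclient.Options{Scheme: scheme})
if err != nil {
	return nil, err
}
k := &Kubernetes{k8sClient: k8sClient}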

@mayankshah1607, @recharte - comments/objections?

@mayankshah1607 (Member) commented:

Yes, that sounds good. 👍🏻
