
Retrieve every k8s resource generated by a Carvel package installation #4062

Closed (12 commits)

Conversation


@antgamdia (Contributor) commented Jan 11, 2022

Description of the change

This PR removes the workaround we implemented for the previous release and adds a generic way to retrieve all the Kubernetes resources matching a given label. This way we can fetch every k8s resource created as a result of a Carvel PackageInstall.
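As a rough sketch of the core idea: every kapp-deployed resource carries the kapp.k14s.io/app label (visible in the throttling messages below), so a single label selector finds all of them. The helper name here is just for illustration:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// listOptionsForApp builds the selector used to find every resource created
// by a kapp deployment: they all carry the kapp.k14s.io/app label.
// appID is whatever id kapp assigned to the PackageInstall.
func listOptionsForApp(appID string) metav1.ListOptions {
	return metav1.ListOptions{
		LabelSelector: "kapp.k14s.io/app=" + appID,
	}
}
```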

Benefits

The resources retrieved for a Carvel package will now include every resource the installation created, not just a fixed list of Pods, Services and Deployments.

Possible drawbacks

Querying the k8s APIs for every available resource type is time-consuming. I guess part of it is due to my resource-limited kind cluster, but I'm still worried about it.

  • Example: an app with 97 resources (in a cluster with 88 different resource types) takes around 15 seconds to retrieve the resource refs.
    • The logs show plenty of "Waited for <some time> ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/<api resource>?labelSelector=kapp.k14s.io%2Fapp%3D<my-id>" console messages.

Note that the kapp CLI is already able to fetch the resources it generated; however, using only a Kubernetes client (not the kapp binary), there is no CR holding this information.
See how the kapp inspect command works under the hood here: https://github.com/vmware-tanzu/carvel-kapp/blob/v0.44.0/pkg/kapp/cmd/app/inspect.go#L76

Edit: they take a similar approach, but with a caching layer on top: https://github.com/vmware-tanzu/carvel-kapp/blob/v0.44.0/pkg/kapp/resources/resource_types.go#L111

Applicable issues

Additional information

Some resources from a Harbor installation:
(screenshot omitted)

@antgamdia changed the title from "3854 get resources" to "Retrieve every k8s resource generated by a Carvel package installation", Jan 11, 2022
// have a look at the CachedDiscoveryClient in k8s
// https://github.com/kubernetes/client-go/blob/release-1.22/discovery/cached/disk/cached_discovery.go
discoveryClient := typedClient.Discovery()
if discoveryClient == nil {
antgamdia (author):

Adding the discoveryClient as part of the clientGetter so that we can swap it and mock it later for testing
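For reference, a minimal sketch of constructing the disk-backed CachedDiscoveryClient mentioned in the code comment above (the cache directories and TTL are arbitrary values, not necessarily what we'd want):

```go
import (
	"time"

	"k8s.io/client-go/discovery/cached/disk"
	"k8s.io/client-go/rest"
)

// newCachedDiscovery wraps discovery in client-go's disk-backed cache so
// repeated discovery calls don't hit the apiserver every time.
func newCachedDiscovery(config *rest.Config) (*disk.CachedDiscoveryClient, error) {
	return disk.NewCachedDiscoveryClientForConfig(
		config,
		"/tmp/kubeapps-discovery-cache", // discovery cache dir (illustrative path)
		"/tmp/kubeapps-http-cache",      // HTTP cache dir (illustrative path)
		10*time.Minute,                  // TTL (illustrative value)
	)
}
```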

if err != nil {
return nil, err
}
discoveryClient := discovery.NewDiscoveryClient(client)
antgamdia (author):

Here we still create a new discovery client but, as Michael did in the past, we only use the RESTMapper in production. For testing, we mock the entire kindToResource function.
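A sketch of what a RESTMapper-based kindToResource could look like (the exact signature in the PR may differ):

```go
import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/restmapper"
)

// kindToResource resolves a GroupVersionKind to its GroupVersionResource
// using the discovery information exposed by the cluster.
func kindToResource(dc discovery.DiscoveryInterface, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
	groupResources, err := restmapper.GetAPIGroupResources(dc)
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	mapper := restmapper.NewDiscoveryRESTMapper(groupResources)
	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	return mapping.Resource, nil
}
```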

}
wg.Add(1)
// spawn a goroutine to fetch the matching k8s resources
go func(resource metav1.APIResource, ch chan resourceAndGvk, wg *sync.WaitGroup) {
antgamdia (author):

I've tried to make it as parallel as possible, but the bottleneck seems to be in the k8s apiserver itself (in my local environment, haven't tested in a real cluster yet).

Contributor:

Yes, unless I'm missing something, this code is calling List on every listable resource in all namespaces on the cluster? I'm skeptical that we should be doing this?

antgamdia (author):

Mmm, yes, that's exactly what the code is doing :S
However, I can't think of another way of "getting an unknown number of resources matching a label across an unknown number of namespaces". Perhaps by excluding some resource types we know for sure can't be created by a package (like Nodes)?

Contributor:

Yeah, OK. If this is what the kapp code does as well, I assume they've thought about it more than us (and would prefer to just use their code, thanks for the second version).


// create a channel to collect the matching k8s resources, thus avoiding blocking the waitgroup
// using a buffered channel with size 100; assumption: 60% native k8s resources + 40 CRDs
ch := make(chan resourceAndGvk, 100)
antgamdia (author):

I'm not very familiar with channels yet; I made this assumption, but please feel free to tell me more about the buffer sizes here if it's wrong.

Contributor:

Yeah, it's not clear to me why you're using a buffered channel, especially of a size 100. It implies that you think your consumers of the channel data will get very behind the producers.

@antgamdia marked this pull request as ready for review, January 11, 2022 21:39
@absoludity (Contributor) left a review:

I've gone through the channel code that you mentioned you were uncertain about and left a few thoughts, as well as taken a look through the rest of the code. My main question or concern is that this appears to be getting all resources across all namespaces of the cluster to filter them by whatever the kapp selector is. If that's the case, then yep, I think your concern about the timing is a valid one. Is that the only option we've got?

I had a brief look at the kapp code that you pointed to; I didn't think it was doing the same, but rather something much more specific (it appears to list identified resources only, though I could be reading it incorrectly). Then I read the docs and tried the following on my dev cluster:

kapp inspect -a 'label:'
...
878 resources

Wow. So it's possible to use kapp to do that, but is that what it's doing by default when you inspect an app? (Haven't checked but assume you have).

Either way, what I'd hope we can do here is use kapp itself to get the resources for an app, so we're not implementing our own version. Why can we not create a NewInspectionCommand and run it, similar to how we run Helm commands? That way, we'd not be re-implementing kapp functionality (listing all resources for an app), in the same way we try to run the Helm CLI commands where we can?

}
}

// GetClients ensures a client getter is available and uses it to return both a typed and dynamic k8s client.
func (s *Server) GetClients(ctx context.Context, cluster string) (kubernetes.Interface, dynamic.Interface, error) {
func (s *Server) GetClients(ctx context.Context, cluster string) (kubernetes.Interface, dynamic.Interface, discovery.DiscoveryInterface, error) {
Contributor:

I'm not sure that it makes sense to include the discovery client in GetClients here (or the private getClients)? The reason that GetClients exists is because we need k8s clients that are dependent on the request context, as they use the credentials/token of the request. This isn't the case for the discovery client, I would think?

antgamdia (author):

Mmm, that's true... we could just use the typedClient and later call typedClient.Discovery(). However, I wasn't able to swap it with the fake.FakeDiscovery in the tests (that's why c7004f8).
If we go this way (i.e. not using the Carvel code), I'll give it another thought anyway.
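For the record, a sketch of how the fake discovery client can be obtained from the fake clientset in tests (the seeded resource list is just an example):

```go
import (
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	fakediscovery "k8s.io/client-go/discovery/fake"
	"k8s.io/client-go/kubernetes/fake"
)

func TestWithFakeDiscovery(t *testing.T) {
	clientset := fake.NewSimpleClientset()
	fakeDisc, ok := clientset.Discovery().(*fakediscovery.FakeDiscovery)
	if !ok {
		t.Fatalf("couldn't convert Discovery() to *fakediscovery.FakeDiscovery")
	}
	// Seed the fake with whatever API resources the test needs.
	fakeDisc.Resources = []*metav1.APIResourceList{
		{
			GroupVersion: "apps/v1",
			APIResources: []metav1.APIResource{
				{Name: "deployments", Kind: "Deployment", Namespaced: true, Verbs: []string{"list", "get"}},
			},
		},
	}
	// clientset.Discovery() can now be handed to the code under test.
}
```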

Comment on lines +412 to +416
// Apparently, App CRs that were not created by a "kapp deploy" command don't have the proper annotations.
// So, in order to retrieve the annotation value,
// we have to get the ConfigMap <AppName>-ctrl and then fetch the value of the key "labelValue" in data.spec.
// See https://kubernetes.slack.com/archives/CH8KCCKA5/p1637842398026700
// https://github.com/vmware-tanzu/carvel-kapp-controller/issues/430
Contributor:

So this sounds pretty temporary? Will the longer term solution be an upstream fix to the App CR?

antgamdia (author):

I don't really know. The issue is not very active and I haven't heard about it on the community Slack recently.
As of today (v0.44 was just released), it still needs the -ctrl suffix.
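A hedged sketch of that fallback; the "-ctrl" ConfigMap name suffix and the "spec"/"labelValue" keys are taken from the code comment above and should be treated as assumptions:

```go
import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// appLabelValue reads the kapp-controller ConfigMap "<appName>-ctrl" and
// extracts the label value from the JSON document stored under data["spec"].
// Key/field names follow the comment above and are assumptions.
func appLabelValue(ctx context.Context, client kubernetes.Interface, namespace, appName string) (string, error) {
	cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, appName+"-ctrl", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	var spec struct {
		LabelValue string `json:"labelValue"`
	}
	if err := json.Unmarshal([]byte(cm.Data["spec"]), &spec); err != nil {
		return "", err
	}
	if spec.LabelValue == "" {
		return "", fmt.Errorf("no labelValue found in ConfigMap %s-ctrl", appName)
	}
	return spec.LabelValue, nil
}
```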



continue
}
gv, err := schema.ParseGroupVersion(apiResourceList.GroupVersion)
if err != nil {
Contributor:

Not sure why we'd just ignore an error like this? (There could be a reason, but it should be documented here).

antgamdia (author) commented Jan 12, 2022:

Same as in another comment: I just used the k8s code directly (https://github.com/kubernetes/kubectl/blob/release-1.22/pkg/cmd/apiresources/apiresources.go#L172) :S
However, you're right, we should do something here (at least log a warning message).
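Something along these lines, assuming a klog-style logger imported as log (that import is an assumption about the surrounding file):

```go
gv, err := schema.ParseGroupVersion(apiResourceList.GroupVersion)
if err != nil {
	// Don't fail the whole request because one group version can't be parsed,
	// but leave a trace instead of silently skipping it.
	log.Warningf("ignoring unparsable group version %q: %v", apiResourceList.GroupVersion, err)
	continue
}
```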

continue
}
// filter to resources that support the specified "List" verb
if !sets.NewString(resource.Verbs...).HasAll([]string{"list"}...) {
Contributor:

Hmm, I would have thought !sets.NewString(resource.Verbs...).Has("list") would be simpler and clearer?

antgamdia (author):

Yep, I just used the same code as in the kubectl client, which is aimed at supporting any set of verbs. In our scenario, your suggestion is way simpler. Thanks for pointing it out.
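The simplified check would then be just:

```go
// Simplified form of the check: we only ever need the "list" verb here.
if !sets.NewString(resource.Verbs...).Has("list") {
	continue
}
```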

close(ch)
}()

// once closed, iterate over the closed channel and push the objects to the response
Contributor:

I don't think this comment is correct? That is, the channel will be closed when the wg.Wait() returns, but that's correctly happening inside a go routine. So the code below will start processing the data on the channel as it arrives (as it should) and will only complete (the range ch) when the channel is closed and all data has been drained.

Which still leaves me wondering why you need a large buffer on your channel? The work of the consumer here should be very quick compared to the producer (which is doing network fetches), so I'd not think you'd need a buffer at all.

antgamdia (author):

Ah, right! I mixed it up... initially I had an unbuffered channel, so I needed the goroutine encapsulating the wg.Wait() and close(ch). Then I moved to a buffered one but left this goroutine behind.

Yes, the consumer is just aggregating data, which should be really fast; there's no real need for a buffer here.
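To make the corrected pattern concrete, a minimal self-contained sketch: an unbuffered channel, one producer goroutine per resource type, close-after-Wait in a helper goroutine, and a consumer that only aggregates. The types and the producers' work are stand-ins, not the PR's real code:

```go
package main

import (
	"fmt"
	"sync"
)

// item stands in for the PR's resourceAndGvk type.
type item struct{ name string }

func main() {
	// Stand-in for the listable API resources discovered from the cluster.
	resources := []string{"deployments", "services", "configmaps"}

	ch := make(chan item) // unbuffered: the consumer only aggregates, so no buffer is needed
	var wg sync.WaitGroup

	for _, r := range resources {
		wg.Add(1)
		go func(r string) {
			defer wg.Done()
			// The real code would List the resources matching the kapp label here.
			ch <- item{name: r}
		}(r)
	}

	// Close the channel only once every producer is done, so the range below terminates.
	go func() {
		wg.Wait()
		close(ch)
	}()

	var collected []item
	for it := range ch {
		collected = append(collected, it)
	}
	fmt.Printf("collected %d items\n", len(collected))
}
```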


}

// ignore the namespace as the resources may appear in different ns from which the PackageInstall is created
resources, err := dynamicClient.Resource(gvr).List(ctx, listOptions)
Contributor:

Interesting, I'd thought the above syntax was for non-namespaced (i.e. cluster-level) resources rather than resources across all namespaces, but I see that the dynamic client apparently supports both (NamespaceableResourceInterface is implemented to do both).

antgamdia (author):

Yes indeed! I initially thought the same; I even had an if for both cases. After I hit the issue I mentioned (resources appearing in namespaces other than the one the PackageInstall is in), I noticed the dynamic client actually supports both.
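A sketch of both forms supported by the dynamic client's NamespaceableResourceInterface (the helper name is just for illustration):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// listByLabel lists resources of the given GroupVersionResource matching a
// label selector. An empty namespace means "all namespaces" (and also covers
// cluster-scoped resources); a concrete namespace scopes the query.
func listByLabel(ctx context.Context, dyn dynamic.Interface, gvr schema.GroupVersionResource, namespace, selector string) (*unstructured.UnstructuredList, error) {
	opts := metav1.ListOptions{LabelSelector: selector}
	if namespace == "" {
		return dyn.Resource(gvr).List(ctx, opts)
	}
	return dyn.Resource(gvr).Namespace(namespace).List(ctx, opts)
}
```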

antgamdia (author) commented Jan 12, 2022:

> Either way, what I'd hope we can do here is use kapp itself to get the resources for an app, so we're not implementing our own version. Why can we not create a NewInspectionCommand and run it, similar to how we run Helm commands? That way, we'd not be re-implementing kapp functionality (listing all resources for an app), in the same way we try to run the Helm CLI commands where we can?

Quick notes, I'm in a rush; will try to explain myself better later:

TLDR: I don't see any performance increase when using Kapp's code directly (I'd say it's slower)

View alternative version using code from Kapp at https://github.com/kubeapps/kubeapps/pull/4068/files

  • I have successfully used the Kapp code as in the Helm scenario.
  • There is no way to configure it with a custom rest.Config for generating the actions. You have to copy some parts (creating the clients: NewResourceTypesImpl, NewResourcesImpl, NewIdentifiedResources, NewApps)
  • Then I used the same code as in the cmd inspect action.

Perhaps it's a matter of our Kubernetes client's default config (QPS and other params); I'll have a look later.
Carvel's defaults: https://github.com/vmware-tanzu/carvel-kapp/blob/develop/pkg/kapp/cmd/core/kube_api_flags.go

Edit: Increasing QPS/Burst does the trick... we were throttling our calls to 5 req/s, meaning ~100 resource-type requests at 5 req/s ≈ 20 s, which is consistent with my results. Carvel's QPS is set to 1000; that's why my approach seemed slow 🤦‍♂️
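A sketch of what bumping the limits looks like on a rest.Config (the exact values we end up with are still to be decided; 1000 mirrors what kapp uses):

```go
import "k8s.io/client-go/rest"

// withHigherRateLimits raises the client-side throttling limits; the
// client-go defaults (QPS 5, Burst 10) are what caused the ~20s retrieval
// time described above.
func withHigherRateLimits(config *rest.Config) *rest.Config {
	config.QPS = 1000
	config.Burst = 1000
	return config
}
```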

(video attachment: carvelResourcesMine.mp4)

My plan is:

  • Create a PR allowing customizing the QPS/Burst for the whole kubeappsapis server
  • Evaluate both approaches with the same configuration
    • Mainly for maintainability reasons, we'll probably go the Kapp way

@absoludity (Contributor):

Excellent, nice work Antonio! I'm +1 whichever way you decide, though I personally lean the same way (use Kapp and get upstream fixed so it can be configured, so we don't maintain more code unnecessarily).

@absoludity (Contributor):


Looks like you haven't yet pushed your changes here (no rush, I just wanted to leave a +1 after seeing those, but will do so tonight / tomorrow)

antgamdia (author):

> Looks like you haven't yet pushed your changes here (no rush, I just wanted to leave a +1 after seeing those, but will do so tonight / tomorrow)

Not yet, I wanted to discuss both alternatives; that's why I pushed the changes in a separate PR (#4068).

The current situation is: my approach is fully working (with the QPS fix), but the alternative is not yet (it's still not getting resources from other namespaces).

My plan: work today on getting the Carvel approach fully working and tested, and then benchmark both alternatives.

@antgamdia marked this pull request as draft, January 13, 2022 17:43
antgamdia (author):

Closing this PR in favor of #4068.
Thanks for your comments and feedback on this version!

@antgamdia closed this on Jan 13, 2022
@antgamdia deleted the 3854-getResources branch on May 31, 2022 11:44

Successfully merging this pull request may close these issues.

Add a proper implementation for GetInstalledPackageResourceRefs