Retrieve every k8s resource generated by a Carvel package installation #4062
Conversation
// have a look at the CachedDiscoveryClient in k8s
// https://github.com/kubernetes/client-go/blob/release-1.22/discovery/cached/disk/cached_discovery.go
discoveryClient := typedClient.Discovery()
if discoveryClient == nil {
Adding the discoveryClient as part of the clientGetter so that we can swap it and mock it later for testing.
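As a rough sketch of that shape (names here are illustrative, not necessarily the actual kubeapps code), the getter hands back the discovery interface alongside the other clients so a test can substitute a fake:

```go
package clients

import (
	"context"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
)

// clientGetter builds the per-request k8s clients. Production wires in real
// clients built from the request's credentials; tests can assign a function
// returning fakes, including a fake discovery client.
type clientGetter func(ctx context.Context, cluster string) (kubernetes.Interface, dynamic.Interface, discovery.DiscoveryInterface, error)

// Server holds the getter so tests can swap it out wholesale.
type Server struct {
	clientGetter clientGetter
}
```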
if err != nil {
    return nil, err
}
discoveryClient := discovery.NewDiscoveryClient(client)
Here we still create a new discovery client but, as Michael did in the past, we use the RESTMapper only in production. For testing, we mock the entire kindToResource function.
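For reference, the production side of that mapping could look roughly like this (a sketch assuming a meta.RESTMapper is available; the function names are illustrative):

```go
package resources

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// kindToResourceFunc can be swapped wholesale in tests, avoiding the need to
// mock the RESTMapper itself.
type kindToResourceFunc func(gvk schema.GroupVersionKind) (schema.GroupVersionResource, error)

// restMapperKindToResource is the production implementation: it resolves a
// GroupVersionKind to its GroupVersionResource through the RESTMapper
// (e.g. Kind "Deployment" -> resource "deployments").
func restMapperKindToResource(mapper meta.RESTMapper, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	return mapping.Resource, nil
}
```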
}
wg.Add(1)
// spawn a goroutine to fetch the matching k8s resources
go func(resource metav1.APIResource, ch chan resourceAndGvk, wg *sync.WaitGroup) {
I've tried to make it as parallel as possible, but the bottleneck seems to be in the k8s apiserver itself (in my local environment, haven't tested in a real cluster yet).
Yes, unless I'm missing something, this code is calling List on every listable resource in all namespaces on the cluster? I'm skeptical that we should be doing this?
Mmm, yes, that's exactly what the code is doing :S
However, I can't think of another way of "getting an unknown number of resources matching a label in an unknown number of different namespaces". Perhaps by excluding some resources we know for certain can't be created (like Nodes?).
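For concreteness, the brute-force approach being discussed looks roughly like this (a sketch; names and helpers are illustrative, not the actual PR code):

```go
package resources

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/dynamic"
)

// listByLabel walks every listable API resource advertised by the server and
// Lists it across all namespaces, filtered by a label selector. That is one
// List call per resource type, which is exactly why it's expensive.
func listByLabel(ctx context.Context, disc discovery.DiscoveryInterface, dyn dynamic.Interface, selector string) ([]unstructured.Unstructured, error) {
	_, apiResourceLists, err := disc.ServerGroupsAndResources()
	if err != nil {
		return nil, err
	}
	var out []unstructured.Unstructured
	for _, list := range apiResourceLists {
		gv, err := schema.ParseGroupVersion(list.GroupVersion)
		if err != nil {
			continue // as discussed below, this should at least be logged
		}
		for _, res := range list.APIResources {
			// skip subresources (e.g. "pods/log") and non-listable kinds
			if strings.Contains(res.Name, "/") || !sets.NewString(res.Verbs...).Has("list") {
				continue
			}
			items, err := dyn.Resource(gv.WithResource(res.Name)).List(ctx, metav1.ListOptions{LabelSelector: selector})
			if err != nil {
				continue // e.g. forbidden for this particular resource
			}
			out = append(out, items.Items...)
		}
	}
	return out, nil
}
```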
Yeah, OK. If this is what the kapp code does as well, I assume they've thought about it more than us (and would prefer to just use their code, thanks for the second version).
// create channel to collect the matching k8s resources, thus avoiding blocking the waitgroup
// using a buffered channel with size 100; assumption: 60% native k8s resources + 40% CRDs
ch := make(chan resourceAndGvk, 100)
I'm not used to channels yet; I've made this assumption, but please feel free to tell me more about the buffer sizes here if it is wrong.
Yeah, it's not clear to me why you're using a buffered channel, especially of size 100. It implies that you think your consumers of the channel data will get very behind the producers.
I've gone through the channel code that you mentioned you were uncertain about and left a few thoughts, as well as taken a look through the rest of the code. My main question or concern is that this appears to be getting all resources across all namespaces of the cluster to filter them by whatever the kapp selector is. If that's the case, then yep, I think your concern about the timing is a valid one. Is that the only option we've got?
I had a brief look at the kapp code that you pointed to, and it didn't appear to be doing so, but rather something much more specific (it appears to be listing identified resources only, though I could be reading it incorrectly). Then I read the docs and tried the following on my dev cluster:
kapp inspect -a 'label:'
...
878 resources
Wow. So it's possible to use kapp to do that, but is that what it's doing by default when you inspect an app? (Haven't checked but assume you have).
Either way, what I'd hope we can do here is use kapp itself to get the resources for an app, so we're not implementing our own version. Why can we not create a NewInspectionCommand and run it, similar to how we run Helm commands? That way, we'd not be re-implementing kapp functionality (listing all resources for an app), in the same way we try to run the Helm CLI commands where we can?
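As a sketch of that suggestion, the simplest form would shell out to the CLI (the flags below are assumptions based on kapp inspect's documented usage; the real integration might call kapp's Go packages instead of the binary):

```go
package resources

import (
	"context"
	"os/exec"
)

// inspectApp shells out to the kapp CLI rather than re-implementing its
// resource discovery, mirroring how Helm CLI commands are run elsewhere.
func inspectApp(ctx context.Context, namespace, app string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, "kapp", "inspect", "-n", namespace, "-a", app, "--json")
	return cmd.Output()
}
```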
}
}

// GetClients ensures a client getter is available and uses it to return both a typed and dynamic k8s client.
- func (s *Server) GetClients(ctx context.Context, cluster string) (kubernetes.Interface, dynamic.Interface, error) {
+ func (s *Server) GetClients(ctx context.Context, cluster string) (kubernetes.Interface, dynamic.Interface, discovery.DiscoveryInterface, error) {
I'm not sure that it makes sense to include the discovery client in GetClients here (or the private getClients)? The reason that GetClients exists is because we need k8s clients that are dependent on the request context, as they use the credentials/token of the request. This isn't the case for the discovery client, I would think?
Mmm, that's true... we can just use the typedClient and, later on, call typedClient.Discovery(). However, I wasn't able to swap it with the fake.FakeDiscovery in the tests (that's why c7004f8).
If we go this way (ie, not using the carvel code), I'll give it another thought anyway.
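For reference, the usual trick for swapping in a fake discovery client with the typed fake clientset looks roughly like this (a sketch; this may or may not be what c7004f8 attempted):

```go
package resources

import (
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	fakediscovery "k8s.io/client-go/discovery/fake"
	"k8s.io/client-go/kubernetes/fake"
)

// TestFakeDiscovery stubs discovery responses: the fake clientset's
// Discovery() can be type-asserted to *fakediscovery.FakeDiscovery, whose
// Resources field is writable.
func TestFakeDiscovery(t *testing.T) {
	client := fake.NewSimpleClientset()
	fakeDisc, ok := client.Discovery().(*fakediscovery.FakeDiscovery)
	if !ok {
		t.Fatal("couldn't convert Discovery() to *FakeDiscovery")
	}
	fakeDisc.Resources = []*metav1.APIResourceList{{
		GroupVersion: "apps/v1",
		APIResources: []metav1.APIResource{
			{Name: "deployments", Kind: "Deployment", Namespaced: true, Verbs: []string{"list", "get"}},
		},
	}}
	// code under test that calls client.Discovery() now sees only these resources
}
```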
// Apparently, App CRs not being created by a "kapp deploy" command don't have the proper annotations.
// So, in order to retrieve the annotation value,
// we have to get the ConfigMap <AppName>-ctrl and, then, fetch the value of the key "labelValue" in data.spec.
// See https://kubernetes.slack.com/archives/CH8KCCKA5/p1637842398026700
// https://github.com/vmware-tanzu/carvel-kapp-controller/issues/430
So this sounds pretty temporary? Will the longer-term solution be an upstream fix to the App CR?
I don't really know. The issue is not very active and I haven't heard about it on the community Slack recently.
As of today (v0.44 got released), it still needs the -ctrl suffix.
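A sketch of that workaround (the JSON shape under the ConfigMap's "spec" data key is an assumption drawn from the linked kapp-controller issue):

```go
package resources

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// appLabelValue reads the ConfigMap "<AppName>-ctrl" and extracts
// "labelValue" from the JSON stored under its "spec" data key.
func appLabelValue(ctx context.Context, client kubernetes.Interface, ns, appName string) (string, error) {
	cm, err := client.CoreV1().ConfigMaps(ns).Get(ctx, appName+"-ctrl", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	var spec struct {
		LabelValue string `json:"labelValue"`
	}
	if err := json.Unmarshal([]byte(cm.Data["spec"]), &spec); err != nil {
		return "", err
	}
	if spec.LabelValue == "" {
		return "", fmt.Errorf("no labelValue in ConfigMap %s-ctrl", appName)
	}
	return spec.LabelValue, nil
}
```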
continue
}
gv, err := schema.ParseGroupVersion(apiResourceList.GroupVersion)
if err != nil {
Not sure why we'd just ignore an error like this? (There could be a reason, but it should be documented here).
Same as in another comment: I'm just using the k8s code directly (https://github.com/kubernetes/kubectl/blob/release-1.22/pkg/cmd/apiresources/apiresources.go#L172) :S
However, you're right, we should do something (at least a warning msg in the console).
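For example, something as small as this would do (klog here is an assumption; use whichever logger the plugin already depends on):

```go
package resources

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/klog/v2"
)

// parseGroupVersionLenient skips unparsable group versions but logs a
// warning instead of ignoring the error silently.
func parseGroupVersionLenient(groupVersion string) (schema.GroupVersion, bool) {
	gv, err := schema.ParseGroupVersion(groupVersion)
	if err != nil {
		klog.Warningf("unable to parse GroupVersion %q: %v", groupVersion, err)
		return schema.GroupVersion{}, false
	}
	return gv, true
}
```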
continue
}
// filter to resources that support the specified "List" verb
if !sets.NewString(resource.Verbs...).HasAll([]string{"list"}...) {
Hmm, I would have thought !sets.NewString(resource.Verbs...).Has("list") would be simpler and clearer?
Yep, I just used the same code as in the kubectl client, which is aimed at supporting any set of verbs. In our scenario, your suggestion is way simpler. Thanks for pointing it out.
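i.e., the filter then boils down to (sketch):

```go
package resources

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/sets"
)

// isListable applies the reviewer's simplification: when only one verb
// matters, Has reads more directly than HasAll with a one-element slice.
func isListable(resource metav1.APIResource) bool {
	return sets.NewString(resource.Verbs...).Has("list")
}
```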
close(ch)
}()

// once closed, iterate over the closed channel and push the objects to the response
I don't think this comment is correct? That is, the channel will be closed when the wg.Wait() returns, but that's correctly happening inside a goroutine. So the code below will start processing the data on the channel as it arrives (as it should) and will only complete (the range ch) when the channel is closed and all data has been drained.
Which still leaves me wondering why you need a large buffer on your channel? The work of the consumer here should be very quick compared to the producer (which is doing network fetches), so I'd not think you'd need a buffer at all.
Ah, right! I mixed it up... initially, I had an unbuffered channel, so I needed the goroutine encapsulating the wg.Wait() and close(ch). Then I moved to a buffered one, but left this goroutine in.
Yes... the consumer is just aggregating data, which should be really fast; no real need for a buffer here.
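For the record, the agreed-upon shape would be roughly this (an illustrative sketch with toy types):

```go
package resources

import "sync"

// collect uses an unbuffered channel, with wg.Wait() and close(ch) in their
// own goroutine so the consumer can start draining immediately; the range
// loop exits once the channel is closed and empty.
func collect(resources []string, fetch func(string) string) []string {
	ch := make(chan string) // unbuffered: the consumer is fast, no backlog expected
	var wg sync.WaitGroup
	for _, r := range resources {
		wg.Add(1)
		go func(r string) {
			defer wg.Done()
			ch <- fetch(r) // the slow, network-bound producer
		}(r)
	}
	go func() {
		wg.Wait()
		close(ch)
	}()
	var out []string
	for item := range ch {
		out = append(out, item) // cheap aggregation, drains as results arrive
	}
	return out
}
```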
}

// ignore the namespace, as the resources may appear in a different ns from the one the PackageInstall is created in
resources, err := dynamicClient.Resource(gvr).List(ctx, listOptions)
Interesting, I'd thought the above syntax was for non-namespaced (ie. cluster-level) resources, rather than resources across all namespaces, but I see that the dynamic client apparently supports both (NamespaceableResourceInterface, which is implemented to do both).
Yes indeed! I initially thought the same; I even had an if for both cases. After I hit the issue I mentioned (getting resources from other namespaces rather than the one in which the PackageInstall is), I noticed the dynamic client actually supports both.
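i.e., both forms hang off the same NamespaceableResourceInterface (a sketch; the selector value and namespace are illustrative):

```go
package resources

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// listBoth demonstrates the two call shapes of the dynamic client.
func listBoth(ctx context.Context, dyn dynamic.Interface, gvr schema.GroupVersionResource) error {
	opts := metav1.ListOptions{LabelSelector: "kapp.k14s.io/app=some-id"}

	// across all namespaces (and cluster-scoped resources):
	if _, err := dyn.Resource(gvr).List(ctx, opts); err != nil {
		return err
	}
	// restricted to a single namespace:
	_, err := dyn.Resource(gvr).Namespace("my-namespace").List(ctx, opts)
	return err
}
```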
Quick notes, I'm in a rush; will try to explain myself better later. TL;DR: I don't see any performance increase when using Kapp's code directly (I'd say it's slower). View the alternative version using code from Kapp at https://github.com/kubeapps/kubeapps/pull/4068/files
Perhaps it's a matter of our Kubernetes client default config; the QPS and other params. I'll have a look later. Edit: increasing QPS/Burst does the trick... we were throttling our calls to 5 req/s (see attached video: carvelResourcesMine.mp4).
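For reference, the fix amounts to raising the client-go defaults (QPS 5 / Burst 10) on the rest.Config before building the clients; the exact values below are illustrative, not what was committed:

```go
package resources

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newClientWithHigherQPS raises the client-side rate limits so hundreds of
// List calls aren't throttled to the default 5 requests per second.
func newClientWithHigherQPS(config *rest.Config) (kubernetes.Interface, error) {
	config.QPS = 50
	config.Burst = 100
	return kubernetes.NewForConfig(config)
}
```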
Excellent, nice work Antonio! I'm +1 whichever way you decide, though I personally lean the same way (use Kapp and get it fixed upstream so it can be configured, so we don't maintain more code unnecessarily).
Looks like you haven't pushed your changes here yet (no rush, I just wanted to leave a +1 after seeing them, but will do so tonight/tomorrow).
Not yet, I wanted to discuss both alternatives; that's why I pushed the changes in a separate PR (#4068). The current situation is: my approach is fully working (with the QPS fix) but the alternative is not yet (not getting resources from other namespaces yet). My plan: work today on getting the carvel approach fully working and tested, and benchmark both alternatives.
Closing this PR in favor of #4068.
Description of the change
This PR removes the workaround we implemented for the last release and adds a generic way to retrieve all the Kubernetes resources matching a certain label. This way, we can fetch every k8s resource created as a result of a Carvel PackageInstall.
Benefits
The retrieved resources for a Carvel package will now include every possible one, not just a fixed list of pods, services, and deployments.
Possible drawbacks
Hitting the k8s APIs to query every possible available resource is expensive in time. I guess it is partly due to my kubelet config in kind with limited resources, but I'm still worried about it.
Waited for <some time> ms due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/<api resource>?labelSelector=kapp.k14s.io%2Fapp%3D<my-id>
Note that the kapp CLI is already able to fetch its generated resources; however, solely using a Kubernetes client (not the kapp binary), there is no CR holding this information. See how the kapp inspect command works under the hood here: https://github.com/vmware-tanzu/carvel-kapp/blob/v0.44.0/pkg/kapp/cmd/app/inspect.go#L76
Edit: they are taking a similar approach, but with a cache layer on top: https://github.com/vmware-tanzu/carvel-kapp/blob/v0.44.0/pkg/kapp/resources/resource_types.go#L111
Applicable issues
GetInstalledPackageResourceRefs #3854
Additional information
Some resources from a Harbor installation: