Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add k8s util functions for aws container insights #3367

Merged
merged 1 commit into from
Jun 1, 2021
Merged

Add k8s util functions for aws container insights #3367

merged 1 commit into from
Jun 1, 2021

Conversation

pxaws
Copy link
Contributor

@pxaws pxaws commented May 11, 2021

Description:
Add the util functions to call k8s api server to get information about

  • endpoint
  • job
  • pod
  • node
  • replica set

The "List & Watch" api of cache.Reflector is used to monitor the incremental changes of resources. The benefit is that we can get resource information update in time. If the resources just occasionally slightly changes, the impact to API server is supposed to be minimal.

Link to tracking Issue:
#2307

Testing:
Unit tests

@pxaws pxaws marked this pull request as ready for review May 11, 2021 16:13
@pxaws pxaws requested a review from anuraaga as a code owner May 11, 2021 16:13
@pxaws pxaws requested a review from a team May 11, 2021 16:13
Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe what is AWS specific about this code? Also, what are the differences vs k8sprocessor? https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sprocessor

@pxaws
Copy link
Contributor Author

pxaws commented May 12, 2021

Can you describe what is AWS specific about this code? Also, what are the differences vs k8sprocessor? https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sprocessor

Some background:

In the upstream, there are two components that calls k8s api server:

Difference

Container insights implementation is different on the following two aspects:

@anuraaga
Copy link
Contributor

So the main question becomes how can we share between k8sclusterreceiver and this processor - there should really be only a single source of information about a k8s cluster in any OTel collector, having multiple seems like a lot of duplication. Changes needed to the k8sclusterreceiver that could allow more sharing (or even benefit it) should be considered. For example

Instead of using the kubernetes informers api, we use a lower-level cache.reflector api to monitor the resources change.

Is this a useful optimization for the k8sclusterreceiver? If so we should do it

collects more metrics and metadata about k8s resources and requires more permission than container insights

If these permissions are too open, then as a best practice allowing this to be configured in k8sclusterreceiver seems useful. Though if extracting a library to read k8s information then it could just be that library that's configurable, not the receiver.

doesn't provide info for v1.Endpoints which container insights used to find service to pod mapping

Seems like this can be added incrementally rather than having duplication of every k8s resource fetch

Our current container insights implementation binds the api call with the metric or metadata that we want to use

This is a sort of transformation. It'd be great to have a single source of truth about k8s cluster information in the collector codebase, probably outside the scope of a receiver (so e.g., github.com/open-telemetry/opentelemetry-collector-contrib/internal/...) and this receiver could just transform the information into the correct metric names.

Since this code is being added to this upstream repo now, we need to take advantage of it to make sure we add in a way that fits in with the rest of the components. kubernetes is a standard, so it's sort of a "code smell" to see so much duplication between k8s parts.

@pxaws
Copy link
Contributor Author

pxaws commented May 12, 2021

I agree that otel should have some common utils for get metrics/metadata about kubernetes resources. But given the current situation in which three components (including aws container insights) all has some duplicate functionalities there, it would be quite a time-consuming task to refactor them into a single place. In addition, we might also need to involve the owners of k8sclusterreceiver and k8sprocessor to work out a plan to do the refactoring together. Considering the timeline for launching container insights with otel, I suggest we take an iterative approach:

  1. get the current code inside the aws internal folder (so that only container insights can consume it)
  2. create a task to acknowledge the issue and plans for further improvement

Does it sound ok to you?

@anuraaga
Copy link
Contributor

I would want some sort of roadmap for 2) I think. We can't have the collector just be a dumping ground of code, it's a place where we commit to working on making the collector a better place. Issues such as the use of OpenCensus APIs in AWS components have been known for a long time but are only now being fixed after being forced by API removal. I'm skeptical the duplication would ever get fixed if just filing an issue.

@bogdandrutu @tigrannajaryan Does this way of thinking align with where you want the collector to go?

@pxaws
Copy link
Contributor Author

pxaws commented May 18, 2021

I would want some sort of roadmap for 2) I think. We can't have the collector just be a dumping ground of code, it's a place where we commit to working on making the collector a better place. Issues such as the use of OpenCensus APIs in AWS components have been known for a long time but are only now being fixed after being forced by API removal. I'm skeptical the duplication would ever get fixed if just filing an issue.

@bogdandrutu @tigrannajaryan Does this way of thinking align with where you want the collector to go?

I created a feature request #3424 about this issue. As it seems to be a big task which might take a long time, I think we should focus on the current code and plan for future improvement.

internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/obj_store.go Show resolved Hide resolved
@anuraaga anuraaga requested review from Aneurysm9 and mxiamxia May 19, 2021 00:57
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/endpoint.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/endpoint.go Show resolved Hide resolved
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/job.go Outdated Show resolved Hide resolved
@pxaws
Copy link
Contributor Author

pxaws commented May 19, 2021

I discussed the feature request #3424 in otel sync up meeting. It seems like refactoring out a common k8s utils for otel is a long term goal. That involves two aspects:

  • reduce the code duplication
  • have a shared layer for multiple components to reduce the connections to k8s api server (rather than each component maintaining its own connection)

internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/clientset.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/endpoint.go Outdated Show resolved Hide resolved
internal/aws/k8s/k8sclient/obj_store.go Show resolved Hide resolved
internal/aws/k8s/go.mod Show resolved Hide resolved
internal/aws/k8s/k8sclient/obj_store.go Show resolved Hide resolved
internal/aws/k8s/k8sclient/obj_store.go Show resolved Hide resolved
Copy link
Member

@mxiamxia mxiamxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bogdandrutu bogdandrutu merged commit 984264b into open-telemetry:main Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants