
[receiver/k8scluster] add support for observing resources for a specific namespace #35727

Open · wants to merge 47 commits into main

@bacherfl (Contributor) commented Oct 10, 2024

Description

This PR extends the k8scluster receiver with an option to limit the observed resources to a specific namespace.
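To make the scoping rule concrete, here is a minimal, stdlib-only Go sketch of how a namespace option could decide which objects are observed. The `Config.Namespace` field, the `ObjectRef` type and the `shouldObserve` helper are hypothetical illustrations, not the receiver's actual configuration or code. The sketch assumes that an empty namespace means cluster-wide observation and that cluster-scoped kinds such as Nodes are skipped once a namespace is set, since a namespaced Role cannot grant access to cluster-scoped resources.

```go
package main

import "fmt"

// Config is a hypothetical, simplified stand-in for the receiver
// configuration; the real option name and shape may differ.
type Config struct {
	// Namespace limits observation to a single namespace.
	// An empty value means all namespaces (cluster-wide).
	Namespace string
}

// ObjectRef is a hypothetical reference to a Kubernetes object.
type ObjectRef struct {
	Kind      string
	Namespace string // empty for cluster-scoped kinds such as Node
	Name      string
}

// shouldObserve reports whether obj is in scope for cfg.
func shouldObserve(cfg Config, obj ObjectRef) bool {
	if cfg.Namespace == "" {
		// No namespace configured: observe everything, cluster-scoped included.
		return true
	}
	// Namespace configured: only objects in that namespace are in scope.
	// Cluster-scoped objects have an empty namespace, so they are skipped.
	return obj.Namespace == cfg.Namespace
}

func main() {
	cfg := Config{Namespace: "my-namespace"}
	objs := []ObjectRef{
		{Kind: "Pod", Namespace: "my-namespace", Name: "app-0"},
		{Kind: "Pod", Namespace: "kube-system", Name: "coredns-0"},
		{Kind: "Node", Name: "node-1"},
	}
	for _, o := range objs {
		fmt.Printf("%-4s ns=%q name=%s observed=%t\n",
			o.Kind, o.Namespace, o.Name, shouldObserve(cfg, o))
	}
}
```

A real implementation would more likely push this filter to the API server (for example via client-go's `informers.WithNamespace` option) rather than filtering client-side as this sketch does.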

Link to tracking issue

Fixes #9401

Testing

Added unit and e2e tests.

Documentation

Added a section on how to use Roles and RoleBindings instead of ClusterRoles and ClusterRoleBindings.

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
@bacherfl bacherfl marked this pull request as ready for review October 17, 2024 11:49
@bacherfl bacherfl requested a review from a team as a code owner October 17, 2024 11:49
@bacherfl (Contributor Author) commented Nov 6, 2024

> I see e2e tests are failing on main anyways: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/11701936581/job/32589392145
>
> I wonder if #36114 introduced some flakiness.

Yes, I think part of the flakiness might be due to the addition of the CronJob object, which causes multiple pods to spin up during the test and leads to a varying number of metrics being received in the sink.

@ChrsMark (Member) commented Nov 6, 2024

> Yes, I think part of the flakiness might be due to the addition of the CronJob object, which causes multiple pods to spin up during the test and leads to a varying number of metrics being received in the sink.

Sounds plausible. We should restrict the number of active jobs/pods to 1 to avoid timing- and scheduling-related flakiness: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/36114/files#diff-c08d5cc648ae8a9f364187a6dbf9a8b1e01070249b7a8669537009ccc56e701fR147.

I will try to send a PR for this.

@ChrsMark (Member) left a comment

Let's wait for #36235 to get merged and rebase this one. Meanwhile, I'm a bit worried about the changes to the existing tests.

@bacherfl (Contributor Author) commented Nov 7, 2024

> Let's wait for #36235 to get merged and re-base this one. Meanwhile I'm a bit worried about the changes to the existing tests.

Agreed. I made some changes in the meantime and got the tests working on this PR, but some of them may have been unnecessary. I will try everything again after rebasing and keep the changes as minimal as possible.

TylerHelmuth pushed a commit that referenced this pull request Nov 7, 2024
…to 1 (#36235)

#### Description
Fixes what was described at
#35727 (comment).

After
#36114
the `k8scluster` receiver's e2e tests started showing some flakiness
([example](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/11701936581/job/32589392145)).

With this change we ensure that only 1 active job/pod of the cronjob
will be present for the whole lifetime of the test to avoid hitting
timing/scheduling related flakiness.

@bacherfl could you also take a look here?

#### Link to tracking issue
Fixes 

#### Testing

#### Documentation


Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
pull bot pushed a commit to abaguas/opentelemetry-collector-contrib that referenced this pull request Nov 7, 2024
# Conflicts:
#	receiver/k8sclusterreceiver/testdata/e2e/cluster-scoped/expected.yaml
@bacherfl (Contributor Author) commented Nov 8, 2024

Alright, this should now be ready again. @ChrsMark, I reverted the changes I made earlier in the `waitForData` function, since it turned out they were no longer required once the recent fix was merged.

@ChrsMark (Member) left a comment

LGTM. Left a nit comment. Thanks, @bacherfl!

(It will need a `make generate` run to make CI happy.)

receiver/k8sclusterreceiver/e2e_test.go (resolved)

Successfully merging this pull request may close these issues.

k8sclusterreceiver: Fetch namespace scoped metrics from the Master API
4 participants