Filtering old data out of samples #279

Open · wants to merge 9 commits into base: master
Conversation

@jdhudson3 (Contributor) commented Nov 5, 2024

What does this PR do?

On especially large clusters, we sometimes see a significant number of old/unused resources such as:

  • Jobs that have finished but are retained
  • Pods that have finished (often as a result of the above) but are retained
  • ReplicaSets that have been cycled out and are no longer in use

This data is not useful to us when no active pods are running, and there is no reason to include it in the export or to process it on the Apptio side. This PR stops sending data in these cases.

Unfortunately this cannot easily be filtered out of the informer-stored data (it would likely require building our own watchers plus other behind-the-scenes components), but that is a potential future optimization for agent resource usage on larger clusters.
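
To make the intent concrete, below is a minimal Go sketch of the kind of skip checks described above. It is illustrative only: the helper names and exact conditions are assumptions, and the PR's actual filtering lives in retrieval/k8s/k8s_stats.go.

// Hypothetical sketch of the skip checks; not the PR's actual code.
package k8s

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// jobCanBeSkipped reports whether a Job has completed and can be left out of the sample.
func jobCanBeSkipped(job *batchv1.Job) bool {
	return job.Status.CompletionTime != nil
}

// replicaSetCanBeSkipped reports whether a ReplicaSet has been scaled to zero
// and has no remaining replicas, i.e. it has been cycled out.
func replicaSetCanBeSkipped(rs *appsv1.ReplicaSet) bool {
	return rs.Spec.Replicas != nil && *rs.Spec.Replicas == 0 && rs.Status.Replicas == 0
}

// podCanBeSkipped reports whether a Pod has finished and none of its containers
// terminated after previousHour. Recently terminated containers keep the Pod
// in the sample so its shutdown is still captured.
func podCanBeSkipped(pod *corev1.Pod, previousHour time.Time) bool {
	if pod.Status.Phase != corev1.PodSucceeded && pod.Status.Phase != corev1.PodFailed {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.FinishedAt.After(previousHour) {
			return false
		}
	}
	return true
}

The pod check mirrors the diff snippet discussed in the review thread below: a pod whose containers terminated within the last hour is kept for one more sample before being filtered.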

Where should the reviewer start?

How should this be manually tested?

Ran on a test cluster and validated that the sample data did not contain:

  • Pods that had been terminated
  • ReplicaSets with 0 replicas
  • Jobs that had completed a while ago

All of these were present in previous samples.

Any background context you want to provide?

Large clusters tend to orphan more resources; currently we package, upload, and store a lot of data that is useless to us.

What picture best describes this PR (optional but encouraged)?

What are the relevant Github Issues?

Developer Done List

  • Tests Added/Updated
  • Updated README.md
  • Verified backward compatible
  • Verified database migrations will not be catastrophic
  • Considered Security, Availability and Confidentiality

For the Reviewer:

By approving this PR, the reviewer acknowledges that they have checked all items in this done list.

Reviewer/Approval Done List

  • Tests Pass Locally
  • CI Build Passes
  • Verified README.md is updated
  • Verified changes are backward compatible
  • Reviewed impact to Security, Availability and Confidentiality (if issue found, add comments and request changes)

@jdhudson3 marked this pull request as ready for review on November 6, 2024 at 18:07
Comment on lines +146 to +150
for _, v := range resource.Status.ContainerStatuses {
if v.State.Terminated != nil && v.State.Terminated.FinishedAt.After(previousHour) {
canSkip = false
}
}
Contributor commented:
Interested why we need this check, as Succeeded/Failed are defined below:
Succeeded: All containers in the Pod have terminated in success, and will not be restarted.
Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system, and is not set for automatic restarting.

@jdhudson3 (Contributor, Author) replied Nov 6, 2024:
My thought for keeping this data was that we may want information about recently shut-down pods; since we can detect that here, we want to evaluate the timestamp to ensure we capture recent shutdowns.

We could remove all entries and not impact current allocation methodology.

Two outdated review threads on retrieval/k8s/k8s_stats.go were resolved.
jyin-apptio previously approved these changes Nov 7, 2024