Filtering old data out of samples #279

Open · wants to merge 9 commits into base: master
Conversation

@jdhudson3 (Contributor) commented Nov 5, 2024

What does this PR do?

On especially large clusters, we sometimes see a significant number of old/unused resources such as:

  • Jobs that have finished but are retained
  • Pods that have finished (often as a result of the above) but are retained
  • ReplicaSets that have been cycled out and are no longer in use

This data is not useful to us when no active pods are running, and there is no reason to include it in the export or to process it on the Apptio side. This PR stops sending data in these cases.

Unfortunately this cannot easily be filtered out of the informer-stored data (it would likely require building our own watchers plus other behind-the-scenes components), but that is a potential future optimization for agent resource usage on larger clusters.
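
To make the intent concrete, below is a minimal Go sketch of the kind of skip checks described above. It is illustrative only: the helper names and exact conditions are assumptions, and the PR's actual filtering lives in retrieval/k8s/k8s_stats.go.

// Hypothetical sketch of the skip checks; not the PR's actual code.
package k8s

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// jobCanBeSkipped reports whether a Job has completed and can be left out of the sample.
func jobCanBeSkipped(job *batchv1.Job) bool {
	return job.Status.CompletionTime != nil
}

// replicaSetCanBeSkipped reports whether a ReplicaSet has been scaled to zero
// and has no remaining replicas, i.e. it has been cycled out.
func replicaSetCanBeSkipped(rs *appsv1.ReplicaSet) bool {
	return rs.Spec.Replicas != nil && *rs.Spec.Replicas == 0 && rs.Status.Replicas == 0
}

// podCanBeSkipped reports whether a Pod has finished and none of its containers
// terminated after previousHour. Recently terminated containers keep the Pod
// in the sample so its shutdown is still captured.
func podCanBeSkipped(pod *corev1.Pod, previousHour time.Time) bool {
	if pod.Status.Phase != corev1.PodSucceeded && pod.Status.Phase != corev1.PodFailed {
		return false
	}
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.FinishedAt.After(previousHour) {
			return false
		}
	}
	return true
}

The pod check mirrors the diff snippet discussed in the review thread below: a pod whose containers terminated within the last hour is kept for one more sample before being filtered.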

Where should the reviewer start?

How should this be manually tested?

Ran on a test cluster and validated that the sample data did not contain:

  • Pods that had been terminated
  • ReplicaSets with 0 replicas
  • Jobs that had completed a while ago

All of these were present in previous samples.

Any background context you want to provide?

Large clusters tend to orphan more resources; currently we package, upload, and store a lot of data that is useless to us.

What picture best describes this PR (optional but encouraged)?

What are the relevant Github Issues?

Developer Done List

  • Tests Added/Updated
  • Updated README.md
  • Verified backward compatible
  • Verified database migrations will not be catastrophic
  • Considered Security, Availability and Confidentiality

For the Reviewer:

By approving this PR, the reviewer acknowledges that they have checked all items in this done list.

Reviewer/Approval Done List

  • Tests Pass Locally
  • CI Build Passes
  • Verified README.md is updated
  • Verified changes are backward compatible
  • Reviewed impact to Security, Availability and Confidentiality (if issue found, add comments and request changes)

@jdhudson3 marked this pull request as ready for review on November 6, 2024 at 18:07
Comment on lines +146 to +150
for _, v := range resource.Status.ContainerStatuses {
if v.State.Terminated != nil && v.State.Terminated.FinishedAt.After(previousHour) {
canSkip = false
}
}
Contributor commented:
Interested why we need this check, as Succeeded/Failed are defined below:
Succeeded: All containers in the Pod have terminated in success, and will not be restarted.
Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system, and is not set for automatic restarting.

@jdhudson3 (Contributor, Author) replied Nov 6, 2024:
My thought for keeping this data was that we may want information about recently shut-down pods; since we can detect that here, we want to evaluate the timestamp to ensure we capture recent shutdowns.

We could remove all entries and not impact current allocation methodology.

Two outdated review threads on retrieval/k8s/k8s_stats.go were resolved.
jyin-apptio previously approved these changes Nov 7, 2024