Filtering old data out of samples #279
base: master
Conversation
for _, v := range resource.Status.ContainerStatuses {
	if v.State.Terminated != nil && v.State.Terminated.FinishedAt.After(previousHour) {
		canSkip = false
	}
}
Interested why we need this check, as Succeeded/Failed are defined below:
Succeeded: All containers in the Pod have terminated in success, and will not be restarted.
Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system, and is not set for automatic restarting.
My thought for keeping this data was that we may want information about recently shut down pods. We can detect those here, and evaluating the timestamp ensures we capture recent shutdowns (see the sketch below).
We could remove all of these entries without impacting the current allocation methodology.
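For context, a minimal sketch of the skip check being discussed, assuming the resource is a *v1.Pod (the canSkipPod helper and its now parameter are hypothetical names for illustration, not the PR's actual code):

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// canSkipPod reports whether a terminal pod can be omitted from the sample.
// Pods with a container that finished within the last hour are kept so that
// recent shutdowns still appear in the export.
func canSkipPod(pod *v1.Pod, now time.Time) bool {
	if pod.Status.Phase != v1.PodSucceeded && pod.Status.Phase != v1.PodFailed {
		return false // pod is still active; always export it
	}
	previousHour := now.Add(-1 * time.Hour)
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.FinishedAt.After(previousHour) {
			return false // a container shut down recently; keep the pod
		}
	}
	return true
}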
What does this PR do?
On especially large clusters, we sometimes see a significant number of old/unused resources such as:
This data is not useful to us when no active pods are running, and there is no reason to include it in the export or process it on the Apptio side. This PR stops sending data in these cases.
Unfortunately this cannot easily be filtered out of the informer-stored data (it would likely require building our own watchers plus other behind-the-scenes components), but that is a potential future optimization for agent utilization on larger clusters; a rough sketch of what that could look like follows.
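As an illustration only, a filtered informer could be built with client-go's tweak-list-options hook. This sketches the future optimization mentioned above, not anything in this PR; note that a plain status.phase field selector would also drop the recently terminated pods this PR intentionally keeps for an hour, so it is not a drop-in replacement:

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newFilteredFactory builds a SharedInformerFactory whose pod informers
// never cache terminal pods: the API server excludes them from list/watch
// results via a field selector, so they never reach the agent at all.
func newFilteredFactory(clientset kubernetes.Interface) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(
		clientset,
		30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// status.phase is one of the field selectors the API
			// server supports for pods.
			opts.FieldSelector = "status.phase!=Succeeded,status.phase!=Failed"
		}),
	)
}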
Where should the reviewer start?
How should this be manually tested?
Ran on a test cluster and validated that the sample data did not contain:
Which were all present in previous samples.
Any background context you want to provide?
Large clusters tend to orphan more resources; currently we package, upload, and store a lot of data that is useless to us.
What picture best describes this PR (optional but encouraged)?
What are the relevant Github Issues?
Developer Done List
For the Reviewer:
By approving this PR, the reviewer acknowledges that they have checked all items in this done list.
Reviewer/Approval Done List