-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conditional gatherer of logs of unhealthy pods #509
Conditional gatherer of logs of unhealthy pods #509
Conversation
Skipping CI for Draft Pull Request. |
507a66d
to
1760e64
Compare
4f9d16c
to
acdb78b
Compare
Is log collection something we really want in 'insights'? cc: @smarterclayton - I know you and I have talked on this in the past; have we changed our stance/scope on what insights should focus on? |
@sferich888 I think the answer is yes. We already have some gatherers collecting various container logs and I believe some of them are really important - e.g https://github.com/openshift/insights-operator/blob/master/docs/gathered-data.md#clusteroperatorpodsandevents. Other examples are:
Also note that these logs in this PR are gathered only when the corresponding alert is firing. |
docs/insights-archive-sample/insights-operator/conditional-gatherer-rules.json
Show resolved
Hide resolved
containerFilter := "" | ||
if alertContainer != "" { | ||
containerFilter = fmt.Sprintf("^%s$", alertContainer) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just cosmetic detail, but maybe doing:
if alertContainer, ok := alertLabels["container"]; ok {
containerFilter = fmt.Sprintf("^%s$", alertContainer)
}
would be more straightforward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tremes You're right. The process that lead to this code had many steps where this was all done very differently, so as I was moving parts of the code around, I didn't realize the alertContainer
was no longer used further in the code (because it used to be).
I didn't try to reproduce the alerts to trigger this new gatherer. I added few minor suggestions, but it looks very good! Thanks. You need to satisfy the linting. |
/test lint |
/retest |
/test unit |
1 similar comment
/test unit |
8509358
to
345c9e6
Compare
@@ -27,6 +27,8 @@ const ( | |||
GatherAPIRequestCounts GatheringFunctionName = "api_request_counts_of_resource_from_alert" | |||
) | |||
|
|||
const GatherLogsOfUnhealthyPods GatheringFunctionName = "logs_of_unhealthy_pods" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably make it more specific as in api_request_counts_of_resource_from_alert
, but dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I would like if this gatherer could be used in general for any case of an alert where we want to gather logs from the corresponding pod, so the only specification I could add would be that it's based on alerts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great
/test e2e-agnostic-upgrade |
1 similar comment
/test e2e-agnostic-upgrade |
/label docs-approved |
/label px-approved |
/label qe-approved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/label lgtm |
@rluders: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: natiiix, rluders The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
This PR adds a new conditional gatherer that gathers pod logs based on firing alerts. It looks for
KubePodCrashlooping
andKubePodNotReady
, and collects the appropriate log from their corresponding pods/containers.Categories
Sample Archive
pkg/gatherers/conditional/gathering_rule.schema.json
Documentation
docs/gathered-data.md
Unit Tests
pkg/gatherers/conditional/gather_logs_of_unhealthy_pods_test.go
Privacy
Yes. There are no sensitive data in the newly collected information.
We are already gathering pod logs elsewhere and this gatherer should only be triggered for pods/alerts from
openshift-*
namespaces.Changelog
No.
Breaking Changes
No, there are only additions; no modification to existing gathering.
References
Jira Task: https://issues.redhat.com/browse/CCXDEV-5499