-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics for configuration policy errors #84
Metrics for configuration policy errors #84
Conversation
3130758
to
6d3ee40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to add a test that creates a policy referencing a CRD that doesn't exist and then making sure the error shows up as expected? I don't have an idea for validating the system error path right now.
096619a
to
1b8207e
Compare
1b8207e
to
635201e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these metrics are only being added to, should they be Counters rather than Gauges?
This PR creates new metrics for user and system errors and sends them when a config policy is invalid (no spec or remediationAction) [user], when the object template CRD cannot be found [user] and when the API call to update status fails [system]. It also adds the check for config policies not having a spec, which didn't exist before and caused some weird undefined behavior in policies without specs. - ref: https://issues.redhat.com/browse/ACM-2216?filter=-1 Signed-off-by: Will Kutler <wkutler@redhat.com>
26039a8
to
49723a1
Compare
@dhaiducek Does the Edit: Thinking about this more... I guess after a fix there will just no longer be entries created for the metric. Right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test! This looks good even though I have some fuzziness on this being a counter. Look forward to trying it out.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gparvin, willkutler The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
https://issues.redhat.com/browse/ACM-2216?filter=-1
Adds 2 metrics - user error and system error gauges that increment when certain errors are found. They are labeled with the parent policy and policy template that caused the error, as well as the reason for the error.
Also adds a check to ensure the spec of a config policy is not nil - when testing the metrics, I found that if a config policy did not have the spec field, the error message would be incorrect and the parent policy would be compliant