
Add metric to monitor common related objects #77

Conversation


@dhaiducek dhaiducek commented Nov 4, 2022

This adds a common_related_objects metric, a gauge vector sliced by related object and policy. Each policy/related object pair has a gauge set to the total number of policies referencing that related object. For example, if two policies point to the same related object, there would be two gauges, one per policy/related object pair, each set to 2. Gauges that would be set to 1 are ignored/cleaned up.

ref:
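The counting and cleanup behavior described above can be sketched in plain Go (function and key names here are hypothetical; the actual controller records these values in a Prometheus gauge vector rather than a map):

```go
package main

import "fmt"

// relatedObjectCounts sketches the logic behind common_related_objects:
// count how many policies reference each related object, then report one
// (policy, object) pair per reference, keeping only objects shared by more
// than one policy. Pairs whose gauge would be 1 are dropped.
func relatedObjectCounts(policyRefs map[string][]string) map[[2]string]int {
	totals := map[string]int{}
	for _, objects := range policyRefs {
		for _, obj := range objects {
			totals[obj]++
		}
	}
	gauges := map[[2]string]int{}
	for policy, objects := range policyRefs {
		for _, obj := range objects {
			if totals[obj] > 1 {
				gauges[[2]string{policy, obj}] = totals[obj]
			}
		}
	}
	return gauges
}

func main() {
	gauges := relatedObjectCounts(map[string][]string{
		"policy-a": {"v1/ConfigMap/shared-cm"},
		"policy-b": {"v1/ConfigMap/shared-cm", "v1/Secret/solo-secret"},
	})
	// Two pairs survive (both referencing shared-cm, each with count 2);
	// the pair for solo-secret is dropped because its count is 1.
	fmt.Println(len(gauges))
}
```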

@dhaiducek
Member Author

I'm curious about the performance hit for this implementation and wondering whether there should be a flag to shut off this metric (or maybe all metrics...).

@dhaiducek dhaiducek force-pushed the related-obj-metric branch 20 times, most recently from 2dd7bc9 to 4cbf4af on November 10, 2022 at 15:50
@dhaiducek
Member Author

Sorry for the churn here. CI is passing and this PR is ready for review.

@dhaiducek
Member Author

I also wound up adding a flag, enable-metrics, to disable this metric if desired.
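A minimal sketch of how such a flag could be wired up with Go's standard flag package (names are illustrative; the controller's actual flag handling may differ):

```go
package main

import (
	"flag"
	"fmt"
)

// parseFlags sketches hypothetical enable-metrics wiring: metrics default
// to on, and passing --enable-metrics=false turns them off.
func parseFlags(args []string) (enableMetrics bool) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.BoolVar(&enableMetrics, "enable-metrics", true,
		"enable custom policy metrics such as common_related_objects")
	_ = fs.Parse(args)
	return enableMetrics
}

func main() {
	fmt.Println(parseFlags([]string{"--enable-metrics=false"}))
}
```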


@gparvin gparvin left a comment


Only had a minor comment about a comment. Looks good.

test/utils/utils.go (comment outdated, resolved)
controllers/metric.go (comment outdated, resolved)
@gparvin
Member

gparvin commented Nov 28, 2022

/hold

Signed-off-by: Dale Haiducek <19750917+dhaiducek@users.noreply.github.com>
This gauge records any related objects monitored by multiple policies.

ref: stolostron/backlog#25357

Signed-off-by: Dale Haiducek <19750917+dhaiducek@users.noreply.github.com>
@openshift-ci

openshift-ci bot commented Nov 30, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhaiducek, gparvin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JustinKuli
Member

I'm slightly worried about the performance impact of all our metrics, not just this one. So I'm glad this is configurable.

From https://prometheus.io/docs/practices/naming/#labels :

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Each of our metrics has a policy_name label of some kind, and this new one and compare_objects* also have what is basically an object_name label. Those are the kind of "unbounded sets of values" that are warned against... I don't have a better solution, especially for this exact feature; I'm just noting that we need to be careful.

@dhaiducek
Member Author

I agree. This definitely felt like a "square peg in a round hole" kind of thing, since Prometheus isn't really designed for the goals of this metric. (We almost need a separate controller that can handle this sort of object duplication, but that might be a one-off at this point.) Hopefully this metric wouldn't fire frequently, but I definitely wanted to be able to turn it off if it became unruly.

@dhaiducek
Member Author

@JustinKuli @gparvin Are we prepared to unhold this PR, particularly given that it can be disabled?

I think we might revisit this metric and ones like it in the future to see if we can generate them, potentially outside of Prometheus, in a manner more fitting to their intentions.

@JustinKuli
Member

I think it's acceptable.

Maybe a future improvement would be to put the metrics we're worried about at a separate endpoint, so they could still be generated and scraped by some processes, but not by the default Prometheus configuration (which I assume scrapes everything at /metrics). But I'd be surprised if there isn't a way in Prometheus to ignore certain metrics, so maybe that would be a better approach.
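For reference, Prometheus can drop individual series at scrape time via metric_relabel_configs in the scrape configuration. A sketch (the job name is hypothetical):

```yaml
scrape_configs:
  - job_name: policy-controller   # hypothetical job name
    metric_relabel_configs:
      # Drop the high-cardinality metric after scraping, before storage
      - source_labels: [__name__]
        regex: common_related_objects
        action: drop
```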

@gparvin
Member

gparvin commented Dec 5, 2022

/unhold

5 participants