
Add metric to monitor common related objects #77

Conversation


@dhaiducek dhaiducek commented Nov 4, 2022

This adds a common_related_objects metric, a gauge vector sliced by related object and policy. Each policy/related object pair has a gauge set to the total number of policies referencing that related object. For example, if two policies point to the same related object, there would be two gauges, one per policy/related object pair, each set to 2. Gauges that would be set to 1 are ignored/cleaned up.

ref:
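The counting and cleanup behavior described above can be sketched in plain Go (function and key names here are hypothetical; the actual controller records these values in a Prometheus gauge vector rather than a map):

```go
package main

import "fmt"

// relatedObjectCounts sketches the logic behind common_related_objects:
// count how many policies reference each related object, then report one
// (policy, object) pair per reference, keeping only objects shared by more
// than one policy. Pairs whose gauge would be 1 are dropped.
func relatedObjectCounts(policyRefs map[string][]string) map[[2]string]int {
	totals := map[string]int{}
	for _, objects := range policyRefs {
		for _, obj := range objects {
			totals[obj]++
		}
	}
	gauges := map[[2]string]int{}
	for policy, objects := range policyRefs {
		for _, obj := range objects {
			if totals[obj] > 1 {
				gauges[[2]string{policy, obj}] = totals[obj]
			}
		}
	}
	return gauges
}

func main() {
	gauges := relatedObjectCounts(map[string][]string{
		"policy-a": {"v1/ConfigMap/shared-cm"},
		"policy-b": {"v1/ConfigMap/shared-cm", "v1/Secret/solo-secret"},
	})
	// Two pairs survive (both referencing shared-cm, each with count 2);
	// the pair for solo-secret is dropped because its count is 1.
	fmt.Println(len(gauges))
}
```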

@dhaiducek
Member Author

I'm curious about the performance hit for this implementation and wondering whether there should be a flag to shut off this metric (or maybe all metrics...).

@dhaiducek dhaiducek force-pushed the related-obj-metric branch 20 times, most recently from 2dd7bc9 to 4cbf4af on November 10, 2022 at 15:50
@dhaiducek
Member Author

Sorry for the churn here. CI is passing and this PR is ready for review.

@dhaiducek
Member Author

I also wound up adding a flag, enable-metrics, to disable this metric if desired.
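A minimal sketch of how such a flag could be wired up with Go's standard flag package (names are illustrative; the controller's actual flag handling may differ):

```go
package main

import (
	"flag"
	"fmt"
)

// parseFlags sketches hypothetical enable-metrics wiring: metrics default
// to on, and passing --enable-metrics=false turns them off.
func parseFlags(args []string) (enableMetrics bool) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.BoolVar(&enableMetrics, "enable-metrics", true,
		"enable custom policy metrics such as common_related_objects")
	_ = fs.Parse(args)
	return enableMetrics
}

func main() {
	fmt.Println(parseFlags([]string{"--enable-metrics=false"}))
}
```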


@gparvin gparvin left a comment


Only had a minor comment about a comment. Looks good.

test/utils/utils.go (comment outdated, resolved)
controllers/metric.go (comment outdated, resolved)
@gparvin
Member

gparvin commented Nov 28, 2022

/hold

Signed-off-by: Dale Haiducek <19750917+dhaiducek@users.noreply.github.com>
This gauge records any related objects monitored by multiple policies.

ref: stolostron/backlog#25357

Signed-off-by: Dale Haiducek <19750917+dhaiducek@users.noreply.github.com>
@openshift-ci

openshift-ci bot commented Nov 30, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhaiducek, gparvin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JustinKuli
Member

I'm slightly worried about the performance impact of all our metrics, not just this one. So I'm glad this is configurable.

From https://prometheus.io/docs/practices/naming/#labels :

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

Each of our metrics has a policy_name label of some kind, and this new one and compare_objects* also have what is basically an object_name label. Those are the kind of "unbounded sets of values" that are warned against... I don't have a better solution, especially for this exact feature; I'm just noting that we need to be careful.

@dhaiducek
Member Author

I agree. This definitely felt like a "square peg in a round hole" kind of thing, since Prometheus isn't really designed for the goals of this metric. (We almost need a separate controller that can handle this sort of object duplication, but that might be a one-off at this point.) Hopefully this metric wouldn't fire frequently, but I definitely wanted to be able to turn it off if it became unruly.

@dhaiducek
Member Author

@JustinKuli @gparvin Are we prepared to unhold this PR, particularly given that it can be disabled?

I think we might revisit this metric and ones like it in the future to see if we can generate them, potentially outside of Prometheus, in a manner more fitting to their intentions.

@JustinKuli
Member

I think it's acceptable.

Maybe a future improvement would be to put the metrics we're worried about at a separate endpoint, so they could still be generated and scraped by some processes, but not by the default Prometheus configuration (which I assume scrapes everything at /metrics). But I'd be surprised if there isn't a way in Prometheus to ignore certain metrics, so maybe that would be a better approach.
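For reference, Prometheus can drop individual series at scrape time via metric_relabel_configs in the scrape configuration. A sketch (the job name is hypothetical):

```yaml
scrape_configs:
  - job_name: policy-controller   # hypothetical job name
    metric_relabel_configs:
      # Drop the high-cardinality metric after scraping, before storage
      - source_labels: [__name__]
        regex: common_related_objects
        action: drop
```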

@gparvin
Member

gparvin commented Dec 5, 2022

/unhold

5 participants