add status metrics on prometheus #4251

rafaelreinert · 2022-01-19T20:11:59Z

This PR adds status metrics to the Prometheus registry. The new metrics are:

Metric name	Metric type	Description
plugin_status_gauge	gauge	Number of plugins by name and status.
bundle_loaded_counter	counter	Number of bundles loaded with success.
bundle_failed_load_counter	counter	Number of bundles that failed to load.
last_bundle_request	gauge	Last bundle request in UNIX nanoseconds.
last_success_bundle_activation	gauge	Last successful bundle activation in UNIX nanoseconds.
last_success_bundle_download	gauge	Last successful bundle download in UNIX nanoseconds.
last_success_bundle_request	gauge	Last successful bundle request in UNIX nanoseconds.
bundle_loading_duration_ns	histogram	A histogram of duration for bundle loading.

To let users control whether the metrics are exported, I've added a new option to the status config:

status:
  prometheus: true

If prometheus is true, the status plugin registers these metrics on start and updates them from the UpdateRequestV1 on every oneShot function call.
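A minimal sketch of that opt-in flow, using only the standard library. The `Config`, `gaugeSet`, and `plugin` types are hypothetical stand-ins, not OPA's or client_golang's types, and resetting the series before each update is an assumption of this sketch rather than something stated in the PR description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the `status.prometheus` flag from the PR description.
// The struct is a hypothetical stand-in, not OPA's config type.
type Config struct {
	Prometheus bool `json:"prometheus"`
}

// gaugeSet stands in for a labelled gauge: one value per (name, status) pair.
type gaugeSet map[[2]string]float64

// plugin is a hypothetical, simplified status plugin.
type plugin struct {
	cfg          Config
	pluginStatus gaugeSet
}

// newPlugin parses the config and sets up metrics only when the flag is on.
func newPlugin(raw []byte) (*plugin, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	p := &plugin{cfg: cfg}
	if cfg.Prometheus {
		p.pluginStatus = gaugeSet{}
	}
	return p, nil
}

// oneShot records the latest plugin states when metrics are enabled.
func (p *plugin) oneShot(states map[string]string) {
	if !p.cfg.Prometheus {
		return
	}
	// Assumption: reset first so a plugin that changed status
	// does not keep its old (name, status) series.
	p.pluginStatus = gaugeSet{}
	for name, status := range states {
		p.pluginStatus[[2]string{name, status}] = 1
	}
}

func main() {
	p, err := newPlugin([]byte(`{"prometheus": true}`))
	if err != nil {
		panic(err)
	}
	p.oneShot(map[string]string{"bundle": "OK", "status": "NOT_READY"})
	fmt.Println(len(p.pluginStatus))
}
```

With the flag off, `oneShot` returns early and no series are ever created, which matches the intent of leaving export under the user's control.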

This PR relates to two issues: #1472 and #1506.

srenatus · 2022-01-24T07:54:53Z

Sorry for the radio silence on this. I'm going to have a look today.

srenatus

Thanks for contributing. Added a few comments, please bear with me 🙃

srenatus · 2022-01-24T08:09:47Z

plugins/status/metrics.go

+
+var (
+	pluginStatus = prometheus.NewGaugeVec(
+		prometheus.GaugeOpts{


I'm not an expert with prometheus, so please bear with me on the naive question: Is it common to use gauges like that? From the docs,

Gauge is a Metric that represents a single numerical value that can arbitrarily go up and down.

A Gauge is typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of running goroutines.

Now, I'm not sure how "status" and the unix nano metrics below fit into this. 🤔

It is common. Since a Gauge holds a float64 and can be set to any value, it works well for timestamps.
For example:
https://github.com/kubernetes/kube-state-metrics/blob/master/docs/cronjob-metrics.md the metric kube_cronjob_next_schedule_time
https://github.com/prometheus/node_exporter the metric node_time_seconds
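As a rough illustration of the timestamp-in-a-gauge idea, the sketch below converts a UNIX-nanosecond timestamp to the float64 a gauge would store and derives an age from it, in the spirit of a `time() - gauge/1e9` query. The function names are hypothetical. It also shows one caveat: a float64 has a 53-bit mantissa, so present-day nanosecond timestamps are rounded slightly when stored.

```go
package main

import (
	"fmt"
	"time"
)

// toGauge converts UNIX nanoseconds to the float64 value a gauge stores.
func toGauge(unixNano int64) float64 {
	return float64(unixNano)
}

// ageSeconds mirrors a query of the form time() - gauge/1e9.
func ageSeconds(nowUnixSeconds, gauge float64) float64 {
	return nowUnixSeconds - gauge/1e9
}

// drift reports how far the stored float64 is from the exact nanosecond value.
func drift(unixNano int64) int64 {
	return unixNano - int64(toGauge(unixNano))
}

func main() {
	// One nanosecond past a whole second, to make rounding visible.
	last := time.Date(2022, 1, 19, 20, 11, 59, 1, time.UTC)
	now := last.Add(90 * time.Second)

	g := toGauge(last.UnixNano())
	fmt.Printf("age: %.3fs\n", ageSeconds(float64(now.Unix()), g))
	fmt.Printf("rounding drift: %dns\n", drift(last.UnixNano()))
}
```

The drift is far below anything a dashboard would show, so it does not affect "how old is the last request" style queries.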

I suppose ultimately, it depends on the types of metrics you'd like to put on a dashboard...

I think I can follow the argument wrt having the timestamps in gauges, but what is plugin_status_gauge about? From what I can tell, we're setting the value to 1 and using the label to carry the status... is that useful? I guess I don't see what this adds over the other gauges...? Should this be a last_status_change timestamp? 🤔

Hey @srenatus, I hadn't made clear what I intended to do with each metric, my bad, sorry.
The breakdown below shows how I use each metric (the queries are just examples).

Metric name, metric type, description, and the questions it answers (dashboard or alert):

plugin_status_gauge (gauge): Number of plugins by name and status.
- How many plugins do I have? 'sum by(instance)(plugin_status_gauge{})'
- Which instance has plugin x? 'plugin_status_gauge{name="x"}'
- Is any plugin unhealthy? 'plugin_status_gauge{status!~"OK|NOT_READY"} > 0'

bundle_loaded_counter (counter): Number of bundles loaded with success.
- How many successful polls do I do per minute? 'sum by(name)(rate(bundle_loaded_counter{}[1m]))'

bundle_failed_load_counter (counter): Number of bundles that failed to load.
- How many unsuccessful polls do I do per minute? 'sum by(instance, name, code, message)(rate(bundle_failed_load_counter{}[1m]))'
- Is the error rate > 10%? 'sum by(instance, name)(rate(bundle_failed_load_counter{}[1m])) / (sum by(instance, name)(rate(bundle_failed_load_counter{}[1m])) + sum by(instance, name)(rate(bundle_loaded_counter{}[1m]))) > 0.1'

last_bundle_request (gauge): Last bundle request in UNIX nanoseconds.
- How old was the last bundle request? '(time()-((max by(name)(last_bundle_request{instance=~"$instance", name=~"$bundle"})/1e+9)))'

last_success_bundle_activation (gauge): Last successful bundle activation in UNIX nanoseconds.
- How old was the last successful bundle activation? '(time()-((max by(name)(last_success_bundle_activation{instance=~"$instance", name=~"$bundle"})/1e+9)))'

last_success_bundle_download (gauge): Last successful bundle download in UNIX nanoseconds.
- How old was the last successful bundle download? '(time()-((max by(name)(last_success_bundle_download{instance=~"$instance", name=~"$bundle"})/1e+9)))'

last_success_bundle_request (gauge): Last successful bundle request in UNIX nanoseconds.
- How old was the last successful bundle request? '(time()-((max by(name)(last_success_bundle_request{instance=~"$instance", name=~"$bundle"})/1e+9)))'

bundle_loading_duration_ns (histogram): A histogram of duration for bundle loading.
- What is my 95th percentile bundle loading duration? 'histogram_quantile(0.95, sum(rate(bundle_loading_duration_ns_bucket{instance="$instance",name=~"$bundle"}[1m])) by (le,name))'
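For the error-rate alert, the quantity of interest is failed loads divided by all loads (failed plus successful), compared against 0.1 for a 10% threshold. A small worked sketch of that arithmetic follows; the function names are hypothetical, and returning 0 when there is no traffic is a choice made here (PromQL would produce NaN for 0/0, which does not fire an alert either).

```go
package main

import "fmt"

// errorRatio returns failed / (failed + loaded), the fraction of bundle
// loads that failed. It returns 0 when there were no loads at all.
func errorRatio(failedPerSec, loadedPerSec float64) float64 {
	total := failedPerSec + loadedPerSec
	if total == 0 {
		return 0
	}
	return failedPerSec / total
}

// shouldAlert applies the 10% threshold from the example alert.
func shouldAlert(failedPerSec, loadedPerSec float64) bool {
	return errorRatio(failedPerSec, loadedPerSec) > 0.10
}

func main() {
	fmt.Println(shouldAlert(0.2, 0.8))   // 20% of loads failed
	fmt.Println(shouldAlert(0.05, 0.95)) // 5% of loads failed
}
```

Putting the failed rate in the numerator keeps the result between 0 and 1, so the threshold reads directly as a percentage.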

Thinking about which questions the status metrics should answer: what questions would you ask?

For example, part of my dashboard:

srenatus

Thanks for bearing with me and thanks for adding tests. I've got a question inline on your opinion about making the metrics registration part of the plugin manager.

srenatus · 2022-02-01T09:23:28Z

internal/prometheus/prometheus.go

+	return p.registry.Register(c)
+}
+
+// MustRegister register the collectors on OPA prometheus registry and panics when an error occurs


Do we need this? I don't think it's used, let's just keep this minimal. 🤔

I implemented these methods only to satisfy the prometheus.Registerer interface. Since Prometheus already has an interface for registering, I prefer to use that instead of creating one specific to OPA.

Ah, I see. Thanks, that makes sense.

srenatus · 2022-02-01T09:23:50Z

internal/prometheus/prometheus.go

+	p.registry.MustRegister(cs...)
+}
+
+// Unregister unregister the collectors on OPA prometheus registry


Similarly here, do you think we'll need the ability to unregister a collector?

Same explanation as above.

srenatus · 2022-02-01T09:25:38Z

plugins/discovery/discovery.go

+	metrics            metrics.Metrics
+	readyOnce          sync.Once
+	logger             logging.Logger
+	prometheusRegister prometheus.Registerer


Sorry for the back and forth, but in my head, it would perhaps look kind of nice if the prometheus.Registerer mechanism was part of the plugin manager: every plugin could then register its metrics with the plugin manager. What do you think?

It looks nice. I've seen an issue about that: #2348. I will try to implement the prometheus.Registerer interface on the manager.

Hey @srenatus, in commit 3cce236 I've moved the Prometheus registerer to the plugin manager. With that, all plugins can register metrics.
Since the plugin manager can be used in places without Prometheus (the SDK, for example), I prefer to expose the prometheus.Registerer as a struct field rather than add register methods to the plugin manager struct.
What do you think about that?
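One way to picture the "optional field on the manager" design is sketched below with the standard library only. `Collector` and `Registerer` are local stand-ins shaped like the client_golang interfaces (Register, MustRegister, Unregister), and `Manager`, `memRegistry`, and `startStatusPlugin` are hypothetical names used for illustration, not OPA's actual code.

```go
package main

import "fmt"

// Collector and Registerer are local stand-ins shaped like the
// client_golang interfaces; they are not the real library types.
type Collector interface{ Name() string }

type Registerer interface {
	Register(Collector) error
	MustRegister(...Collector)
	Unregister(Collector) bool
}

// Manager is a hypothetical, simplified plugin manager. The registerer is
// an optional field and stays nil where Prometheus is not in use.
type Manager struct {
	PrometheusRegister Registerer
}

type gauge string

func (g gauge) Name() string { return string(g) }

// memRegistry is a toy in-memory Registerer.
type memRegistry struct{ names map[string]bool }

func (r *memRegistry) Register(c Collector) error {
	if r.names[c.Name()] {
		return fmt.Errorf("%s already registered", c.Name())
	}
	r.names[c.Name()] = true
	return nil
}

func (r *memRegistry) MustRegister(cs ...Collector) {
	for _, c := range cs {
		if err := r.Register(c); err != nil {
			panic(err)
		}
	}
}

func (r *memRegistry) Unregister(c Collector) bool {
	ok := r.names[c.Name()]
	delete(r.names, c.Name())
	return ok
}

// startStatusPlugin registers metrics only when a registerer was provided.
func startStatusPlugin(m *Manager) (bool, error) {
	if m.PrometheusRegister == nil {
		return false, nil // e.g. SDK usage without Prometheus
	}
	if err := m.PrometheusRegister.Register(gauge("plugin_status_gauge")); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	withProm := &Manager{PrometheusRegister: &memRegistry{names: map[string]bool{}}}
	fmt.Println(startStatusPlugin(withProm))
	fmt.Println(startStatusPlugin(&Manager{}))
}
```

Keeping the registerer as a plain field means callers that never set it pay nothing, and plugins only need a nil check before registering.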

rafaelreinert · 2022-02-07T11:40:21Z

Hey @srenatus , could you review it again? Thanks

srenatus

Some copy-edits on the docs, but this looks good to go. Thanks for taking the time!

rafaelreinert · 2022-02-08T12:47:26Z

Hey @srenatus, I've just addressed the last requests and updated the branch with main ;) Please review it again.
After the approval, what do I need to do? Just merge it to main?
Thanks a lot for the review. I hope to make other contributions soon ;)

srenatus · 2022-02-08T12:50:42Z

@rafaelreinert thanks for bearing with me. As a final step, could you please squash your commits into one, with a brief description of what's going on here? We'll just merge it then.

rafaelreinert · 2022-02-08T13:08:13Z

@srenatus done

rafaelreinert force-pushed the status_exporter branch from b49dd48 to 02ff4f6 Compare January 19, 2022 20:17

srenatus reviewed Jan 24, 2022

ashutosh-narkar reviewed Jan 24, 2022

rafaelreinert force-pushed the status_exporter branch from a18001c to 3f3a092 Compare January 25, 2022 17:52

rafaelreinert requested review from srenatus and ashutosh-narkar January 25, 2022 20:00

srenatus reviewed Feb 1, 2022

rafaelreinert requested a review from srenatus February 2, 2022 16:43

srenatus previously approved these changes Feb 7, 2022

rafaelreinert dismissed srenatus’s stale review via e49b2c4 February 8, 2022 12:19

rafaelreinert requested a review from srenatus February 8, 2022 12:47

add status metrics on prometheus in order to improve plugin and bundle monitoring

Signed-off-by: rafael otero reinert <rafaelreinert@gmail.com>

9f4187d

rafaelreinert force-pushed the status_exporter branch from e49b2c4 to 9f4187d Compare February 8, 2022 13:05

srenatus merged commit 8569551 into open-policy-agent:main Feb 8, 2022

anderseknert mentioned this pull request Jun 7, 2022

OPA Status - expose as prometheus metrics #1472

Closed
