OPA Status - expose as prometheus metrics #1472

Closed
omerlh opened this issue Jun 5, 2019 · 8 comments
Labels
inactive monitoring Issues related to decision log and status plugins

Comments

@omerlh
Contributor

omerlh commented Jun 5, 2019

Expected Behavior

Bundle information that is exposed via the status API should also be exposed as Prometheus metrics (last bundle load time, last bundle version, etc.).

Actual Behavior

This information is exposed only by implementing the status API.

If this makes sense, I might be able to provide a PR.

@srenatus
Contributor

srenatus commented Jun 8, 2019

Hmm I'm not too well-acquainted with prometheus, but which of these four metric types would fit here?

@omerlh
Contributor Author

omerlh commented Jun 10, 2019

It's not ideal, but it's common to "abuse" counters for this - a good example is the up metric, which is a counter that is either 0 or 1. This requires some thinking, but I would imagine metrics like the following:

Gauge - last_bundle_fetch_duration
Counter - last_bundle_fetch_status{status="success|failure"} 1/0

And maybe some other metrics. The status API would then be used for logging that complements the Prometheus metrics: I can use last_bundle_fetch_status for alerting, and the logging to understand what happened.
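
A minimal sketch of how the two metrics proposed above might look in the Prometheus text exposition format. The function name `render_metrics` is hypothetical, and nothing here is OPA's actual implementation. Note one assumption: both metrics are declared as gauges, because a value that can fall back to 0 does not fit counter semantics (Prometheus treats a decreasing counter as a reset).

```python
def render_metrics(fetch_duration_seconds: float, fetch_succeeded: bool) -> str:
    """Render the proposed metrics as Prometheus text exposition lines."""
    lines = [
        "# TYPE last_bundle_fetch_duration gauge",
        f"last_bundle_fetch_duration {fetch_duration_seconds}",
        # Assumption: declared as a gauge rather than the counter proposed
        # in the comment, since the value can drop from 1 back to 0.
        "# TYPE last_bundle_fetch_status gauge",
    ]
    for status in ("success", "failure"):
        # Exactly one of the two labelled series is 1 at any time.
        value = 1 if (status == "success") == fetch_succeeded else 0
        lines.append(f'last_bundle_fetch_status{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(0.25, fetch_succeeded=True), end="")
```

With this shape, an alert could fire when the `status="failure"` series equals 1, and the status API logs would explain why.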

@srenatus
Contributor

Thanks for explaining this. I'm not familiar with what's common in prometheus land. Taking your word for it, I'd suppose this could be a useful addition. 😃 (But it's really not up to me to say...)

@omerlh
Contributor Author

omerlh commented Jun 11, 2019

I looked at the Kube State Metrics docs yesterday, and it looks like it's also common to have a metric whose value is a time :)
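
A minimal sketch of the "metric whose value is a time" pattern, assuming a gauge that holds the Unix timestamp of the last successful bundle load. The metric name `last_bundle_success_timestamp_seconds` and both function names are hypothetical, not something OPA exposes under that name.

```python
import time
from typing import Optional


def render_last_success(last_success_unix: float) -> str:
    """Expose the last successful load time as a Unix-timestamp gauge."""
    return (
        "# TYPE last_bundle_success_timestamp_seconds gauge\n"
        f"last_bundle_success_timestamp_seconds {last_success_unix}\n"
    )


def is_stale(last_success_unix: float, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """Return True if the last success is older than max_age_seconds."""
    now = time.time() if now is None else now
    return now - last_success_unix > max_age_seconds
```

The staleness check mirrors what an alert rule would do on the Prometheus side, roughly `time() - last_bundle_success_timestamp_seconds > 300` in PromQL.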

@tsandall
Member

If we add support for this, I'd like to find the least "abusive" option.

A couple of thoughts:

  • Is there a reason to use a counter instead of a gauge for the success/failure status?
  • We should fold download and activation status into a single gauge. I interpreted "fetch" as "download". Might be my bias.
  • OPA currently reports status on discovery and normal bundles. The status exported to Prometheus should do the same. For simplicity, I'd say we should fold discovery and normal bundle status into a single gauge, e.g., bundle_status{status=success|failure} 0/1. I'd make success 0 so that we keep the right to use non-zero values to communicate different errors in the future. If we don't want to fold discovery and bundle status into a single gauge, we need to namespace them. This will become even more important when we support multiple bundles (Load multiple policy bundles #721).
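
A minimal sketch of the folded-gauge idea above, assuming success is 0 and non-zero values are reserved for errors. The specific error codes, the `name` label used for namespacing, and the function names are all hypothetical; the thread only fixes 0 as success.

```python
from typing import Dict, Optional

# Hypothetical codes: only SUCCESS = 0 comes from the discussion above.
SUCCESS = 0
DOWNLOAD_FAILED = 1
ACTIVATION_FAILED = 2


def bundle_status_value(download_error: Optional[str],
                        activation_error: Optional[str]) -> int:
    """Fold download and activation status into one gauge value."""
    if download_error is not None:
        return DOWNLOAD_FAILED
    if activation_error is not None:
        return ACTIVATION_FAILED
    return SUCCESS


def render_bundle_status(statuses: Dict[str, int]) -> str:
    """Render one series per bundle, namespaced by a `name` label."""
    lines = ["# TYPE bundle_status gauge"]
    for name in sorted(statuses):
        lines.append(f'bundle_status{{name="{name}"}} {statuses[name]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bundle_status({
        "discovery": bundle_status_value(None, None),
        "authz": bundle_status_value(None, "compile error"),
    }), end="")
```

A per-bundle label keeps discovery and normal bundles in one metric while still letting them be told apart, which also extends to multiple bundles.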

@stale

stale bot commented Nov 22, 2021

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Nov 22, 2021
@stale stale bot removed the inactive label Dec 3, 2021
@tsandall tsandall added the monitoring Issues related to decision log and status plugins label Dec 3, 2021
@stale

stale bot commented Jan 2, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@anderseknert
Member

From what I can tell, this was resolved by #4251

Let me know if that's not the case, and I'll re-open.

4 participants