OPA generates a lot of bundle metrics and floods system #4584
Comments
Thanks for reporting! Indeed, it sounds like having the active revision label is the problem. Would the metrics still be useful without that? 🤔
Cc @rafaelreinert what do you think? I guess we could either make it a feature you'd need to enable, or drop the labels?
@srenatus in our case the metrics are valuable, even without the |
Hey @srenatus and @costimuraru, about the flag: it already exists. To enable the status metrics on Prometheus, it must be configured explicitly, and the default value is false (see https://www.openpolicyagent.org/docs/latest/management-status/#prometheus-status-metrics). I did that precisely to avoid flooding the metrics in workloads like yours. I believe someone activated it in your infrastructure. (Please let me know if the flag is not set; that may be a bug.)
What do you think about this approach?
Hey @srenatus and @costimuraru, I was thinking about this issue again and I realized what may be the root cause. The problem is that this function, https://github.com/open-policy-agent/opa/blob/main/plugins/status/plugin.go#L478, is not resetting the older bundles' metrics. I had not seen that as an issue because my environment has few bundles and the OPA instance lifespan is less than 3 days. But in an environment with a long lifespan and many bundles, it becomes an issue because the number of exported metrics increases a lot. This week I will try to fix it and reset the metrics, because if a bundle is no longer used it should not be exported.
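To illustrate the root cause described above, here is a minimal sketch of why a per-revision label grows without bound when nothing is ever reset. It uses a plain map as a stand-in for a labeled gauge; the type and function names are hypothetical and this is not the code in plugins/status/plugin.go.

```go
package main

import "fmt"

// labeledGauge is a simplified stand-in for a gauge with one
// active_revision label: each distinct label value is its own series.
// It is an illustration only, not the Prometheus client library.
type labeledGauge map[string]float64

// set records a value for the series identified by the revision label.
func (g labeledGauge) set(activeRevision string, v float64) {
	g[activeRevision] = v
}

// seriesAfter simulates the given number of bundle activations, each with
// a new revision, and returns how many series the gauge then exposes.
func seriesAfter(activations int) int {
	g := labeledGauge{}
	for i := 0; i < activations; i++ {
		// Old revisions are never removed, so every activation adds a series.
		g.set(fmt.Sprintf("rev-%d", i), float64(i))
	}
	return len(g)
}

func main() {
	for _, n := range []int{1, 100, 2880} {
		fmt.Printf("%d activations -> %d series\n", n, seriesAfter(n))
	}
}
```

The series count tracks the number of distinct revisions seen since the process started, which is why a short-lived instance with few bundles would not show the problem.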
Thanks for the details on the rationale behind this behavior, @rafaelreinert.
For instance, the For our use case, I think what we need is to just remove the Coming back to the metric types, the gauge is probably not that important, cause it retains the latest value, right? Your suggestion to have a flag which makes it possible to select between these 2 behaviors is probably best (metrics with and without the |
After stripping the activeRevision label, OPA is looking much better:
I am thinking about that, maybe the best solution is to remove the |
@rafaelreinert that sounds reasonable. @costimuraru what do you think? @rafaelreinert would you be able to pick up making this change? 😃 (I'll take care of it if it's too much on your plate right now.)
Thanks @srenatus.
Having one activeRevision label on each of the Prometheus metrics emitted by the status plugin has proven to be problematic with a large number of bundles. So with this change:
1. We keep the activeRevision label only on the last_success_bundle_activation metric.
2. The gauge gets reset, so we only keep the last active_revision instead of keeping them all, which avoids the situation where the /metrics output grows indefinitely.
Fixes #4584.
Signed-off-by: cmuraru <cmuraru@adobe.com>
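The two parts of this change can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: the registry type, the update function, the bundle name "authz" and the two placeholder metric names are all hypothetical; only last_success_bundle_activation and the revision label come from the commit message.

```go
package main

import "fmt"

// registry maps metric name -> rendered label set -> value. It is a toy
// model of a metrics registry, not the Prometheus client library.
type registry map[string]map[string]float64

func (r registry) set(metric, labels string, v float64) {
	if r[metric] == nil {
		r[metric] = make(map[string]float64)
	}
	r[metric][labels] = v
}

// seriesCount returns the total number of series across all metrics.
func (r registry) seriesCount() int {
	n := 0
	for _, series := range r {
		n += len(series)
	}
	return n
}

// revisionMetric is the only metric that keeps the revision label.
const revisionMetric = "last_success_bundle_activation"

// otherMetrics are placeholder names standing in for the remaining
// status metrics, which carry only the bundle name.
var otherMetrics = []string{"placeholder_metric_a", "placeholder_metric_b"}

// update records the currently active revision of every bundle.
func update(r registry, active map[string]string, ts float64) {
	// Part 2: reset the gauge so series for older revisions disappear.
	delete(r, revisionMetric)
	for bundle, rev := range active {
		// Part 1: the revision label lives on this one metric only.
		r.set(revisionMetric, fmt.Sprintf("name=%q,active_revision=%q", bundle, rev), ts)
		for _, m := range otherMetrics {
			r.set(m, fmt.Sprintf("name=%q", bundle), ts)
		}
	}
}

// run simulates repeated activations of a single bundle and returns the
// resulting series count.
func run(activations int) int {
	r := registry{}
	for i := 0; i < activations; i++ {
		update(r, map[string]string{"authz": fmt.Sprintf("rev-%d", i)}, float64(i))
	}
	return r.seriesCount()
}

func main() {
	fmt.Println("series after 1 activation:    ", run(1))
	fmt.Println("series after 1000 activations:", run(1000))
}
```

In this model the series count depends only on the number of bundles and metrics, not on how many revisions have been activated.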
Short description
We have OPA deployed in Kubernetes as a standalone service (i.e. 3 pods). We generate a new bundle every 30 seconds (the bundle contains updated Rego policies), which OPA downloads. We also scrape the metrics on each OPA pod (via Prometheus) to monitor them. We've noticed that the number of metrics emitted by OPA increases dramatically: from ~400 metrics when OPA starts to 200 000+ after a few days. This is per pod. It seems that the increase is attributable to a series of metrics that relate to the bundle:
Each of these metrics has a label named active_revision, which I believe is the bundle id. Given that we load a new bundle every 30 seconds, the number of metrics increases fast. You can see the output of OPA /metrics with 200k+ metrics in this Gist.
Examples:
0.39
curl http://opa-pod:8182/metrics | wc -l
229129
Steps To Reproduce
Expected behavior
The number of metrics emitted should stay constant over time.
Actual behavior
The number of metrics increases dramatically over time. It starts with 400 metrics and in 2 days, in our case, it reached 229 000 metrics. This floods our systems (Prometheus/Cortex), where we have a quota on the number of unique metrics scraped. The scraping itself also takes a lot of time (10-20 seconds).
Additional context