proposal: runtime/metrics: define a recommended set of metrics #67120
Comments
@dashpole Out of curiosity, would OpenTelemetry use the
We would likely use it as part of the tests for the package as a way to verify that we are exposing all of the recommended metrics to users. We would probably not use it for programmatic generation of new metrics.
@MikeMitchellWebDev It's true that [...] Lastly, note that [...] EDIT: To be clear, I don't mean deprecating
Hello hello, I'm Arthur from the prometheus/client_golang team 👋 We have a different set of default metrics and I believe we can't just change the default exposed metrics without a major version bump. One approach we could use is to have a configuration option in client_golang, "ExposeRecommendedMetrics", but I predict we'll get questions like "why aren't the recommended metrics the default?". With that said, I like the idea of the Go team providing guidance about which metrics are worth paying the price to collect and store. I also like the idea of that guidance being programmatically available; we just need to evaluate whether a major version bump in client_golang is needed and whether it's worth the effort.
@ArthurSens for some historical perspective see prometheus/client_golang#955 and prometheus/client_golang#1033. My suggestion was that
Thanks for the extra context! Yeah, I agree we can offer the recommended metrics in some way :)
@mknyszek for "Recommended" histogram metrics (currently just
The proposal states that the 'Recommended' set follows the guarantees of the "runtime/metrics" package:
// For a given metric name, the value of Buckets is guaranteed not to change
// between calls until program exit.
from: https://cs.opensource.google/go/go/+/refs/tags/go1.22.3:src/runtime/metrics/histogram.go;l=26-27
I'm asking about general stability (e.g. across Go versions, or across multiple instances of an application).
@dashpole No, they're not guaranteed to remain stable and have changed across Go versions. We've definitely removed buckets before.
Thanks for this, great work! What's the end goal?
I wonder what the exact intention and end goal behind this proposal is. Is it to:
A. Convince the common instrumentation SDKs to give the Go team control over the default published metrics for the Go runtime, so that the largest possible number of Go applications have those common metrics OOTB and adopt potential metric changes as soon as they are rebuilt with a new Go version? Or...
B. Support the users who want to stay with the Go runtime "default" metrics, which might change from one Go version to the next, and who are fine with that?
Picking a healthy, limited "recommended/default" set from the Go team definitely helps in both cases. I love the recommendation mechanism too; it looks easy to use to me. As co-maintainer of the Prometheus client_golang I fully support @ArthurSens' words around adding a programmatic option e.g.
I wonder if A is realistic. If A is not possible at the moment, because e.g. OpenTelemetry and/or Prometheus client_golang (potentially popular metric SDKs) want to keep their influence on what's default (the current status quo), then is this proposal still viable? I think to motivate SDKs to pursue A with the Go team, we need to learn more about the pros & cons here. What will users get out of it, compared with an SDK manually adding some Go runtime metrics to its defaults based on user feedback and the recent changes to the recommended set? One con would be potentially different stability guarantees across the Go team vs Otel vs Prometheus. To sum up, is it A? Can we unpack the pros & cons here so SDKs can assess them?
Recommended Metrics
TL;DR: Those make sense. Just to evaluate your proposed metrics and contribute to the pros & cons of using the Go recommended metrics as the default, I diffed what client_golang has now vs recommended. NOTE: All
To sum up, I think Prometheus is really close to the recommended ones, plus I would propose adding [...] With that.. it's only
It's really C, in practice. B is nice for those that want it, but I don't think A is practical. Everyone is always going to be free to choose what metrics they collect and/or expose at any layer. Really I think we're just trying to set a better foundation here than the existing, somewhat haphazard, "collect
FWIW, my thought was that SDKs can just choose to skip inherently high cardinality types programmatically, like
That's a good sign IMO. I'm supportive of adding those. While they're likely to be exactly the same over time, the fact is that you can mutate automatically at runtime. As above, re:
Epic. Thanks. Especially with this intention, LGTM. 💪🏽 Good effort allowing everyone to learn and adapt to new practices once they appear. We will be using the recommended metrics in the Prometheus SDK to stay as close to those as possible.
This work was done in prep for our talk on GopherCon UK about Go Runtime Metrics. Feedback welcome to the dashboard data, layout and style! Essentially it has all metric we are maintaining in client_golang (most popular Go metric SDK). Exposed metrics also align with Go Team recommendation golang/go#67120 Signed-off-by: bwplotka <bwplotka@gmail.com>
* Add recommended Prometheus dashboards for Go. This work was done in prep for our talk on GopherCon UK about Go Runtime Metrics. Feedback welcome to the dashboard data, layout and style! Essentially it has all metric we are maintaining in client_golang (most popular Go metric SDK). Exposed metrics also align with Go Team recommendation golang/go#67120 Signed-off-by: bwplotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: bwplotka <bwplotka@gmail.com> --------- Signed-off-by: bwplotka <bwplotka@gmail.com>
Introduction
With each Go release the set of metrics exported by the runtime/metrics package grows in size. Not all metrics are applicable to all cases, and it can become difficult to identify which metrics are actually useful. This is especially true for projects like OpenTelemetry and Prometheus which want to export some broadly-applicable Go runtime metrics by default, but the full set is overwhelming and not particularly user-friendly.
Another problem with collecting all metrics is cost. The cardinality of the default metric set is closely watched by projects like Prometheus, because downstream users are often paying for the storage costs of these metrics when making use of hosted solutions.
This issue proposes defining a conservative subset of runtime metrics that are broadly applicable, and a simple mechanism for discovering them programmatically.
Proposal
There are two parts to this proposal: the categorization of some metrics as "recommended" by the Go toolchain, and the actual mechanism for that categorization.
To start with, I would like to propose documenting such a set of metrics as "recommended" at the top of the runtime/metrics documentation. Each metric is required to have a full rationale explaining its utility and use-cases. The "recommended" set is intended to hold a lot of weight. We need to make sure the reason why we promote a particular metric is well-documented. The "recommended" set of metrics generally follows the compatibility guarantees of the runtime/metrics package. That being said, a metric is unlikely to be promoted to "recommended" if it's not likely to just exist indefinitely. Still, we reserve the right to remove them.
Next, we'll add a Tags []string field to metrics.Description so that these metrics can be found programmatically. We could get by with a simple boolean field, but that's inflexible. In particular, what I'd like to avoid is having dedicated fields for future categorizations such that they end up non-orthogonal and confusing.
The tag indicating the default set will be the string "recommended".
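
A minimal sketch of how tag-based discovery might look from a caller's side. The proposed Tags field is not assumed to exist on metrics.Description here, so the example emulates it with a hypothetical local type, taggedDescription, and hypothetical helpers tagAll and withTag. The tag table in main is made up for illustration and is not the runtime's actual tagging.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// taggedDescription is a hypothetical stand-in for metrics.Description
// extended with the proposed Tags field. It is an assumption made for this
// sketch, not part of the runtime/metrics API.
type taggedDescription struct {
	metrics.Description
	Tags []string
}

// tagAll pairs every description returned by metrics.All with tags taken
// from a caller-supplied table, emulating what the runtime might provide.
func tagAll(tags map[string][]string) []taggedDescription {
	all := metrics.All()
	out := make([]taggedDescription, 0, len(all))
	for _, d := range all {
		out = append(out, taggedDescription{Description: d, Tags: tags[d.Name]})
	}
	return out
}

// withTag returns the names of all descriptions that carry the given tag.
func withTag(descs []taggedDescription, tag string) []string {
	var names []string
	for _, d := range descs {
		for _, t := range d.Tags {
			if t == tag {
				names = append(names, d.Name)
				break
			}
		}
	}
	return names
}

func main() {
	// Hypothetical tagging of two metrics, for illustration only.
	descs := tagAll(map[string][]string{
		"/gc/heap/goal:bytes":          {"recommended"},
		"/sched/goroutines:goroutines": {"recommended"},
	})
	for _, name := range withTag(descs, "recommended") {
		fmt.Println(name)
	}
}
```

Using a string slice rather than a boolean means a second category could later be filtered with the same helper by passing a different tag, which is the flexibility argued for above.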
Proposed initial metrics
Below is an initial proposed set of metrics. This list is intended to be a conservative and uncontroversial set of metrics that have clear real-world use-cases.
/gc/gogc:percent - GOGC.
/gc/gomemlimit:bytes - GOMEMLIMIT.
/gc/heap/allocs:bytes - Total bytes allocated.
/gc/heap/allocs:objects - Total individual allocations made.
/gc/heap/goal:bytes - GC heap goal. [...] GOGC and GOMEMLIMIT, and a close approximation for heap memory footprint.
/memory/classes/heap/released:bytes - Current count of heap bytes that are released back to the OS but which remain mapped. [...] GOMEMLIMIT. It is also necessary to understand what the runtime believes its own physical memory footprint is, as a subtraction from the total.
/memory/classes/heap/stacks:bytes - Current count of bytes allocated to goroutine stacks.
/memory/classes/total:bytes - Total Go runtime memory footprint. [...] GOMEMLIMIT. It's also useful for identifying "other" memory, and together with /memory/classes/heap/released:bytes, what the runtime believes the physical memory footprint of the application is.
/sched/gomaxprocs:threads - GOMAXPROCS.
/sched/goroutines:goroutines - Current count of live goroutines (blocked, running, etc.).
/sched/latencies:seconds - Distribution of time goroutines spend runnable (that is, not blocked), but not running.
This results in 10 uint64 metrics and 1 Float64Histogram metric in the default set, a significant reduction from the 81 metrics currently exported by the package.
Here are a few other metrics that were not included.
/memory/classes/heap/objects:bytes - Current count of bytes allocated.
/memory/classes/metadata/other:bytes - Runtime metadata, mostly GC metadata.
/gc/heap/frees:bytes - Total bytes freed. [...] /gc/heap/allocs:bytes. Not that useful on its own, and live+unswept heap memory isn't a terribly useful metric since it tends to be noisy and misleading, subject to sweep scheduling nuances. The heap goal is a much more reliable measure of total heap footprint.
/gc/heap/frees:objects - Total individual allocations freed. [...] /gc/heap/allocs:objects. Not that useful on its own, and the number of live objects on its own also isn't that useful. Together with /gc/heap/frees:objects, /gc/heap/allocs:bytes, and /gc/heap/frees:bytes it can be used to calculate average object size, but that's also not very useful on its own. The distribution of object sizes is more useful, but the metric is currently incomplete, as it currently buckets all objects >32 KiB in size together.
/godebug/non-default/* - Count of instances of a behavior change due to a GODEBUG setting.
Alternatives
Only documenting the recommended set
One alternative is to only document the set of recommended metrics. This is fine, but it also runs counter to runtime/metrics' original goal of being able to discover metrics programmatically. Some mechanism here seems necessary to keep the package useful to both humans and computers.
A toolchain-versioned default metrics set
Originally, we had considered an API (for example, metrics.Recommended(...)) that accepted a Go toolchain version and would return the set of default metrics (specifically, a []metrics.Description) for that version. All the metrics within would always be valid to pass to metrics.Read.
You could also imagine this set being controlled via the language version set in the go.mod indirectly via GODEBUG flags. (That is, every time we would change this set, we'd add a valid value to GODEBUG. Specifically something like GODEBUG=runtimemetricsgo121=1.)
Unfortunately, there are already a lot of questions here about stability and versioning. Not the least of which is the fact that toolchain versions, at least those reported by the runtime/debug package, aren't very structured.
Furthermore, this is a type of categorization that doesn't really compose well. If we ever wanted new categories, we'd need to define a new API, or possibly dummy toolchain strings. It's also a much more complicated change.