Proposal: API for unstable runtime metrics
Background & Motivation
Today runtime metrics are exposed in two ways.
The first way is via the struct-based sampling APIs runtime.ReadMemStats and runtime/debug.ReadGCStats. These functions accept a pointer to a struct and then populate the struct with data from the runtime.
The problems with this type of API are:
- Removing/renaming old metrics from the structs is impossible.
  - For example, MemStats.BySize is hard-coded to 61 size classes when there are currently 83. We cannot ever change BySize.
- Adding implementation-specific metrics to the structs is discouraged, because they pollute the API once they are inevitably deprecated.
- runtime.ReadMemStats has a global effect on the application because it forces a STW. This has a direct effect on latency. Being able to tease apart which metrics actually need a STW gives users more control over performance.
The good things about this type of API are:
- Protected by the Go 1 compatibility promise.
- Easy for applications to ingest, use for their own purposes, or push to a metrics collection service or log.
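As an illustration of the struct-based style discussed above, here is a minimal sketch that samples both APIs. The snapshot type and the takeSnapshot helper are hypothetical names introduced only for this example; the runtime calls themselves are the existing standard library functions.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// snapshot is a hypothetical type for this sketch; it keeps only a few
// of the many fields the runtime fills in.
type snapshot struct {
	HeapAlloc uint64 // bytes of allocated heap objects
	NumGC     int64  // number of completed garbage collections
	BySizeLen int    // length of the fixed-size MemStats.BySize array
}

// takeSnapshot samples both struct-based APIs.
func takeSnapshot() snapshot {
	var ms runtime.MemStats
	// Populates every field of ms, whether or not the caller needs it,
	// and stops the world while doing so.
	runtime.ReadMemStats(&ms)

	var gs debug.GCStats
	debug.ReadGCStats(&gs)

	return snapshot{
		HeapAlloc: ms.HeapAlloc,
		NumGC:     gs.NumGC,
		BySizeLen: len(ms.BySize),
	}
}

func main() {
	s := takeSnapshot()
	fmt.Printf("heap alloc: %d bytes, GCs: %d, BySize entries: %d\n",
		s.HeapAlloc, s.NumGC, s.BySizeLen)
}
```

The caller cannot ask for HeapAlloc alone: every call pays for the full struct, and the length of BySize is fixed by the type itself.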
The second is via GODEBUG flags which emit strings containing metrics to standard error (e.g. gctrace, gcpacertrace, scavtrace).
The problems with this type of API are:
- Difficult for an application to ingest because it must be parsed.
- Format of the output is not protected by the Go 1 backwards compatibility promise.
The good things about this type of API are:
- We can freely change it and add implementation-specific metrics.
- We never have to live with bad decisions.
I would like to propose a new API which takes the best of both approaches.
Requirements
- The API should be easily extendable with new metrics.
- The API should be easily retractable, to deprecate old metrics.
- Removing a metric should not break any Go applications as per the Go 1 compatibility promise.
- The API should be discoverable, to obtain a list of currently relevant metrics.
- The API should be rich, allowing a variety of metrics (e.g. distributions).
- The API implementation should minimize CPU/memory usage, such that it does not appreciably affect any of the metrics being measured.
- The API should include useful existing metrics already exposed by the runtime.
Goals
Given the requirements, I suggest we prioritize the following concerns, in this order, when designing the API.
- Extensibility.
  - Metrics are “unstable” and therefore it should always be compatible to add or remove metrics.
  - Since metrics will tend to be implementation-specific, this feature is critical.
- Discoverability.
  - Because these metrics are “unstable,” there must be a way for the application, and for the human writing the application, to discover the set of usable metrics and be able to do something useful with that information (e.g. log the metric).
  - The API should enable collecting a subset of metrics programmatically. For example, one might want to “collect all memory-related metrics” or “collect all metrics which are efficient to collect”.
- Performance.
  - Must have a minimal effect on the metrics it returns in the steady state.
  - Should scale up to 100s of metrics, an amount that a human might consider “a lot.”
  - Note that picking the right types to expose can limit the number of metrics we need to expose. For example, a distribution type would significantly reduce the number of metrics.
- Ergonomics.
  - The API should be as easy to use as it can be, given the above.
Design
See full design document at https://golang.org/design/37112-unstable-runtime-metrics.
Highlights:
- Expose a new sampling-based API in a new package, the runtime/metrics package.
- Use string keys for each metric which include the unit of the metric in an easily-parseable format.
- Expose a discovery API which provides metadata about each metric at runtime, such as whether it requires a STW and whether it's cumulative (counter as opposed to a gauge).
- Add a Histogram interface to the package which represents a distribution.
- Support for event-based metrics is discussed and left open, but considered outside the scope of this proposal.
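The highlights above can be sketched against the runtime/metrics package found in recent Go releases. That package postdates this proposal and may not match the design document in every detail (for instance, it represents distributions with a Float64Histogram struct). The helpers splitUnit and firstHistogram are hypothetical, and the "path:unit" key layout is assumed from the names that package reports.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// splitUnit separates a metric key such as "/gc/heap/allocs:bytes"
// into its path and its unit. If there is no ':' the unit is empty.
func splitUnit(key string) (path, unit string) {
	i := strings.LastIndex(key, ":")
	if i < 0 {
		return key, ""
	}
	return key[:i], key[i+1:]
}

// firstHistogram reads the first distribution-valued metric that the
// discovery API reports, if the running Go version has one.
func firstHistogram() (string, *metrics.Float64Histogram, bool) {
	for _, d := range metrics.All() {
		if d.Kind != metrics.KindFloat64Histogram {
			continue
		}
		s := []metrics.Sample{{Name: d.Name}}
		metrics.Read(s)
		if s[0].Value.Kind() == metrics.KindFloat64Histogram {
			return d.Name, s[0].Value.Float64Histogram(), true
		}
	}
	return "", nil, false
}

func main() {
	// Discovery metadata: the unit from the key and the counter/gauge flag.
	for _, d := range metrics.All() {
		path, unit := splitUnit(d.Name)
		fmt.Printf("%s unit=%s cumulative=%t\n", path, unit, d.Cumulative)
	}

	// One distribution metric, summarized by its total sample count.
	if name, h, ok := firstHistogram(); ok {
		var total uint64
		for _, c := range h.Counts {
			total += c
		}
		fmt.Printf("%s: %d samples in %d buckets\n", name, total, len(h.Counts))
	}
}
```

Keeping the unit inside the key means a consumer can label or convert a value without a separate lookup table, and a single histogram stands in for what would otherwise be one metric per bucket.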
Backwards Compatibility
Note that although the set of metrics the runtime exposes will not be stable across Go versions, the API to discover and access those metrics will be.
This proposal therefore strictly increases the API surface of the Go standard library without changing any existing functionality, and so is Go 1 compatible.