
runtime: API for unstable metrics #37112

Closed
@mknyszek


Proposal: API for unstable runtime metrics

Background & Motivation

Today runtime metrics are exposed in two ways.

The first way is via the struct-based sampling APIs runtime.ReadMemStats and runtime/debug.ReadGCStats. These functions accept a pointer to a struct (runtime.MemStats and debug.GCStats, respectively) and populate it with data from the runtime.

The problems with this type of API are:

  • Removing/renaming old metrics from the structs is impossible.
    • For example, MemStats.BySize is hard-coded to 61 size classes when there are currently 83. We cannot ever change BySize.
  • Adding implementation-specific metrics to the structs is discouraged, because they will inevitably be deprecated, leaving behind fields that pollute the API.
  • runtime.ReadMemStats has a global effect on the application because it forces a stop-the-world (STW) pause, which directly affects latency. Being able to tease apart which metrics actually need a STW would give users more control over performance.

The good things about this type of API are:

  • Protected by the Go 1 compatibility promise.
  • Easy for applications to ingest, use for their own purposes, or push to a metrics collection service or log.

The second way is via GODEBUG flags, which emit strings containing metrics to standard error (e.g. gctrace, gcpacertrace, scavtrace).

The problems with this type of API are:

  • Difficult for an application to ingest because it must be parsed.
  • Format of the output is not protected by the Go 1 backwards compatibility promise.
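
To illustrate the parsing burden, here is a sketch that extracts the heap sizes from one line of GODEBUG=gctrace=1 output. The sample line is illustrative rather than captured from a real run, and the regular expression is an assumption based on the gctrace format documented in the runtime package at the time of writing. Because that format is not covered by the compatibility promise, a parser like this can silently break on a newer release. The names gcTrace and parseGCTrace are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcLine is an assumed pattern for the start of a gctrace line. The format is
// not covered by the Go 1 compatibility promise and may change at any time.
var gcLine = regexp.MustCompile(`^gc (\d+) @[\d.]+s \d+%: .*, (\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)

// gcTrace (hypothetical) holds the fields we pull out of one line.
type gcTrace struct {
	Cycle       int // GC cycle number
	HeapStartMB int // heap size at GC start
	HeapEndMB   int // heap size at GC end
	LiveMB      int // live heap
	GoalMB      int // heap goal
}

// parseGCTrace (hypothetical) parses a single gctrace line.
func parseGCTrace(line string) (gcTrace, error) {
	m := gcLine.FindStringSubmatch(line)
	if m == nil {
		return gcTrace{}, fmt.Errorf("unrecognized gctrace line: %q", line)
	}
	var vals [5]int
	for i := range vals {
		v, err := strconv.Atoi(m[i+1])
		if err != nil {
			return gcTrace{}, err
		}
		vals[i] = v
	}
	return gcTrace{vals[0], vals[1], vals[2], vals[3], vals[4]}, nil
}

func main() {
	// Illustrative line in the documented gctrace format (not from a real run).
	line := "gc 1 @0.012s 2%: 0.010+0.35+0.003 ms clock, 0.080+0.10/0.30/0.20+0.024 ms cpu, 4->4->0 MB, 5 MB goal, 8 P"
	tr, err := parseGCTrace(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", tr)
}
```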

The good things about this type of API are:

  • We can freely change it and add implementation-specific metrics.
  • We never have to live with bad decisions.

I would like to propose a new API which takes the best of both approaches.

Requirements

  • The API should be easily extendable with new metrics.
  • The API should be easily retractable, to deprecate old metrics.
    • Removing a metric should not break any Go applications as per the Go 1 compatibility promise.
  • The API should be discoverable, to obtain a list of currently relevant metrics.
  • The API should be rich, allowing a variety of metrics (e.g. distributions).
  • The API implementation should minimize CPU/memory usage, such that it does not appreciably affect any of the metrics being measured.
  • The API should include useful existing metrics already exposed by the runtime.

Goals

Given the requirements, I suggest we prioritize the following concerns, in this order, when designing the API.

  1. Extensibility.
    • Metrics are “unstable,” so adding or removing a metric must always be a compatible change.
    • Since metrics will tend to be implementation-specific, this feature is critical.
  2. Discoverability.
    • Because these metrics are “unstable,” there must be a way for the application, and for the human writing the application, to discover the set of usable metrics and be able to do something useful with that information (e.g. log the metric).
    • The API should enable collecting a subset of metrics programmatically. For example, one might want to “collect all memory-related metrics” or “collect all metrics which are efficient to collect”.
  3. Performance.
    • Must minimize its effect on the metrics it returns in the steady state.
    • Should scale up to 100s of metrics, an amount that a human might consider “a lot.”
      • Note that picking the right types to expose can limit the number of metrics we need to expose. For example, a distribution type would significantly reduce the number of metrics.
  4. Ergonomics.
    • The API should be as easy to use as it can be, given the above.

Design

See full design document at https://golang.org/design/37112-unstable-runtime-metrics.

Highlights:

  • Expose a new sampling-based API in a new package, the runtime/metrics package.
  • Use string keys for each metric which include the unit of the metric in an easily-parseable format.
  • Expose a discovery API which provides metadata about each metric at runtime, such as whether it requires a STW and whether it's cumulative (counter as opposed to a gauge).
  • Add a Histogram interface to the package which represents a distribution.
  • Support for event-based metrics is discussed and left open, but considered outside the scope of this proposal.

Backwards Compatibility

Note that although the set of metrics the runtime exposes will not be stable across Go versions, the API to discover and access those metrics will be.

Therefore, this proposal strictly increases the API surface of the Go standard library without changing any existing functionality, and so it is Go 1 compatible.
