
Provide a way to prevent unbounded metric growth #197

Open
@howardjohn

We are interested in a mechanism to control unbounded growth of metrics. While we generally follow best practices around limiting cardinality, this is still problematic for extremely long-lived processes. For instance, it's common to record the binary version of something in a metric, but with 100s of rollouts over days or months, the number of time series can explode if the metrics collection is never restarted.

We would like some way to control this in our application.


Currently, there are .clear() and .remove(). These are good building blocks, but I am not sure they are sufficient on their own.

remove() is challenging on its own because we don't have any way to understand the entire set of labels stored in the metric at any point. In theory you could use EncodeMetric::encode and parse the results, but that is quite hacky.

clear() is also challenging, because it is all or nothing.
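To illustrate what working around this with today's building blocks might involve, here is a minimal sketch of an application-side index that records every label set it has seen and when it was last touched. `LabelIndex`, `touch`, and `drain_stale` are hypothetical names invented for this example and are not part of the library. The label sets returned by `drain_stale` would then be passed one by one to the family's .remove().

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical side index: remembers each label set the application has
/// written to, because the family itself does not expose its label sets.
pub struct LabelIndex<L> {
    last_touch: HashMap<L, Instant>,
}

impl<L: Eq + Hash + Clone> LabelIndex<L> {
    pub fn new() -> Self {
        Self { last_touch: HashMap::new() }
    }

    /// Call this next to every metric update for `labels`.
    /// `now` is a parameter so the sketch can be exercised without sleeping.
    pub fn touch(&mut self, labels: L, now: Instant) {
        self.last_touch.insert(labels, now);
    }

    /// Forget and return every label set that has been idle for at least
    /// `max_idle`. The caller would hand each one to the family's `.remove()`.
    pub fn drain_stale(&mut self, now: Instant, max_idle: Duration) -> Vec<L> {
        let mut stale = Vec::new();
        self.last_touch.retain(|labels, touched| {
            let keep = now.saturating_duration_since(*touched) < max_idle;
            if !keep {
                stale.push(labels.clone());
            }
            keep
        });
        stale
    }

    /// Number of label sets currently tracked.
    pub fn len(&self) -> usize {
        self.last_touch.len()
    }
}
```

This works, but every call site has to remember to update the index alongside the metric, which is the kind of duplication a built-in mechanism would avoid.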


Ideally, I think we would have some interface like:

family.retain_if(|(labelset, metric)| {
    Instant::now().duration_since(metric.last_write()) < Duration::from_secs(3600)
})

(remove any metrics not modified for an hour)

This would require a method on the family, but also maybe some changes on the metric type as well to make this easier to encode.
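To make the shape of this concrete, here is a minimal, standard-library-only sketch of the idea. It is not the library's API and not necessarily what the draft does: `SketchFamily`, `TrackedCounter`, `get_or_create_at`, `inc_at`, and `evict_idle` are hypothetical names, and the family is modelled as a plain `HashMap` from label set to metric. The metric stamps its own last write time, which is the "change on the metric type" part, and `retain_if` is the "method on the family" part.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical counter that remembers when it was last written.
#[derive(Debug)]
pub struct TrackedCounter {
    value: u64,
    last_write: Instant,
}

impl TrackedCounter {
    fn new(now: Instant) -> Self {
        Self { value: 0, last_write: now }
    }

    /// Increment and stamp the write time. `now` is passed in so the
    /// sketch can be exercised without sleeping.
    pub fn inc_at(&mut self, now: Instant) -> u64 {
        self.value += 1;
        self.last_write = now;
        self.value
    }

    pub fn last_write(&self) -> Instant {
        self.last_write
    }
}

/// Hypothetical stand-in for a metric family: label set -> metric.
pub struct SketchFamily<L> {
    metrics: HashMap<L, TrackedCounter>,
}

impl<L: Eq + Hash> SketchFamily<L> {
    pub fn new() -> Self {
        Self { metrics: HashMap::new() }
    }

    /// Return the metric for `labels`, creating it if it does not exist.
    pub fn get_or_create_at(&mut self, labels: L, now: Instant) -> &mut TrackedCounter {
        self.metrics
            .entry(labels)
            .or_insert_with(|| TrackedCounter::new(now))
    }

    /// Keep only the series for which `keep` returns true.
    pub fn retain_if<F>(&mut self, mut keep: F)
    where
        F: FnMut(&L, &TrackedCounter) -> bool,
    {
        self.metrics.retain(|labels, metric| keep(labels, &*metric));
    }

    /// Convenience wrapper: drop series idle for at least `max_idle`.
    pub fn evict_idle(&mut self, now: Instant, max_idle: Duration) {
        self.retain_if(|_labels, metric| {
            now.saturating_duration_since(metric.last_write()) < max_idle
        });
    }

    /// Number of series currently held.
    pub fn len(&self) -> usize {
        self.metrics.len()
    }

    pub fn contains(&self, labels: &L) -> bool {
        self.metrics.contains_key(labels)
    }
}
```

With this shape, a periodic task could call `evict_idle(Instant::now(), Duration::from_secs(3600))` to get the "not modified for an hour" policy, while `retain_if` stays general enough for other policies such as filtering on label values.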

I have put up a small draft of what this could look like in #196, but I am very open to alternatives.
