

Design: proposing new metricsgenerationprocessor #2722

Closed
hossain-rayhan opened this issue Mar 17, 2021 · 10 comments
Labels
enhancement New feature or request

Comments


hossain-rayhan commented Mar 17, 2021

This project supports an important requirement for EKS Fargate Container Insights metrics

Doc update history
Issue Created: 03/17
1st update: 04/28
2nd update: 05/05

SIG meeting discussion
1st discussion: SIG meeting 03/17

  • Initial design review and the proposal was welcomed!
  • Told to explore Prometheus Recording rules

2nd discussion: SIG meeting 03/24

  • Prometheus recording rules turned out to be considerably more complicated, so we decided to move forward with the proposed simple approach.
  • Planned to start the implementation and there was no objection.

3rd discussion: SIG meeting 04/28

  • Collector SIG meeting, Bogdan said he will review the code.

4th discussion: SIG meeting 05/05

  • Rayhan will update the GitHub design proposal explaining the use case and how it differs from the existing metricstransformprocessor.
  • If the community decides not to have this processor in the contrib repo, we will prefix the type/name with experimental_, like experimental_resourcegeneration. We discussed this earlier and Bogdan mentioned it again today.

Decision: In the Slack channel discussion, Punya and Josh agreed to move forward with the experimental_ prefix, and they both looked into the first PR. Josh approved the PR, which is waiting on Bogdan's review to get merged.

5th discussion: SIG meeting 05/12

  • Bogdan said he will have a look within one day.

Can you explain the use case?
We need to create a new metric from existing metrics, where the new metric's datapoint values differ from the existing metrics' values. For example, say we have two existing metrics, pod.memory.usage and node.memory.limit, and we need to create a new metric pod.memory.utilization computed as:
pod.memory.utilization = (pod.memory.usage.bytes / node.memory.limit)

Another use case is changing metric values to match units. For example, we have pod.memory.usage in Megabytes but want to report it in Bytes, so we want to create a new metric that holds the memory usage in Bytes: pod.memory.usage.bytes = pod.memory.usage * 1048576.
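Both use cases boil down to simple datapoint arithmetic. A minimal sketch with hypothetical values (the metric names follow the examples above; the numbers are illustrative only):

```python
# Hypothetical datapoint values, for illustration only.
pod_memory_usage_bytes = 536_870_912   # pod.memory.usage.bytes (512 MB)
node_memory_limit = 8_589_934_592      # node.memory.limit (8 GB)

# Use case 1: derive a new metric from two existing metrics.
# pod.memory.utilization = pod.memory.usage.bytes / node.memory.limit
pod_memory_utilization = pod_memory_usage_bytes / node_memory_limit

# Use case 2: derive a new metric by converting units.
# pod.memory.usage.bytes = pod.memory.usage (MB) * 1048576
pod_memory_usage_mb = 512
pod_memory_usage_bytes_new = pod_memory_usage_mb * 1048576
```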

Why doesn't the existing metricstransformprocessor support your use case?
The existing metricstransformprocessor has no way to generate a new metric from existing metrics following a given calculation. It has the following functionalities, which do not cover our use case:

  • Rename a metric
  • Add a metric label
  • Rename label key
  • Rename label value
  • Delete datapoint
  • Toggle datapoint
  • Aggregate across label sets
  • Aggregate across label values
  • Combine multiple metrics

How is it different from existing metricstransformprocessor?

  • The existing metricstransformprocessor has no way to calculate a new metric from two different existing metrics.
  • The existing metricstransformprocessor does not perform any operation that creates a new metric with newly calculated datapoint values. It does offer aggregation of a metric's values over a label set, but that is not what we are looking for.

Why can't it be part of metricstransformprocessor?

  • We discussed this in our SIG meeting, and @bogdandrutu suggested creating this feature as a new processor. The existing metricstransformprocessor works with the old OpenCensus metric types; we have a plan to rewrite the whole thing and don't want to add more code on top of it.

Design Description:
The metrics generation processor (metricsgenerationprocessor) can be used to create new metrics from existing metrics following a given rule. In the beginning, it may support the following two approaches for creating a new metric.

  1. It can create a new metric from two existing metrics by applying one of the following arithmetic operations: add, subtract, multiply, divide, and percent. One use case is to calculate the pod.memory.utilization metric with the following equation:
    pod.memory.utilization = (pod.memory.usage.bytes / node.memory.limit)
  2. It can create a new metric by scaling the value of an existing metric by a given constant. One use case is to convert pod.memory.usage metric values from Megabytes to Bytes (multiply the existing metric's value by 1048576).
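The two generation types above can be sketched as a single rule-application step. This is a simplified, hypothetical model in which metrics are a dict of scalar values; the real processor operates on full metric structures, not scalars:

```python
def apply_rule(rule, metrics):
    """Apply one generation rule (calculate or scale) to a dict of
    metric-name -> scalar datapoint value. Illustrative sketch only."""
    operations = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
        "percent": lambda a, b: a / b * 100,
    }
    if rule["generation_type"] == "calculate":
        # calculate: combine two operand metrics with the given operation
        op = operations[rule["operation"]]
        value = op(metrics[rule["operand1_metric"]],
                   metrics[rule["operand2_metric"]])
    else:
        # scale: multiply the single operand metric by a constant
        value = metrics[rule["operand1_metric"]] * rule["scale_by"]
    metrics[rule["new_metric_name"]] = value
    return metrics
```

For example, a calculate rule dividing pod.cpu.usage by node.cpu.limit would add a pod.cpu.utilized entry holding the ratio.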

Configuration

Configuration is specified through a list of generation rules. Generation rules find the metrics that
match the given metric names and apply the operation to those metrics.

processors:
    # processor name: metricsgeneration
    metricsgeneration:

        # specify the metric generation rules
        generation_rules:
              # name of the new metric. this is a required field
            - new_metric_name: <new_metric_name>

              # generation_type describes how the metric will be generated. It can be either calculate or scale. calculate generates a metric by applying the given operation on two operand metrics; scale operates only on the operand1 metric to generate the new metric.
              generation_type: {calculate, scale}

              # this is a required field
              operand1_metric: <first_operand_metric>

              # this field is required only if the generation_type is calculate
              operand2_metric: <second_operand_metric>

              # operation specifies which arithmetic operation to apply. It can be one of the five supported operations.
              operation: {add, subtract, multiply, divide, percent}

              # scale_by is only required if the generation_type is scale
              scale_by: 1000

Example Configurations

Create a new metric using two existing metrics

# create pod.cpu.utilized following (pod.cpu.usage / node.cpu.limit)
generation_rules:
    - new_metric_name: pod.cpu.utilized
      generation_type: calculate
      operand1_metric: pod.cpu.usage
      operand2_metric: node.cpu.limit
      operation: divide

Create a new metric scaling the value of an existing metric

# create pod.memory.usage.bytes from pod.memory.usage.megabytes
generation_rules:
    - new_metric_name: pod.memory.usage.bytes
      generation_type: scale
      operand1_metric: pod.memory.usage.megabytes
      operation: multiply
      scale_by: 1048576

In-Scope

  • It will support calculating a new metric in two different ways: 1) using two metrics and applying an arithmetic operation, and 2) using one metric and scaling its value by a fixed constant.
  • It supports a fixed set of operations: add, subtract, multiply, divide, and percent. It does not support any other complex calculation.

Exception

  • If an operand metric name or label is missing, the operation will simply be skipped, printing a debug log.
  • If the new metric name (being calculated) is already present, the operation will simply be skipped, printing a debug log.
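The two exception cases above amount to guard checks run before a rule is applied. A hypothetical sketch (the function and parameter names are illustrative, not the processor's actual API; metrics are again modeled as a dict):

```python
def should_skip(rule, metrics, log):
    """Return True if the rule must be skipped, logging the reason.
    Mirrors the exception cases: missing operand metric, or the
    new metric name already existing. Illustrative sketch only."""
    needed = [rule["operand1_metric"]]
    if rule["generation_type"] == "calculate":
        needed.append(rule["operand2_metric"])
    for name in needed:
        if name not in metrics:
            log(f"skipping rule: operand metric {name!r} not found")
            return True
    if rule["new_metric_name"] in metrics:
        log(f"skipping rule: metric {rule['new_metric_name']!r} already exists")
        return True
    return False
```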

Open Questions

  1. For renaming a metric (without calculating a new value) based on a metric label, I think the metricstransformprocessor is the right place. I created a separate PR [metrics_transform_processor] Add filtering capabilities matching metric label values for applying changes #3201 for that. Is there any issue with this other than the rewriting plan?
@hossain-rayhan

In addition to the above-mentioned approach, I explored whether we can support expressions like Prometheus Recording Rules. Recording rules are quite generic and robust. However, considering the complexity and amount of effort required to build something that generic, I prefer not to go for it right now. We discussed this at last Wednesday's (03/24) Prometheus SIG meeting, and I believe we decided there to move forward with the above-mentioned simple approach. The configuration is not final; I'm expecting more opinions/suggestions on it.

@bogdandrutu @jmacd @jrcamp Would you please have a look and share your thoughts? Thanks in advance.


jrcamp commented Mar 29, 2021

@hossain-rayhan Maybe take a look at how we (@dmitryax) did something similar inside the signalfxexporter. It operates on our own datapoints instead of OTLP, so it can't really be reused, but it might provide some inspiration:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/signalfxexporter/translation/constants.go

I think it was discussed but I've already forgotten and don't see it in meeting notes. Would this replace metricstransformprocessor? Edit: or the reverse, why can't this be added to MTP?


hossain-rayhan commented Mar 29, 2021

@jrcamp thanks, I will look into the signalfxexporter implementation.

Would this replace metricstransformprocessor? Edit: or the reverse, why can't this be added to MTP?

I added this to the Open Questions section in the description. Thanks. We discussed this in our SIG meeting, and @bogdandrutu suggested creating this feature as a new processor. The existing metricstransformprocessor works with the old OpenCensus metric types; we have a plan to rewrite the whole thing and don't want to add more code on top of it.


jmacd commented Mar 30, 2021

We discussed this on last Wednesday's (03/24) Prometheus SIG meeting, and I believe from the meeting we decided to explore/ move with above mentioned simple approach.

Thanks for following up here. The proposal indeed looks simple and well justified, given how much more powerful and free-form the Prometheus recording rules are.

@hossain-rayhan

@jrcamp, I looked into the signalFx code (following example). I like their idea of naming the metrics operand1_metric and operand2_metric. However, I think ours is also clear and declarative. Still, their code gives us some ideas.

- action: calculate_new_metric
  metric_name: memory.utilization
  operand1_metric: memory.used
  operand2_metric: memory.total
  operator: /


lubingfeng commented May 5, 2021

@bogdandrutu @alolita @punya can you review and give your comments/suggestions?

@hossain-rayhan

05/05 Collector SIG meeting discussion notes:

  • Rayhan will update the GitHub design proposal explaining the use case and how it differs from the existing metricstransformprocessor.
  • If the community decides not to have this processor in the contrib repo, we will prefix the type/name with experimental_, like the experimental_resourcegeneration processor. We discussed this earlier and Bogdan mentioned it again today.


bogdandrutu commented May 5, 2021

Questions that are not answered:

  1. Combine multiple metrics in metricstransform seems to do 70% of what you are doing. At least based on the name.
  2. There is a PR Add scale_value transformation #3177 that adds scaling, which is similar to a unit change. Where should that functionality live?
  3. "Another use case is, we need to change the metric values to match the units. Say for example, we have pod.memory.usage in Megabytes. But we want to report it in Bytes. So we want to create a new metric which has memory usage in Bytes, like pod.memory.usage.bytes = pod.memory.usage * 1048576." - this usually does not "create" a new metric, it may update "unit". Why is this functionality in generator vs transformer?
  4. Where is the line between what goes to generator vs transformer?
  5. In an ideal world if we were able to extend transformer (after change to pdata) would this be a new processor or an extension to the other one?

@hossain-rayhan

Thanks @bogdandrutu. Here are my thoughts:

  1. Combine multiple metrics in metricstransform seems to do 70% of what you are doing. At least based on the name.

It does not do specific calculations on selected metrics. For example, we need to define exactly two specific metrics for calculating pod.memory.utilization, and it must be defined which one is the numerator and which the denominator. Also, it can perform only aggregation-type operations (sum, min, max, mean). To do other operations like multiply, divide, and subtract, we have some constraints. It's achievable, but I would say it requires heavy modifications.

  2. There is a PR Add scale_value transformation #3177 that adds scaling which is similar with unit change. Where should that functionality live?

It's the same question of where we want to put this. Also, I believe this PR can only multiply a value by a constant, but that's not a big deal.

  3. this usually does not "create" a new metric, it may update "unit". Why is this functionality in generator vs transformer?

It's updating the values as well as the units. Users may not always want to change the units; sometimes they may want to keep two metrics in two different units for the sake of their dashboard/console. That's my understanding.

  4. Where is the line between what goes to generator vs transformer? and
  5. In an ideal world, if we were able to extend the transformer (after the change to pdata), would this be a new processor or an extension to the other one?

I can think of the following two options:

  1. Scope down the existing metricstransformprocessor to perform only name- and label-changing functionality (which I don't think is feasible), and move everything else (calculation related) to a different processor (possibly the metrics generation processor).
  2. Put everything on top of the metricstransformprocessor.

I am expecting your guidance here.

@hossain-rayhan

The experimental processor is now available in the contrib repo.

dmitryax pushed a commit that referenced this issue Sep 30, 2024
…alculations (#35428)

**Description:** 
The metrics generation processor would previously hit a panic when the
calculation rule was operating on a `sum` metric, instead of a `gauge`.
Neither the [component
proposal](#2722),
the [initial
implementation](#3433),
nor the
[README](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricsgenerationprocessor)
states that the metric is required to be a `gauge`, nor do I
think this requirement makes sense.

I've updated the logic to account for sum metrics being used and added a
test.

**Testing:** 
Added a test for this situation, the test panics before the change and
passes after.
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…alculations (open-telemetry#35428)
