

Design: proposing new metricsgenerationprocessor #2722

Closed
hossain-rayhan opened this issue Mar 17, 2021 · 10 comments
Labels
enhancement New feature or request

Comments


hossain-rayhan commented Mar 17, 2021

This project supports an important requirement for EKS Fargate Container Insights metrics

Doc update history
Issue Created: 03/17
1st update: 04/28
2nd update: 05/05

SIG meeting discussion
1st discussion: SIG meeting 03/17

  • Initial design review and the proposal was welcomed!
  • Told to explore Prometheus Recording rules

2nd discussion: SIG meeting 03/24

  • Prometheus recording rules turned out to be considerably more complicated, so we decided to move forward with the proposed simple approach.
  • Planned to start the implementation and there was no objection.

3rd discussion: SIG meeting 04/28

  • Collector SIG meeting, Bogdan said he will review the code.

4th discussion: SIG meeting 05/05

  • Rayhan will update the GitHub design proposal explaining the use case and how it differs from the existing metricstransformprocessor.
  • If the community decides not to have this processor in the contrib repo, we will prefix the type/name with experimental_, like experimental_resourcegeneration. We discussed this earlier and Bogdan mentioned it again today.

Decision: In the Slack channel discussion, Punya and Josh agreed to move forward with the experimental_ prefix, and they both looked into the first PR. Josh approved the PR, which is waiting on Bogdan's review to get merged.

5th discussion: SIG meeting 05/12

  • Bogdan said he will have a look within one day.

Can you explain the use case?
We need to create a new metric from existing metrics, where the new metric's datapoint values differ from the existing metrics' values. For example, say we have two existing metrics, pod.memory.usage and node.memory.limit, and we need to create a new metric pod.memory.utilization computed as:
pod.memory.utilization = (pod.memory.usage.bytes / node.memory.limit)

Another use case is changing metric values to match units. For example, we have pod.memory.usage in Megabytes but want to report it in Bytes, so we want to create a new metric that holds the memory usage in Bytes: pod.memory.usage.bytes = pod.memory.usage * 1048576.
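Both use cases boil down to simple datapoint arithmetic. A minimal sketch with hypothetical values (the metric names follow the examples above; the numbers are illustrative only):

```python
# Hypothetical datapoint values, for illustration only.
pod_memory_usage_bytes = 536_870_912   # pod.memory.usage.bytes (512 MB)
node_memory_limit = 8_589_934_592      # node.memory.limit (8 GB)

# Use case 1: derive a new metric from two existing metrics.
# pod.memory.utilization = pod.memory.usage.bytes / node.memory.limit
pod_memory_utilization = pod_memory_usage_bytes / node_memory_limit

# Use case 2: derive a new metric by converting units.
# pod.memory.usage.bytes = pod.memory.usage (MB) * 1048576
pod_memory_usage_mb = 512
pod_memory_usage_bytes_new = pod_memory_usage_mb * 1048576
```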

Why doesn't the existing metricstransformprocessor support your use case?
The existing metricstransformprocessor has no way to generate a new metric from existing metrics following a given calculation. It has the following functionalities, which do not cover our use case:

  • Rename a metric
  • Add a metric label
  • Rename label key
  • Rename label value
  • Delete datapoint
  • Toggle datapoint
  • Aggregate across label sets
  • Aggregate across label values
  • Combine multiple metrics

How is it different from existing metricstransformprocessor?

  • The existing metricstransformprocessor has no way to calculate a new metric from two different existing metrics.
  • The existing metricstransformprocessor does not perform any operation that creates a new metric with newly calculated datapoint values. It does offer aggregation of a metric's values over a label set, but that is not what we are looking for.

Why can't it be part of metricstransformprocessor?

  • We discussed this in our SIG meeting, and @bogdandrutu suggested creating this feature as a new processor. The existing metricstransformprocessor works with the old OpenCensus metric types; we have a plan to rewrite the whole thing and don't want to add more code on top of it.

Design Description:
The metrics generation processor (metricsgenerationprocessor) can be used to create new metrics from existing metrics following a given rule. In the beginning, it may support the following two approaches for creating a new metric.

  1. It can create a new metric from two existing metrics by applying one of the following arithmetic operations: add, subtract, multiply, divide, and percent. One use case is to calculate the pod.memory.utilization metric with the following equation:
    pod.memory.utilization = (pod.memory.usage.bytes / node.memory.limit)
  2. It can create a new metric by scaling the value of an existing metric by a given constant. One use case is to convert pod.memory.usage metric values from Megabytes to Bytes (multiply the existing metric's value by 1048576).
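The two generation types above can be sketched as a single rule-application step. This is a simplified, hypothetical model in which metrics are a dict of scalar values; the real processor operates on full metric structures, not scalars:

```python
def apply_rule(rule, metrics):
    """Apply one generation rule (calculate or scale) to a dict of
    metric-name -> scalar datapoint value. Illustrative sketch only."""
    operations = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
        "percent": lambda a, b: a / b * 100,
    }
    if rule["generation_type"] == "calculate":
        # calculate: combine two operand metrics with the given operation
        op = operations[rule["operation"]]
        value = op(metrics[rule["operand1_metric"]],
                   metrics[rule["operand2_metric"]])
    else:
        # scale: multiply the single operand metric by a constant
        value = metrics[rule["operand1_metric"]] * rule["scale_by"]
    metrics[rule["new_metric_name"]] = value
    return metrics
```

For example, a calculate rule dividing pod.cpu.usage by node.cpu.limit would add a pod.cpu.utilized entry holding the ratio.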

Configuration

Configuration is specified through a list of generation rules. Generation rules find the metrics that
match the given metric names and apply the operation to those metrics.

processors:
    # processor name: metricsgeneration
    metricsgeneration:

        # specify the metric generation rules
        generation_rules:
              # name of the new metric. this is a required field
            - new_metric_name: <new_metric_name>

              # generation_type describes how the metric will be generated. It can be either calculate or scale. calculate generates a metric by applying the given operation on two operand metrics; scale operates only on the operand1 metric to generate the new metric.
              generation_type: {calculate, scale}

              # this is a required field
              operand1_metric: <first_operand_metric>

              # this field is required only if the generation_type is calculate
              operand2_metric: <second_operand_metric>

              # operation specifies which arithmetic operation to apply. It can be one of the five supported operations.
              operation: {add, subtract, multiply, divide, percent}

              # scale_by is only required if the generation_type is scale
              scale_by: 1000

Example Configurations

Create a new metric using two existing metrics

# create pod.cpu.utilized following (pod.cpu.usage / node.cpu.limit)
generation_rules:
    - new_metric_name: pod.cpu.utilized
      generation_type: calculate
      operand1_metric: pod.cpu.usage
      operand2_metric: node.cpu.limit
      operation: divide

Create a new metric scaling the value of an existing metric

# create pod.memory.usage.bytes from pod.memory.usage.megabytes
generation_rules:
    - new_metric_name: pod.memory.usage.bytes
      generation_type: scale
      operand1_metric: pod.memory.usage.megabytes
      operation: multiply
      scale_by: 1048576

In-Scope

  • It will support calculating a new metric in two different ways: 1) using two metrics and applying an arithmetic operation, and 2) using one metric and scaling its value by a fixed constant.
  • It supports a fixed set of operations: add, subtract, multiply, divide, and percent. It does not support any other complex calculation.

Exception

  • If an operand metric name or label is missing, the operation will simply be skipped, printing a debug log.
  • If the new metric name (being calculated) is already present, the operation will simply be skipped, printing a debug log.
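The two exception cases above amount to guard checks run before a rule is applied. A hypothetical sketch (the function and parameter names are illustrative, not the processor's actual API; metrics are again modeled as a dict):

```python
def should_skip(rule, metrics, log):
    """Return True if the rule must be skipped, logging the reason.
    Mirrors the exception cases: missing operand metric, or the
    new metric name already existing. Illustrative sketch only."""
    needed = [rule["operand1_metric"]]
    if rule["generation_type"] == "calculate":
        needed.append(rule["operand2_metric"])
    for name in needed:
        if name not in metrics:
            log(f"skipping rule: operand metric {name!r} not found")
            return True
    if rule["new_metric_name"] in metrics:
        log(f"skipping rule: metric {rule['new_metric_name']!r} already exists")
        return True
    return False
```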

Open Questions

  1. For renaming a metric (without calculating a new value) based on a metric label, I think the metricstransformprocessor is the right place. I created a separate PR [metrics_transform_processor] Add filtering capabilities matching metric label values for applying changes #3201 for that. Is there any issue with this other than the rewriting plan?
@hossain-rayhan

In addition to the above-mentioned approach, I explored whether we can support expressions like Prometheus Recording Rules. Recording rules are quite generic and robust. However, considering the complexity and amount of effort required to build something that generic, I prefer not to go for it right now. We discussed this at last Wednesday's (03/24) Prometheus SIG meeting, and I believe we decided there to move forward with the above-mentioned simple approach. The configuration is not final; I'm expecting more opinions/suggestions on it.

@bogdandrutu @jmacd @jrcamp Would you please have a look and share your thoughts? Thanks in advance.


jrcamp commented Mar 29, 2021

@hossain-rayhan Maybe take a look at how we (@dmitryax) did something similar inside the signalfxexporter. It operates on our own datapoints instead of OTLP, so it can't really be reused, but it might provide some inspiration:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/signalfxexporter/translation/constants.go

I think it was discussed but I've already forgotten and don't see it in meeting notes. Would this replace metricstransformprocessor? Edit: or the reverse, why can't this be added to MTP?


hossain-rayhan commented Mar 29, 2021

@jrcamp thanks, I will look into the signalfxexporter implementation.

Would this replace metricstransformprocessor? Edit: or the reverse, why can't this be added to MTP?

I added this to the Open Questions section in the description. Thanks. We discussed this in our SIG meeting, and @bogdandrutu suggested creating this feature as a new processor. The existing metricstransformprocessor works with the old OpenCensus metric types; we have a plan to rewrite the whole thing and don't want to add more code on top of it.


jmacd commented Mar 30, 2021

We discussed this on last Wednesday's (03/24) Prometheus SIG meeting, and I believe from the meeting we decided to explore/ move with above mentioned simple approach.

Thanks for following up here. The proposal indeed looks simple and well justified, given how much more powerful and free-form the Prometheus recording rules are.

@hossain-rayhan

@jrcamp, I looked into the signalFx code (following example). I like their idea of naming the metrics operand1_metric and operand2_metric. However, I think ours is also clear and declarative. Still, their code gives us some ideas.

- action: calculate_new_metric
  metric_name: memory.utilization
  operand1_metric: memory.used
  operand2_metric: memory.total
  operator: /


lubingfeng commented May 5, 2021

@bogdandrutu @alolita @punya can you review and give your comments/suggestions?

@hossain-rayhan

05/05 Collector SIG meeting discussion notes:

  • Rayhan will update the GitHub design proposal explaining the use case and how it differs from the existing metricstransformprocessor.
  • If the community decides not to have this processor in the contrib repo, we will prefix the type/name with experimental_, like the experimental_resourcegeneration processor. We discussed this earlier and Bogdan mentioned it again today.


bogdandrutu commented May 5, 2021

Questions that are not answered:

  1. Combine multiple metrics in metricstransform seems to do 70% of what you are doing. At least based on the name.
  2. There is a PR Add scale_value transformation #3177 that adds scaling, which is similar to a unit change. Where should that functionality live?
  3. "Another use case is, we need to change the metric values to match the units. Say for example, we have pod.memory.usage in Megabytes. But we want to report it in Bytes. So we want to create a new metric which has memory usage in Bytes, like pod.memory.usage.bytes = pod.memory.usage * 1048576." - this usually does not "create" a new metric, it may update "unit". Why is this functionality in generator vs transformer?
  4. Where is the line between what goes to generator vs transformer?
  5. In an ideal world if we were able to extend transformer (after change to pdata) would this be a new processor or an extension to the other one?

@hossain-rayhan

Thanks @bogdandrutu. Here are my thoughts:

  1. Combine multiple metrics in metricstransform seems to do 70% of what you are doing. At least based on the name.

It does not do specific calculations on selected metrics. For example, we need to define exactly two specific metrics for calculating pod.memory.utilization, and it must be defined which one is the numerator and which the denominator. Also, it can perform only aggregation-type operations (sum, min, max, mean). To do other operations like multiply, divide, and subtract, we have some constraints. It's achievable, but I would say it requires heavy modifications.

  2. There is a PR Add scale_value transformation #3177 that adds scaling which is similar with unit change. Where should that functionality live?

It's the same question of where we want to put this. Also, I believe this PR can only multiply a value by a constant, but that's not a big deal.

  3. this usually does not "create" a new metric, it may update "unit". Why is this functionality in generator vs transformer?

It's updating the values as well as the units. Users may not always want to change the units; sometimes they may want to keep two metrics in two different units for the sake of their dashboard/console. That's my understanding.

  4. Where is the line between what goes to generator vs transformer? and
  5. In an ideal world, if we were able to extend the transformer (after the change to pdata), would this be a new processor or an extension to the other one?

I can think of the following two options:

  1. Scope down the existing metricstransformprocessor to perform only name- and label-changing functionality (which I don't think is feasible), and move everything else (calculation related) to a different processor (possibly the metrics generation processor).
  2. Put everything on top of the metricstransformprocessor.

I am expecting your guidance here.

@hossain-rayhan

The experimental processor is now available in the contrib repo.

dmitryax pushed a commit that referenced this issue Sep 30, 2024
…alculations (#35428)

**Description:** 
The metrics generation processor would previously hit a panic when the
calculation rule was operating on a `sum` metric, instead of a `gauge`.
Neither the [component
proposal](#2722),
the [initial
implementation](#3433),
nor the
[README](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricsgenerationprocessor)
states that the metric is required to be a `gauge`, nor do I
think this requirement makes sense.

I've updated the logic to account for sum metrics being used and added a
test.

**Testing:** 
Added a test for this situation, the test panics before the change and
passes after.
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…alculations (open-telemetry#35428)
