[exporter/awsemf] Exporter ignores the first batch of a metric sent to CloudWatch #1653

Closed
mircohacker opened this issue Nov 22, 2022 · 7 comments

@mircohacker

The awsemf exporter of the ADOT distribution of OTEL sends faulty data for the first batch of metrics sent to AWS CloudWatch.

Steps to reproduce

  1. The code is checked in at https://github.com/mircohaug/otel-awsemf-reproduction-ignored-first-batch
  2. Authenticate your shell against an AWS account with the default profile. (Or change the value of the AWS_PROFILE env var in the docker run command of step 4.)
  3. Create the log group emfbug-reproduction-embedded-metrics-otel by running aws logs create-log-group --log-group-name emfbug-reproduction-embedded-metrics-otel
  4. Start the otel agent by running
docker run -d --rm -p 4317:4317 \
-e AWS_REGION=eu-central-1 \
-e AWS_PROFILE=default \
-v ~/.aws:/root/.aws \
-v "$(pwd)/otel-agent-config.yaml":/otel-local-config.yaml \
--name awscollector \
public.ecr.aws/aws-observability/aws-otel-collector:latest \
--config otel-local-config.yaml;
  5. Run yarn install
  6. Create metrics by running yarn start
    1. In the code we create a new OTEL counter and add 1 to it four times in total. We split these four increments into two batches of two increments each. Between the batches there is a wait time to allow the OTEL agent to flush the values to AWS.
  7. Run this Logs Insights query (fields counter_name,@timestamp) on the log group emfbug-reproduction-embedded-metrics-otel to see the published EMF metrics.
    1. See actual and expected result
  8. Clean up by running docker rm -f awscollector and aws logs delete-log-group --log-group-name emfbug-reproduction-embedded-metrics-otel

Expected Result

We expect the log group to contain two entries with a value of 2, one entry for each batch.

Actual result

We only get one entry with a value of 2. The first batch gets its value set to 0.

Additional information

  • The behaviour persists over multiple runs of the script.
  • It also happens for each new combination of counter name and attributes.
  • To rule out a faulty implementation in the OTEL framework, we also added a ConsoleMetricExporter alongside the one that exports the metrics to the OTEL agent. This exporter prints the correct values to the console.
  • In addition, we added a file exporter to the OTEL agent pipeline. This one also shows the correct values.
  • Small waits between the batches lead to all four increments ending up in the same batch, and we see no value in CloudWatch whatsoever.
  • There is also an upstream issue. This one is opened here as well for visibility.
@bryan-aguilar
Contributor

bryan-aguilar commented Dec 1, 2022

Could this be because the awsemfexporter performs cumulative-to-delta conversion for cumulative sums? The conversion drops the initial state; you can see the relevant discussion on this PR.
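
To make the suspected mechanism concrete, here is a minimal sketch of a cumulative-to-delta conversion that has no baseline for the first data point. It is a toy model, not the exporter's actual code: the class name, the function name, the metric key, and the retain_initial_value switch are all hypothetical, and reporting 0 for the first point is an assumption chosen to mirror the "Actual result" described in this issue.

```python
from typing import Dict, List


class CumulativeToDelta:
    """Toy cumulative-to-delta converter, keyed by metric identity (hypothetical)."""

    def __init__(self, retain_initial_value: bool = False) -> None:
        self.retain_initial_value = retain_initial_value
        self._last_seen: Dict[str, float] = {}

    def convert(self, key: str, cumulative: float) -> float:
        previous = self._last_seen.get(key)
        self._last_seen[key] = cumulative
        if previous is None:
            # First point for this key: there is no baseline to subtract from.
            # Assumption: report 0 unless the initial value is retained.
            return cumulative if self.retain_initial_value else 0.0
        return cumulative - previous


def run_batches(converter: CumulativeToDelta, cumulative_values: List[float]) -> List[float]:
    """Feed the cumulative counter value seen at each export and collect the deltas."""
    return [converter.convert("example_counter", value) for value in cumulative_values]


if __name__ == "__main__":
    # Two batches of two increments each: the cumulative sum is 2, then 4.
    print(run_batches(CumulativeToDelta(), [2.0, 4.0]))  # [0.0, 2.0]
    print(run_batches(CumulativeToDelta(retain_initial_value=True), [2.0, 4.0]))  # [2.0, 2.0]
```

Under these assumptions the first batch loses its value while the second one is reported correctly, which matches the reproduction. It would also explain why a single batch containing all four increments yields nothing useful.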

@mircohacker
Author

This sounds exactly like the behavior I've observed. @bryan-aguilar, do you know if this can be disabled or if there is a workaround? I already tried initializing the counter by adding 0 before using it, but this did not work, probably because the values are aggregated over a certain time frame...

@jan-xyz

jan-xyz commented Dec 26, 2022

I'm running into the same bug as well. I was actually not seeing any metrics at all until I found your bug report and tried to just collect some more data points.

I tried force flushing once when starting up the application as a workaround but it also doesn't "fix" it.

My sample application can be found here. It's not as minimal as @mircohaug's and also does tracing and X-Ray log correlation, with metrics being broken for low-frequency metrics. It's a Go Lambda that uses the ADOT Lambda layer with the awsemf and awsxray exporters configured.

@mircohacker
Author

PR open-telemetry/opentelemetry-collector-contrib#17988 was merged. The next release adds the configuration flag retain_initial_value_of_delta_metric in the emf subsection. If this flag is set to true, the initial value of a metric is no longer dropped.
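
As a rough illustration, the flag could be set in the collector configuration like this. This is a sketch, not a verified configuration: the namespace value is a placeholder, the log group name is the one from the reproduction steps, and the exact placement of the flag should be checked against the exporter documentation of the release in use.

```yaml
exporters:
  awsemf:
    # Placeholder namespace, not taken from the reproduction repository.
    namespace: ExampleNamespace
    # Log group created in the reproduction steps.
    log_group_name: emfbug-reproduction-embedded-metrics-otel
    # Flag introduced by the merged PR; keeps the first value instead of dropping it.
    retain_initial_value_of_delta_metric: true
```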

@mizzzto

mizzzto commented Apr 25, 2023

Hi @mircohaug,
I tried your test project with the new flag retain_initial_value_of_delta_metric: true added to otel-agent-config.yaml.
I added the flag in the awsemf section, right below namespace. However, after running the application, the CloudWatch log was still showing an incorrect count. Have you tested the application after the PR was merged and released? Or am I missing some other configuration?

@bryan-aguilar
Contributor

@mizzzto can you please open a new issue in this repository? We will most likely ask for an issue to be opened upstream also if this is indeed a bug.

@mizzzto

mizzzto commented Apr 25, 2023

@bryan-aguilar Sure, here it is: #1991
