
[datadogexporter/datadogconnector] OpenTelemetry Missing Tags in APM Stats while upgrading to 0.95.0 #36329

Closed
JM89 opened this issue Nov 12, 2024 · 13 comments
Labels
bug (Something isn't working), connector/datadog, exporter/datadog (Datadog components)

Comments

@JM89

JM89 commented Nov 12, 2024

Component(s)

No response

What happened?

Description

I am publishing traces from EKS Services to DataDog using otel/opentelemetry-collector-contrib:0.94.0.
I have been trying to upgrade to the latest version, but could not go beyond 0.95.0 without breaking most of our alerting system, which relies heavily on APM metrics and on specific tags being present.

I went through setting up the Datadog connector as described here and here, and the “trace.Microsoft.AspNetCore.server.hits” APM metric now appears in DataDog, but without any of the custom tags that are available on the APM traces. I can see a number of options in the datadog connector (e.g. peer_tags), but none of them worked. The tags service, env and resource_name seem to go through, but not “host” or anything custom.

Is this behavior expected? Can these tags be available the same way they used to be? Is there a better way of doing this?

Steps to Reproduce

Upgrade the OpenTelemetry Collector sidecar from 0.94.0 to 0.95.0 and reuse the Datadog connector as configured below. Note that this was also tried with v0.113.0.

Expected Result

The computed APM Metrics contain the same tags as the APM traces.

Actual Result

The computed APM Metrics contain only service, env, and resource_name. The tag "host" is set to "none" and custom tags are not available.

Collector version

v0.95.0

Environment information

Environment

  • AWS EKS
  • .NET Applications publishing traces.
  • OpenTelemetry Collector running as sidecar.

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
    timeout: 10s
connectors:
  datadog/connector:
exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: ${env:DD_SITE}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog/connector]
    traces/2:
      receivers: [datadog/connector]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [datadog/connector]
      processors: [batch]
      exporters: [datadog]

Log output

No response

Additional context

No response

JM89 added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Nov 12, 2024
@jackgopack4
Contributor

jackgopack4 commented Nov 20, 2024

Can you share a sample trace with the tags applied? Also have you opened a ticket with Datadog support?

Also, there have been some improvements/fixes relating to tagging between version 0.95.0 and the current version 0.114.0, so if you are able to try a newer version, that might be helpful as well.

@JM89
Author

JM89 commented Dec 19, 2024

Sorry for the late reply; I didn't have a chance to do more testing until today.

Can you share a sample trace with the tags applied?

What is the simplest way to share a sample trace? I would need to anonymize it first. However, please keep in mind that the issue does not appear on the tags themselves: the trace tags didn't change between the datadog-exporter and the datadog-connector for APM stats.

Also have you opened a ticket with Datadog support?

No, it was quickly isolated to a difference in behavior between the datadog-exporter and the datadog-connector, so I thought this repository was the best place to ask the question. In fact, after some tests with 0.94.0 and the datadog-connector, the tags aren't there either, so it's not so much about the version: the tags are propagated differently by the datadog-connector than by the datadog-exporter v0.94.0 during APM stats computation. Do you know if this is expected behavior?

Also, there have been some improvements/fixes relating to tagging between versions 0.95.0 and the current version 0.114.0 so if you can try a newer version that might be helpful as well.

Yes, I tested with 0.113.0 as mentioned in the question. I also tested with 0.116.1, after the peer tags changes. The tags still do not appear.


Since the host is set to "none", is there a way to enrich at least this one manually using a processor? The k8sattributes processor does not fit the purpose here since it's a sidecar setup. The attributes processor didn't add any tags anywhere either.
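
For reference, the kind of enrichment I tried with the attributes processor looks roughly like this (a sketch only; the processor name, key, and value are placeholders, not the exact configuration):

processors:
  attributes/host:
    actions:
      - action: insert
        key: host
        value: my-node-name # placeholder value
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/host, batch]
      exporters: [datadog/connector]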

@jackgopack4
Contributor

If you can open the support ticket, that will allow our engineers to view sample traces via the support tool; that's why I suggested it.

@JM89
Author

JM89 commented Jan 6, 2025

I have now opened a DataDog support ticket. In the meantime, I am still very interested in understanding the difference between the APM stats computations done by the DataDog exporter v0.94.0 and by the connector, especially around this host tag. This does not seem like expected behaviour.

@parkedwards

parkedwards commented Jan 9, 2025

Update: my issue may not be the same as the OP's; I was placing the attribute on the wrong context.

#36272 (comment)


I am also running into this issue. Our attributes processor is used to set a key/value tag on the OTLP traces and statistics we send to DD:

    processors:
      attributes/env:
        actions:
          - action: insert
            key: env
            value: ${env}
...
        traces:
          receivers: [otlp]
          processors: [batch/dd, attributes/env]
          exporters: [datadog/connector]
        traces/dd-with-sampling:
          receivers: [datadog/connector]
          processors: [batch/dd, attributes/env, tail_sampling]
          exporters: [datadog/external]
        metrics/dd:
          receivers: [datadog/connector]
          processors: [batch/dd, attributes/env]
          exporters: [datadog/external]

However, only our traces have the resulting env tag; the statistics do not have this tag populated. We're on -contrib:0.116.1.


Pinging code owners for exporter/datadog: @mx-psi @dineshg13 @liustanley @songy23 @mackjmr @ankitpatel96 @jade-guiton-dd. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.


Pinging code owners for connector/datadog: @mx-psi @dineshg13 @ankitpatel96 @jade-guiton-dd. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

mx-psi removed the needs triage (New item requiring triage) label on Jan 14, 2025
@mackjmr
Member

mackjmr commented Jan 14, 2025

@parkedwards The reason the attributes processor has no effect on your trace stats is that the OTLP metrics exported via the datadog connector are not the trace stats themselves; they carry a tag that contains the serialised trace stats payload. If you want the env tag on the stats computed by the dd connector, you will need to set the resource attribute deployment.environment.name on the relevant traces.
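
For example, a minimal sketch of one way to set that resource attribute in the traces pipeline feeding the connector (the processor name resource/env and the environment value are placeholders, not part of the configurations above):

processors:
  resource/env:
    attributes:
      - action: upsert
        key: deployment.environment.name
        value: production # placeholder; use your actual environment
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource/env, batch]
      exporters: [datadog/connector]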

@JM89
Author

JM89 commented Jan 17, 2025

Hi,

Unfortunately, the support ticket has not helped much in progressing this issue. I would deeply appreciate any technical assistance with this configuration, as well as some context on the expected behavior.

I have done a few more tests, and so far I have managed, by trial and error, to find a way to get the cluster name and host back, but no custom tags come through on the trace metric trace.Microsoft.AspNetCore.server.hits. However, the other custom metrics coming from OTEL have all the tags. There are no problems with the traces themselves.

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector
  labels:
    kustomize.toolkit.fluxcd.io/substitute: disabled
spec:
  mode: sidecar
  image: docker/otel/opentelemetry-collector-contrib:0.117.0
  resources:
    requests:
      cpu: 10m
      memory: 46Mi
  env:
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: vault-secrets
          key: datadog-api-key
    - name: DD_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: DD_CLUSTER_NAME
      value: cluster
    - name: DD_TEAM
      value: team_x
    - name: DD_HOSTNAME
      value: $(DD_NODE_NAME)-$(DD_CLUSTER_NAME)
  config: 
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors: 
      resourcedetection:
        detectors: [system, eks]
        timeout: 15s
        override: false
        eks:
          resource_attributes: # No effect noticed
            k8s.cluster.name:
              enabled: true
        system:
          resource_attributes: # No effect noticed
            host.name:
              enabled: true
            host.id:
              enabled: false
      batch:
        timeout: 10s
      resource/custom:
        attributes: # No effect noticed
          - action: insert
            key: cluster_name
            value:  coming_from_resource
          - action: insert
            key: test_two
            value:  coming_from_resource
          - action: insert
            key: k8s.cluster.name
            value:  coming_from_resource
          - action: insert 
            key: host.ip
            value: coming_from_resource 
          - action: insert
            key: host.name
            value: coming_from_resource 
      attributes/custom:
        actions:
          - action: insert
            key: test
            value: coming_from_attr_custom  # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
          - action: insert
            key: topicname
            value: coming_from_attr_custom  # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
      transform:
        metric_statements: &statements
          - context: resource
            statements:
              - set(attributes["datadog.host.name"], "coming_from_transform")     # This appears in trace.Microsoft.AspNetCore.server.hits
              - set(attributes["kube_cluster_name"], "coming_from_transform")     # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["k8s.cluster.name"], "coming_from_transform_two")  # This appears in trace.Microsoft.AspNetCore.server.hits under cluster_name
              - set(attributes["cluster_name"], "coming_from_transform")          # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["datadog.host.use_as_metadata"], true)
              - set(attributes["datadog.test_three"], "coming_from_transform")  # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["team"], "coming_from_transform")                  # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["datadog.team"], "coming_from_transform_two")      # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["topicname"], "coming_from_transform")             # This does not appear in trace.Microsoft.AspNetCore.server.hits but appears in otlp custom metrics and traces + aspire
              - set(attributes["service.name"], "coming_from_transform")          # This appeared before in trace.Microsoft.AspNetCore.server.hits under `service`, and can be overriden 
        trace_statements: *statements # Use the same statements as in metrics
        log_statements: *statements # Use the same statements as in metrics
    exporters:
      datadog/exporter:
        api:
          key: "${API_KEY}"
          site: datadoghq.eu
        metrics:
          resource_attributes_as_tags: true
      otlp/aspire:
        endpoint: aspire-otlp:18889
        tls:
          insecure: true
    connectors:
      datadog/connector:
        traces:
          peer_tags: ["cluster_name", "test", "test_two", "k8s.cluster.name", "host.ip", "host.name"] # No effect noticed
          resource_attributes_as_container_tags: ["cloud.availability_zone", "cloud.region"]              # No effect noticed
          compute_stats_by_span_kind: true                                                                # No effect noticed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [resourcedetection, transform, attributes/custom, resource/custom, batch]
          exporters: [datadog/connector, datadog/exporter, otlp/aspire]
        metrics:
          receivers: [datadog/connector, otlp]
          processors: [resourcedetection, transform, attributes/custom, resource/custom, batch]
          exporters: [datadog/exporter, otlp/aspire]

I can see the documentation suggests that span tags should not be there. Where can I get the full list of tags available by default? Are the resource tags also subject to this explicit "filtering"? Can this behavior be overridden in any way?

[Screenshot of the documentation section listing the tags included in APM stats]

Reference doc:

Thank you very much,

PS:

@parkedwards Based on what I have seen so far, have you tried the transform processor yet?

processors:
  transform:
    metric_statements: &statements
      - context: resource
        statements:
          - set(attributes["deployment.environment.name"], "new env")
    trace_statements: *statements # Use the same statements as in metrics
    log_statements: *statements # Use the same statements as in metrics
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [datadog/connector, datadog]
    metrics:
      receivers: [datadog/connector, otlp]
      processors: [transform, batch]
      exporters: [datadog]

@jackgopack4
Contributor

So, there are a few things happening here.

  1. @JM89 your most recent screenshot of the relevant doc section is accurate; only those specific tags get passed into the APM stats payloads. The "second primary tag" is something that can be added to the stats payloads IF it is present on all OTLP trace data as well as on the trace metrics payload; I will get to how to set that up correctly with OTel later.
  2. The host tags from Datadog are currently only available in APM stats payloads when using the Datadog Agent; this is not yet implemented for the connector. It may have been working in the legacy exporter (pre 0.95.0) because the logic was more tightly coupled to the Agent codebase. You can see in our source code that we still depend on packages from the Datadog Agent in the connector/exporter, but they are more purpose-built utilities with lower overhead than instantiating an entire trace agent just to compute APM stats.
  3. As @mackjmr mentioned earlier, none of the OTLP metrics that the datadog connector produces are the actual APM metrics. The connector generates a "throwaway" OTLP metric to which a stats payload is attached; these stats are consumed by the Datadog backend, and the Trace Metrics/APM Metrics are calculated there, on the backend. We may be able to ship APM stats as OTLP data in the future, but there were performance issues in the past, so this is the way it works for now, with no plans to change/overhaul it.

As such, throwing resource/attributes/transform processors at the metrics won't affect the APM stats/calculated metrics, and even throwing them at the traces coming in won't add the tags to the calculated metrics unless they are one of the types of tags mentioned in the doc.

Which brings me back to the "second primary tag." There is currently a method to do this, call it a "beta" or a "workaround": we can set either a host tag or a container tag as the "second primary tag", provided that it is placed on all OTLP traces and that the host.name resource attribute or datadog.host.name matches the Infrastructure List host metadata in the Datadog backend.

If you follow the steps here and here, and set host.name or datadog.host.name on the telemetry such that it matches the host name in your Datadog Infrastructure List in the backend, it should match up. The easiest way to set host.name may be via the OTEL_RESOURCE_ATTRIBUTES environment variable on your application, or via any one of the processors you've tried in the collector.
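
For illustration, a minimal sketch of setting it through the application container's environment (the values below are placeholders; host.name must match a host in your Datadog Infrastructure List):

env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "host.name=ip-10-0-0-1.eu-west-1.compute.internal,deployment.environment.name=production"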

Sorry this has taken so long to resolve; hopefully this brings you and anyone else dealing with this issue closer to resolution.

@JM89
Author

JM89 commented Jan 20, 2025

Hi @jackgopack4,

Thank you very much for the context and your detailed explanations.

In summary, there is currently no way to pass custom tags in the APM stats payload, which means we need to migrate all live DD monitors based on APM metrics with custom tags before we can upgrade to the newer version of the OpenTelemetry Collector. If this conclusion is correct, I'll suggest in the support ticket an update to the documentation here to cover the difference in tagging logic: https://docs.datadoghq.com/opentelemetry/guide/migration/ if that's okay.

Out of curiosity regarding number 3, is the name of that "throwaway" OTLP metric dd.internal.stats.payload? It does appear empty in the Aspire Dashboard.

Best,

@jackgopack4
Contributor

jackgopack4 commented Jan 21, 2025

In summary, there is currently no way to pass custom tags in the APM Stats Payload

Correct; there is a workaround to pass a second primary tag, as detailed above, but only that single tag.

I'll suggest an update of the documentation in the support ticket to cover the difference in tagging logic here: docs.datadoghq.com/opentelemetry/guide/migration if that's okay.

That's a good point, it could use some clarification. I am not certain the previous functionality was actually intentional, but we will make sure to clarify it.

Out of curiosity for number 3, is this the name of the "throwaway" OTLP metric dd.internal.stats.payload? It does appear empty in the Aspire Dashboard.

Correct, although that could always be subject to change down the road.
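
For anyone wanting to inspect that metric locally, a minimal sketch (assuming the collector's standard debug exporter; not part of the configurations discussed above) that prints the connector's output metrics, including dd.internal.stats.payload:

exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [datadog/connector]
      exporters: [debug]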

@JM89
Author

JM89 commented Jan 21, 2025

Thank you very much for your help,
