OTEL sources should create events without any transformations #5259

Open
kkondaka opened this issue Dec 12, 2024 · 2 comments

Comments

@kkondaka
Collaborator

kkondaka commented Dec 12, 2024

Is your feature request related to a problem? Please describe.
Currently, all OTEL sources (OTEL trace source, OTEL metric source, and OTEL log source) perform some transformations while creating events from OTEL data.

  1. In all sources, the keys are created by replacing "." with "@" (dedotting)
  2. In all sources, the attributes are "flattened" by moving them to the root of the event instead of nesting under "attributes"

The dedotting is done to make the data compatible with OpenSearch.
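
For readers unfamiliar with the two transformations, here is a minimal Python sketch of what they amount to. It is an illustration only, not Data Prepper's actual implementation: the function names `dedot` and `flatten_attributes` and the `prefix` parameter are hypothetical, and the `span.attributes.` prefix is taken from the sample data later in this thread.

```python
def dedot(key: str) -> str:
    """Replace every "." in an attribute key with "@" (dedotting)."""
    return key.replace(".", "@")


def flatten_attributes(record: dict, prefix: str = "") -> dict:
    """Move the entries of record["attributes"] to the root of the event.

    Each attribute key is dedotted and given an optional prefix.
    Both the function and the prefix are assumptions made for illustration.
    """
    # Copy every top-level field except the nested attribute map.
    event = {k: v for k, v in record.items() if k != "attributes"}
    # Re-add each attribute at the root under its dedotted key.
    for key, value in record.get("attributes", {}).items():
        event[prefix + dedot(key)] = value
    return event


raw = {"name": "GET /", "attributes": {"http.method": "GET", "http.url": "/index.html"}}
flat = flatten_attributes(raw, prefix="span.attributes.")
```

With these inputs, `flat` no longer has a nested `attributes` map, and the key `http.method` appears at the root as `span.attributes.http@method`.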

Describe the solution you'd like
I think the transformations should live outside the OTEL sources because the sink is not always OpenSearch. Also, users currently have no option to skip the transformations. We should remove the transformations from the OTEL sources and let users enable them explicitly, either as a processor or as an OpenSearch sink option:

   processor:
      - opensearch_compatibility_transform:
           flatten_attributes: true
           dedotting: true

Alternatively:

   sink:
     - opensearch:
           opensearch_compatibility_transformation: true


@KarstenSchnitter
Collaborator

KarstenSchnitter commented Dec 12, 2024

OpenTelemetry usually has a threefold aggregation of its signals: resource > instrumentation scope > logs/metrics/spans. This means a resource can hold a collection of instrumentation scopes, each of which can hold a collection of either logs, metrics, or spans. Each of these levels has a name and can have a nested map of attributes, usually described by the semantic conventions.

The flattening of the data structure was introduced to break down these collections to the smallest part, i.e., a log record, metric data point, or span record. This helps generate a single document for each of these parts. If the flattening is removed, the collections will stay together. Note that these collections might have been built arbitrarily by a batch processor, e.g., the one implemented in the OpenTelemetry Collector. It seems strange to preserve that arbitrary grouping in Data Prepper.
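
As a rough illustration of this breakdown, the Python sketch below walks a logs export and emits one document per log record. The field names `resourceLogs`, `scopeLogs`, `logRecords`, `resource`, and `scope` follow the OTLP JSON encoding; everything else is an assumption for illustration, including the function name `split_log_records`, the shape of the emitted document, and the use of plain dicts for attributes. This is not how Data Prepper implements it.

```python
from typing import Iterator


def split_log_records(export: dict) -> Iterator[dict]:
    """Yield one document per log record, carrying its resource and scope along.

    The output shape ({"resource", "scope", "log"}) is hypothetical.
    """
    for resource_logs in export.get("resourceLogs", []):
        resource = resource_logs.get("resource", {})
        for scope_logs in resource_logs.get("scopeLogs", []):
            scope = scope_logs.get("scope", {})
            for record in scope_logs.get("logRecords", []):
                # Each record becomes its own document; the batch grouping is dropped.
                yield {"resource": resource, "scope": scope, "log": record}


export = {
    "resourceLogs": [
        {
            "resource": {"attributes": {"service.name": "frontend"}},
            "scopeLogs": [
                {"scope": {"name": "lib-a"}, "logRecords": [{"body": "one"}, {"body": "two"}]},
                {"scope": {"name": "lib-b"}, "logRecords": [{"body": "three"}]},
            ],
        }
    ]
}
documents = list(split_log_records(export))
```

Here one batched export with two scopes yields three independent documents, each of which still knows which resource and scope it came from.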

Furthermore, Data Prepper currently does not handle the instrumentation scope attributes correctly. They are added as log/metric/span attributes without any means of distinction from the proper attributes of these signals. This needs to be fixed in any case.
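
To show why the missing distinction matters, here is a small Python sketch. It assumes plain dicts for attributes, and the prefixes `instrumentationScope.attributes.` and `log.attributes.` are hypothetical names chosen for illustration, not an agreed naming scheme.

```python
def merge_without_distinction(scope_attrs: dict, record_attrs: dict) -> dict:
    """Merge scope attributes into record attributes; colliding keys are overwritten."""
    return {**scope_attrs, **record_attrs}


def merge_with_prefixes(scope_attrs: dict, record_attrs: dict) -> dict:
    """Keep both levels apart by giving each its own (hypothetical) prefix."""
    merged = {f"instrumentationScope.attributes.{k}": v for k, v in scope_attrs.items()}
    merged.update({f"log.attributes.{k}": v for k, v in record_attrs.items()})
    return merged


scope_attrs = {"version": "1.2.0", "env": "scope-level"}
record_attrs = {"env": "record-level"}

ambiguous = merge_without_distinction(scope_attrs, record_attrs)
distinct = merge_with_prefixes(scope_attrs, record_attrs)
```

In `ambiguous`, the scope's `env` value is lost and `version` is indistinguishable from a record attribute. In `distinct`, both `env` values survive under separate keys.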

Finally, the OpenSearch data model created by Data Prepper does not align with the schemas and mappings defined by the OpenSearch catalogue: https://github.com/opensearch-project/opensearch-catalog/tree/main/schema/observability. The two should be aligned so that data ingested by Data Prepper can be used with the visualizations from the catalogue.

@dlvenable
Member

@kkondaka, I like the goal of this proposal and think it makes sense. However, I'm not sure that a generic OpenSearch compatibility processor would be ideal. What does it mean to make something "OpenSearch compatible"?

For OTel, some fields are nested and some are not. You can see this in the OTel trace mappings.

Here is some sample data from the current trace pipeline. You can see that the resource.attributes. prefix creates nested fields, but within that prefix the keys are flattened using @.

"span.attributes.sampler@param": true,
"span.attributes.http@method": "GET",
"span.attributes.http@url": "/jquery-3.1.1.min.js",
"resource.attributes.client-uuid": "41b4ebf38f38063a",
"resource.attributes.ip": "172.20.0.7",
"resource.attributes.host@name": "3a28510b4824",
"resource.attributes.opencensus@exporterversion": "Jaeger-Go-2.30.0",
"resource.attributes.service@name": "frontend",
"span.attributes.component": "net/http",

I think we'd do better moving this logic into the otel_traces processor. Similarly, we'd want to have processors for otel_metrics and otel_logs which get these to conform to the OpenSearch standard.
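
To make the mixed nesting in the sample data concrete, here is a minimal Python sketch that expands dotted keys into nested objects, which is roughly how OpenSearch interprets dots in field names. The helper `expand_dots` is hypothetical and only illustrates the resulting shape; it is not part of Data Prepper or OpenSearch.

```python
def expand_dots(flat: dict) -> dict:
    """Expand keys such as "a.b.c" into nested dicts; "@" is left untouched."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Descend into (or create) the intermediate object for this path segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


sample = {
    "span.attributes.http@method": "GET",
    "resource.attributes.host@name": "3a28510b4824",
    "resource.attributes.service@name": "frontend",
}
nested = expand_dots(sample)
```

The result nests two levels deep (`resource` > `attributes`), while `host@name` and `service@name` stay as single flat keys because the original dots were replaced with `@`.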

Also, we'd need to create a migration path. Could you elaborate on what that migration path would look like?
