Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode when data point attribute _doc_count is true #35348

Merged
merged 8 commits into from
Sep 25, 2024

Conversation

carsonip
Copy link
Contributor

@carsonip carsonip commented Sep 23, 2024

Description:

Emit _doc_count for metric documents in OTel mode when data point attribute _doc_count is true

Link to tracking Issue:

Testing:

Documentation:

// documents is per-resource. Therefore, there is no need to hash resource attributes
hash := metricOTelHash(dp, scope.Attributes(), metric.Unit())
hash := metricOTelHash(dp, scope.Attributes(), metric.Unit(), docCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need docCount as a hashing key? Maybe we could add the doc counts for existing hashes and not consider docCount as a hashing key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need docCount as a hashing key?

I think we'll need it. If 2 metrics with the same dimension have different doc counts, they cannot share the same document, as there is only 1 doc count in 1 document.

Maybe we could add the doc counts for existing hashes

Could you elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 2 metrics with the same dimension have different doc counts

Having a hard time imagining how this could practically happen. Doesn't this mean that we would drop the metric anyway since they are duplicates w.r.t dimensions?

Maybe we could add the doc counts for existing hashes

I was thinking that if all attributes are same then merging them would be the way to go (merging histograms + doc count values) but honestly, I can't imagine a case when this could happen in practical scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that we would drop the metric anyway since they are duplicates w.r.t dimensions?

Good point, that appears to be the case for TSDB. @felixbarny did mention something about hashing doc count, not sure if this case was considered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of that issue, I think it would be easier to map metrics named _doc_count to the top-level _doc_count field. We don't need a _doc_count for all metrics, and it makes it harder to group as many metrics to the same document as possible, because these metrics may differ just by their _doc_count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new implementation that is based on the _doc_count: true attribute, I'm also wondering whether we can remove the doc count from the hash. If there are multiple doc counts for a given doc, I think it's fine to chose one of them (for example the first, last, or max value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot to make that change.

@carsonip carsonip changed the title [exporter/elasticsearch] Emit _doc_count for metric documents [exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode when _doc_count is true Sep 24, 2024
@carsonip carsonip changed the title [exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode when _doc_count is true [exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode conditionally Sep 24, 2024
Copy link
Contributor

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the leftover LGTM

exporter/elasticsearchexporter/model.go Outdated Show resolved Hide resolved
exporter/elasticsearchexporter/model.go Outdated Show resolved Hide resolved
@carsonip carsonip marked this pull request as ready for review September 24, 2024 13:11
@carsonip carsonip requested a review from a team as a code owner September 24, 2024 13:11
@carsonip carsonip changed the title [exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode conditionally [exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode when data point attribute _doc_count is true Sep 25, 2024
Copy link
Member

@andrzej-stencel andrzej-stencel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this feature should be mentioned in the docs?

@andrzej-stencel andrzej-stencel merged commit 24ea6bc into open-telemetry:main Sep 25, 2024
158 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 25, 2024
andrzej-stencel pushed a commit that referenced this pull request Oct 1, 2024
…ute handling for data points in OTel mapping mode (#35479)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Supersedes #35348 

elasticsearch.mapping.hints takes a slice of strings.
`_doc_count` enables emitting `_doc_count` for the document.
`aggregate_metric_double` causes histogram or exponential histogram to
be emitted as aggregate_metric_double.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
… mode when data point attribute _doc_count is true (open-telemetry#35348)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Emit _doc_count for metric documents in OTel mode when data point
attribute _doc_count is true

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
…ute handling for data points in OTel mapping mode (open-telemetry#35479)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Supersedes open-telemetry#35348 

elasticsearch.mapping.hints takes a slice of strings.
`_doc_count` enables emitting `_doc_count` for the document.
`aggregate_metric_double` causes histogram or exponential histogram to
be emitted as aggregate_metric_double.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants