New component: Log deduplication processor #34118

Closed
3 tasks
codeboten opened this issue Jul 16, 2024 · 12 comments · Fixed by #34465
Labels
Accepted Component New component has been sponsored

Comments

@codeboten
Contributor

codeboten commented Jul 16, 2024

The purpose and use-cases of the new component

We're working to put together a log de-duplication processor at Honeycomb and wanted to get the temperature of the community for hosting such a component in the contrib repository.

Example configuration for the component

dedupe:
  # ttl is the time-to-live for each entry in the cache. Default is 30 seconds.
  ttl: 30s
  # max_entries is the maximum number of entries that can be stored in the cache. Default is 1000.
  max_entries: 1000
  # attribute_names is the list of attributes to use as the key for deduplication.
  attribute_names: []

Telemetry data types supported

logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am a member of the OpenTelemetry organization.
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

@codeboten @MikeGoldsmith

Sponsor (optional)

No response

Additional context

No response

@codeboten codeboten added Sponsor Needed New component seeking sponsor needs triage New item requiring triage labels Jul 16, 2024
@atoulme
Contributor

atoulme commented Jul 17, 2024

Do you only intend to dedup on attributes? What about timestamps and log body?

@atoulme atoulme removed the needs triage New item requiring triage label Jul 17, 2024
@codeboten
Contributor Author

Log bodies would be deduped by default; the attribute_names configuration would let users differentiate between the log sources they care about. I would expect timestamps to be discarded as part of the deduplication, with only the first or last timestamp reported.

@BinaryFissionGames
Contributor

We have a processor with similar functionality in our agent: https://github.com/observIQ/bindplane-agent/tree/release/v1.56.0/processor/logdeduplicationprocessor

We ended up storing the first/last observed timestamp on the attributes, as well as storing the count of deduped logs as an attribute. It seems like it could be worth considering doing something similar here as well.

It would be awesome to have functionality like this here in contrib, deduping logs is very useful.

@codeboten
Contributor Author

@BinaryFissionGames would you consider donating the component from that repo here? Seems like it would cover a lot of functionality already

@BinaryFissionGames
Contributor

@codeboten Yeah, we'd be happy to donate the component to contrib.

@djaglowski
Member

I can sponsor this.

@djaglowski djaglowski added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Jul 22, 2024
@codeboten
Contributor Author

Thanks @djaglowski!

@pellared
Member

Related spec issue: open-telemetry/opentelemetry-specification#3931

@codeboten
Contributor Author

@BinaryFissionGames would you like to do the initial PRs to make the component a reality or would you prefer if I did?

@BinaryFissionGames
Contributor

@codeboten I'm a bit tight on time at the moment, if you're able to get a start on it that would be awesome.

@MikeGoldsmith
Member

Thanks @BinaryFissionGames - I'll prepare a PR with the code from your repo.

@MikeGoldsmith
Member

PR created

@BinaryFissionGames it would be great for you to review and approve on behalf of yourself & observIQ.

djaglowski added a commit that referenced this issue Aug 7, 2024
**Description:**

Starts the donation of the
[logdedupprocessor](https://github.com/observIQ/bindplane-agent/tree/release/v1.58.0/processor/logdeduplicationprocessor)
from observIQ's Bindplane agent on behalf of @BinaryFissionGames.

**Link to tracking Issue:**

- Closes #34118 

**Testing:**

Includes unit tests.

**Documentation:**

---------

Co-authored-by: Brandon Johnson <binaryfissiongames@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
djaglowski pushed a commit that referenced this issue Aug 7, 2024
**Description:**
I noticed in the PR introducing this component that the README was
missing the generic header that all other components have. This is added
by `mdatagen` when the README has the `<!-- status autogenerated section
-->` and `<!-- end autogenerated section -->` lines included.

I also removed the section about supported pipelines, as the added
header should make this clear enough.

**Link to tracking Issue:**

#34118
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…emetry#34465)

**Description:**

Starts the donation of the
[logdedupprocessor](https://github.com/observIQ/bindplane-agent/tree/release/v1.58.0/processor/logdeduplicationprocessor)
from observIQ's Bindplane agent on behalf of @BinaryFissionGames.

**Link to tracking Issue:**

- Closes open-telemetry#34118 

**Testing:**

Includes unit tests.

**Documentation:**

---------

Co-authored-by: Brandon Johnson <binaryfissiongames@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…#34488)

**Description:**
I noticed in the PR introducing this component that the README was
missing the generic header that all other components have. This is added
by `mdatagen` when the README has the `<!-- status autogenerated section
-->` and `<!-- end autogenerated section -->` lines included.

I also removed the section about supported pipelines, as the added
header should make this clear enough.

**Link to tracking Issue:**

open-telemetry#34118
6 participants