
New component: AWS S3 Receiver #30750

Closed
2 tasks
adcharre opened this issue Jan 24, 2024 · 15 comments · Fixed by #35809
Labels
Accepted Component (New component has been sponsored), Stale

Comments

@adcharre
Contributor

adcharre commented Jan 24, 2024

The purpose and use-cases of the new component

The S3 receiver will allow retrieval and processing of telemetry data previously stored in S3 by the AWS S3 Exporter.
This makes it possible to restore data that was cold-stored in S3 and to investigate issues that fall outside the retention window of our observability service provider.

Example configuration for the component

receivers:
  awss3:
    s3downloader:
      s3_bucket: abucket
      s3_prefix: tenant_a
      s3_partition: minute
    starttime: "2024-01-13 15:00"
    endtime: "2024-01-21 15:00"

Telemetry data types supported

  • traces
  • metrics
  • logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

adcharre

Sponsor (optional)

@atoulme

Additional context

No response

adcharre added the needs triage (New item requiring triage) and Sponsor Needed (New component seeking sponsor) labels Jan 24, 2024
@atoulme
Contributor

atoulme commented Jan 24, 2024

I'm interested to learn more. Would this be something you'd be able to checkpoint on?

@adcharre
Contributor Author

Would this be something you'd be able to checkpoint on?

@atoulme certainly, it's something I'm actively looking into at the moment, so it makes sense to get a second opinion on the best way to implement this and hopefully have it accepted.
How best to organise?

@atoulme
Contributor

atoulme commented Mar 6, 2024

For all components, we tend to work with folks through CONTRIBUTING.md. The question I asked you earlier was in earnest - one of the thorny issues for a component reading from a remote source is having a checkpoint mechanism that lets you know where you stopped. We can use the storage extension for that purpose.
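
For illustration, a minimal sketch of what checkpointing through the storage extension could look like. The storage setting on the receiver is a hypothetical placeholder for whatever the final design ends up using; the file_storage extension itself is real:

    extensions:
      file_storage:
        directory: /var/lib/otelcol/storage

    exporters:
      debug: {}

    receivers:
      awss3:
        s3downloader:
          s3_bucket: abucket
          s3_prefix: tenant_a
          s3_partition: minute
        # Hypothetical setting: persist progress so a restarted collector
        # resumes from the last object it processed.
        storage: file_storage

    service:
      extensions: [file_storage]
      pipelines:
        traces:
          receivers: [awss3]
          exporters: [debug]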

I am happy to sponsor this component if you'd like to work on it.

atoulme added the Accepted Component (New component has been sponsored) label and removed the Sponsor Needed (New component seeking sponsor) and needs triage (New item requiring triage) labels Mar 6, 2024
@adcharre
Contributor Author

adcharre commented Mar 6, 2024

Ahh, I understand now! Thank you for the clarification, and yes, that is an issue I have been thinking about - how best to signal that ingest is finished. I'll look into the storage extension and get a PR up with the skeleton of the receiver.

dmitryax pushed a commit that referenced this issue Mar 25, 2024
**Description:** Initial skeleton implementation of the AWS S3 receiver
described in issue #30750.
Full implementation will follow in future PRs.

**Link to tracking Issue:** #30750

**Testing:** -

**Documentation:** Initial README added.
@rhysxevans

Hi, apologies if I'm hijacking this thread.

Has there been any thought around integrating the S3 receiver with SQS and S3 Event Notifications?

Our use case is that we cannot always write directly to an OTel receiver, but we can write to an S3 bucket. We would then like the object event notification to notify SQS, where an OTel collector (or a set of them) could be "listening" and, on notification, fetch the uploaded file and output it to the OTLP backend store. We could then also retain the source data in S3 and leverage the current features of this receiver to replay data if required.

An example sender may look something like:

    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "http://*"
                - "https://*"

    exporters:
      awss3:
        s3uploader:
            region: us-west-2
            s3_bucket: "tempo-traces-bucket"
            s3_prefix: 'metric'
            s3_partition: 'minute'

    processors:
      batch:
        send_batch_size: 10000
        timeout: 30s
      resource:
        attributes:
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
      memory_limiter:
        check_interval: 5s
        limit_mib: 200

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [awss3, spanmetrics] # spanmetrics connector assumed to be defined elsewhere

The receiver could possibly look something like:

    receivers:
      awss3:
        sqs:
          queue_url: "https://sqs.us-west-1.amazonaws.com/<account_id>/queue"

    exporters:
      otlp:
        endpoint: 'http://otlp-endpoint:4317'

    processors:
      batch:
        send_batch_size: 10000
        timeout: 30s
      memory_limiter:
        check_interval: 5s
        limit_mib: 200

    service:
      pipelines:
        traces:
          receivers: [awss3]
          processors: [memory_limiter, batch]
          exporters: [otlp, spanmetrics] # spanmetrics connector assumed to be defined elsewhere

Thoughts?

S3 Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
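
For context, this is roughly the shape of an S3 event notification as delivered to SQS, shown here in YAML form and abbreviated (the full JSON schema is in the AWS docs linked above); the bucket name and object key simply mirror the example configs:

    Records:
      - eventName: "ObjectCreated:Put"
        s3:
          bucket:
            name: "tempo-traces-bucket"
          object:
            key: "metric/year=2024/month=01/day=13/hour=15/minute=00/data.json"
            size: 4096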

@szechyjs

szechyjs commented Apr 9, 2024

It would be nice to be able to have this run continuously instead of specifying start/end times. This would help with shipping traces across clusters/accounts.

flowchart LR
  subgraph env1
  app1 --> env1-collector
  app2 --> env1-collector
  end
  env1-collector --> S3[(S3)]
  subgraph env2
  app3 --> env2-collector
  app4 --> env2-collector
  end
  env2-collector --> S3
  subgraph shared-env
  S3 --> shared-collector
  end

@awesomeinsight

It would be nice to be able to have this run continuously instead of specifying start/end times. This would help with shipping traces across clusters/accounts.


Fully agree.

We also have scenarios where a receiver should continuously process new uploads (from the S3 exporter) to an S3 bucket, i.e. without specifying starttime and endtime, but with a checkpoint recording where it last stopped reading.

@adcharre
Contributor Author

@awesomeinsight / @rhysxevans - I see no reason why the receiver could not be expanded to include the scenario you suggest. At the moment I'm focusing on getting the initial implementation merged, which covers my main use case of restoring data between a set of dates.

@worksForM3

@awesomeinsight / @rhysxevans - I see no reason why the receiver could not be expanded to include the scenario you suggest. At the moment I'm focusing on getting the initial implementation merged, which covers my main use case of restoring data between a set of dates.

If the receiver were expanded at some point to continuously process new uploads made by the S3 exporter, could it be used to buffer data independently of a file system? The idea would be to have an alternative to https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#persistent-queue.

The idea is to have a resilient setup of exporters and receivers (with S3 in between as the buffer) that run stateless, since they would not require a filesystem to buffer data to disk.

Do you think a setup like this would make sense?
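
For illustration, a rough sketch of that kind of stateless buffering setup. The bucket name is made up, and the start/end times reflect the current behaviour; a continuous "tail the bucket" mode would be needed to remove them:

    # Writer side: collectors export to S3 instead of queueing on local disk.
    exporters:
      awss3:
        s3uploader:
          region: us-west-2
          s3_bucket: otel-buffer   # hypothetical bucket name
          s3_prefix: traces
          s3_partition: minute

    # Reader side: a separate, stateless collector drains the bucket and
    # forwards the data over OTLP.
    receivers:
      awss3:
        s3downloader:
          s3_bucket: otel-buffer
          s3_prefix: traces
          s3_partition: minute
        starttime: "2024-01-13 15:00"
        endtime: "2024-01-21 15:00"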

andrzej-stencel pushed a commit that referenced this issue May 7, 2024
**Description:** This is the initial implementation of the AWS S3
receiver. The receiver can load traces from an S3 bucket starting at the
configured start time until the stop time. JSON and protobuf formats are
supported, along with gzip compression.

**Link to tracking Issue:** #30750

**Testing:** Unit tests added, and real traces read from an S3 bucket.

**Documentation:** None added

---------

Co-authored-by: Antoine Toulme <antoine@toulme.name>
codeboten pushed a commit that referenced this issue Jul 5, 2024
**Description:** Add support for receiving Logs and Metrics using the
AWS S3 Receiver

**Link to tracking Issue:** #30750

**Testing:** New unit tests added for Logs and Metrics
Contributor

github-actions bot commented Jul 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@taraspos

taraspos commented Jul 12, 2024

One more use case I see is exporting CloudWatch metrics to S3 with CloudWatch Metric Streams and then reading them with the awss3receiver.

github-actions bot removed the Stale label Jul 13, 2024
@adcharre
Contributor Author

@taraspos - Interesting suggestion, and certainly possible once the ability to process new objects added to the bucket is implemented.

codeboten pushed a commit that referenced this issue Jul 25, 2024
**Description:** Remove use of deprecated AWS SDK v2 Endpoint resolver
API

**Link to tracking Issue:** #30750

**Testing:** Tested against a locally running localstack instance.

**Documentation:** N/A
@limberger

Hello, is there any simple way to debug the component or make it more verbose?
I'm trying to use it; I have some .json files in S3, but nothing happens.

@adcharre
Contributor Author

adcharre commented Aug 9, 2024

Hello, is there any simple way to debug the component or make it more verbose? I'm trying to use it; I have some .json files in S3, but nothing happens.

@limberger Were these JSON files written to the bucket using the AWS S3 Exporter? If not, you need to ensure that you're following the same conventions the S3 exporter uses when naming the objects it writes to the bucket.
I will be adding some more logging to help with monitoring progress in an upcoming PR.
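
For reference, an illustrative object key of the kind the S3 exporter writes with a minute partition; the exact layout can vary with the exporter version and configuration, so treat this path as an assumption rather than a contract:

    tenant_a/year=2024/month=01/day=13/hour=15/minute=00/traces_1705158000.json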

jpkrohling pushed a commit that referenced this issue Aug 20, 2024
**Description:** Enhance the logging of the AWS S3 Receiver in normal
operation to make it easier for users to debug what is happening.

**Link to tracking Issue:** #30750

**Testing:** Confirmed that logging appears when run as part of the full
collector build.

**Documentation:** N/A
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
**Description:** Enhance the logging of the AWS S3 Receiver in normal
operation to make it easier for users to debug what is happening.

**Link to tracking Issue:** open-telemetry#30750

**Testing:** Confirmed that logging appears when run as part of the full
collector build.

**Documentation:** N/A
Contributor

github-actions bot commented Oct 9, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label Oct 9, 2024
mx-psi closed this as completed in 635864b Oct 29, 2024
zzhlogin pushed a commit to zzhlogin/opentelemetry-collector-contrib-aws that referenced this issue Nov 12, 2024
…telcontribcol (open-telemetry#35809)

#### Description
Mark awss3receiver as alpha and enable it in the otelcontribcol

#### Link to tracking issue
Closes open-telemetry#30750
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
…telcontribcol (open-telemetry#35809)

#### Description
Mark awss3receiver as alpha and enable it in the otelcontribcol

#### Link to tracking issue
Closes open-telemetry#30750