New component: AWS ApplicationSignals Processor #32808

Closed
2 tasks done
mxiamxia opened this issue May 1, 2024 · 24 comments

@mxiamxia
Member

mxiamxia commented May 1, 2024

The purpose and use-cases of the new component

Amazon CloudWatch ApplicationSignals uses the OTel auto-instrumentation SDKs to automatically instrument applications running on AWS, and generates custom application metrics, traces, and logs to monitor application health and track long-term application performance. Currently, the generated telemetry data is processed by the CloudWatch Agent before being sent to the AWS backend.

This proposal is to contribute the ApplicationSignals components in the CloudWatch Agent to the OpenTelemetry Collector community.

The main functionalities

  1. High-cardinality metrics protection, which helps users cap the total number of unique metrics for their services before the metrics are sent to the destination.
  2. Enrichment of telemetry attributes with AWS platform-related information

Example configuration for the component

awsappsignals:
        limiter:
            disabled: false
            drop_threshold: 5000
            log_dropped_metrics: true
            rotation_interval: 10m0s
        resolvers:
            - name: app-signals
              platform: eks
        rules: []

Telemetry data types supported

  1. Traces
  2. Metrics
  3. Logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

mxiamxia@; bjrara@

Sponsor (optional)

Additional context

The AwsApplicationProcessor provides a data-processing pipeline which has 4 major components:

  1. AttributeResolver: automatically resolves telemetry attributes with high-cardinality values (e.g., Pod IPs in K8s/EKS) into static values. The AttributeResolver is an abstract interface that can be extended by different sub-resolvers. For example, we implemented K8sResolver to resolve internal IP addresses/aliases in an EKS cluster to the corresponding service deployment name. We also plan to implement PublicIpResolver to resolve public IP addresses to static domain names.
  2. AttributeNormalizer: normalizes OTel attribute names to the schema defined for Application Signals metrics and traces. For example, it renames OTel metric attribute names (e.g., aws.remote.service to RemoteService).
  3. MetricLimiter: caps the total number of unique metrics a service can send. A limit is imposed on the metrics generated for each service (by service.name): metrics beyond the threshold are dropped, while metrics admitted before the threshold was reached continue to be sent.
  4. CustomReplacer: reads customer-provided configuration rules and replaces attribute values accordingly.
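
To make the first two stages more concrete, here is a rough Go sketch of a resolver followed by a normalizer. Everything in it is an assumption for illustration: the `Attributes` map type, the `AttributeResolver` interface shape, the static IP lookup table (a real resolver would presumably watch the Kubernetes API), and the idea that the Pod IP arrives in `aws.remote.service`. Only the rename from `aws.remote.service` to `RemoteService` comes from the description above.

```go
package main

// Attributes is a simplified stand-in for a telemetry attribute map.
type Attributes map[string]string

// AttributeResolver is a hypothetical interface: each sub-resolver rewrites
// high-cardinality values in place.
type AttributeResolver interface {
	Resolve(attrs Attributes)
}

// k8sResolver replaces a Pod IP found under key with a deployment name,
// using a static lookup table for the purposes of this sketch.
type k8sResolver struct {
	key            string
	ipToDeployment map[string]string
}

func (r k8sResolver) Resolve(attrs Attributes) {
	if name, ok := r.ipToDeployment[attrs[r.key]]; ok {
		attrs[r.key] = name
	}
}

// normalize renames attribute keys according to renames (old name -> new name).
func normalize(attrs Attributes, renames map[string]string) {
	for from, to := range renames {
		if v, ok := attrs[from]; ok {
			attrs[to] = v
			delete(attrs, from)
		}
	}
}

// process runs every resolver first and then normalizes the attribute names,
// so resolvers can rely on the original OTel keys.
func process(attrs Attributes, resolvers []AttributeResolver, renames map[string]string) {
	for _, r := range resolvers {
		r.Resolve(attrs)
	}
	normalize(attrs, renames)
}
```

With a lookup table mapping `10.0.0.12` to `checkout`, an input of `aws.remote.service=10.0.0.12` would come out as `RemoteService=checkout`. Running the resolvers before the normalizer is a design choice of this sketch, not something the proposal specifies.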


mxiamxia added the "needs triage" and "Sponsor Needed" labels on May 1, 2024
@codeboten
Contributor

Added this component to the Collector SIG agenda as a vendor proposed component, the next maintainer on the rotating sponsor list should pick this up. cc @crobert-1

@mxiamxia
Member Author

mxiamxia commented May 9, 2024

Thank you! Missed the collector SIG discussions for this week. Will join the discussions next week and provide more details about the proposal.

@jeromeinsf

jeromeinsf commented May 10, 2024

High-cardinality metrics protection, which helps users cap the total number of unique metrics for their services before the metrics are sent to the destination.

Could we consider achieving this in a non vendor specific way?

@jeromeinsf

jeromeinsf commented May 10, 2024

AWS Platform related telemetry attributes enrichment

Can we detail this?

@mxiamxia
Member Author

@jeromeinsf

Thx @mxiamxia
I think it would be beneficial to detail what is AWS specific in the 4 components of this processor, and why a combination of the existing processors cannot achieve the same results

@crobert-1
Member

@bryan-aguilar We had discussed this in a couple collector sig meetings, are you able to sponsor this?

@mxiamxia
Member Author

mxiamxia commented Jun 11, 2024

High-cardinality metrics protection, which helps users cap the total number of unique metrics for their services before the metrics are sent to the destination.

Could we consider achieving this in a non vendor specific way?

Hi @jeromeinsf, sorry for the late response. The MetricLimiter in this component has two main functions: 1) group the metrics by a list of specific metric attributes, then count and sort the occurrences of each group using a Count-Min Sketch (CMS); 2) once the cardinality threshold is reached, act on the metrics with the fewest occurrences recorded in the CMS. Currently, both functions are implemented in a way that is specific to the customer experience designed for AppSignals. With extra effort, I think it is possible to abstract some pieces for general-purpose use. For the current proposal, we would prefer to defer that effort, since it requires a follow-up discussion with the community about the abstractions.
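
To illustrate the two functions, here is a minimal Go sketch of grouping by attributes, counting with a Count-Min Sketch, and capping the number of admitted groups. It is not the AppSignals implementation: the names (`CountMinSketch`, `Limiter`, `groupKey`), the FNV-based row hashing, and the policy of simply dropping new groups once the threshold is reached are all assumptions. The real component acts on the least frequent groups, and that policy is not detailed here.

```go
package main

import (
	"hash/fnv"
	"strings"
)

// CountMinSketch approximates per-key counts in fixed memory. Estimates can
// overcount because of hash collisions, but they never undercount.
type CountMinSketch struct {
	depth, width int
	rows         [][]uint32
}

func NewCountMinSketch(depth, width int) *CountMinSketch {
	rows := make([][]uint32, depth)
	for i := range rows {
		rows[i] = make([]uint32, width)
	}
	return &CountMinSketch{depth: depth, width: width, rows: rows}
}

// index hashes key for one row; the row number acts as a seed.
func (s *CountMinSketch) index(row int, key string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(s.width))
}

// Add records one occurrence of key in every row.
func (s *CountMinSketch) Add(key string) {
	for r := 0; r < s.depth; r++ {
		s.rows[r][s.index(r, key)]++
	}
}

// Estimate returns the smallest counter across rows for key.
func (s *CountMinSketch) Estimate(key string) uint32 {
	lowest := s.rows[0][s.index(0, key)]
	for r := 1; r < s.depth; r++ {
		if c := s.rows[r][s.index(r, key)]; c < lowest {
			lowest = c
		}
	}
	return lowest
}

// groupKey builds a grouping key from the selected attributes, in order.
func groupKey(attrs map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, k := range groupBy {
		parts = append(parts, k+"="+attrs[k])
	}
	return strings.Join(parts, "|")
}

// Limiter admits up to threshold distinct keys and counts every key it sees.
type Limiter struct {
	threshold int
	admitted  map[string]struct{}
	sketch    *CountMinSketch
}

func NewLimiter(threshold int) *Limiter {
	return &Limiter{
		threshold: threshold,
		admitted:  make(map[string]struct{}),
		sketch:    NewCountMinSketch(4, 1024),
	}
}

// Admit reports whether a metric with this key should be kept. Already
// admitted keys always pass; new keys pass only while below the threshold.
func (l *Limiter) Admit(key string) bool {
	l.sketch.Add(key)
	if _, ok := l.admitted[key]; ok {
		return true
	}
	if len(l.admitted) < l.threshold {
		l.admitted[key] = struct{}{}
		return true
	}
	return false
}
```

A caller would build the key with something like `groupKey(attrs, []string{"service.name", "operation"})` (the attribute list is hypothetical) and keep the metric only when `Admit` returns true. Because dropped keys are still counted in the sketch, a later rotation could use `Estimate` to decide which groups deserve a slot.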

@mxiamxia
Member Author

mxiamxia commented Jun 11, 2024

Thx @mxiamxia
I think it would be beneficial to detail what is AWS specific in the 4 components of this processor, and why a combination of the existing processors cannot achieve the same results

Regarding the 4 components listed: AppSignals users can combine existing processors, including attributesprocessor, spanprocessor, and k8sprocessor, and configure them in a particular way to process AppSignals data, but that fulfills only part of the ApplicationSignals requirements. New pieces such as the EKS/K8s Pod IP resolver and the MetricLimiter would still require this new component. So we built an inclusive, vendor-specific component that meets all of these requirements, so users won't need to worry about configuration.

@djaglowski
Member

This component is not for ADOT only. The reason to add this component into upstream, is so that existing consumers of any OTel collector (non-ADOT) will be able to use ApplicationSignals with their current OTel setup.

Generally, hosting a component upstream isn't necessary in order to allow others to pull it into any other OTel collector. As long as the component's go module is publicly accessible it should be possible. If this is the only reason then is it worth having the community take this on as an obligation?

The MetricLimiter in this component has two main functions: 1) group the metrics by a list of specific metric attributes, then count and sort the occurrences of each group using a Count-Min Sketch (CMS); 2) once the cardinality threshold is reached, act on the metrics with the fewest occurrences recorded in the CMS. Currently, both functions are implemented in a way that is specific to the customer experience designed for AppSignals. With extra effort, I think it is possible to abstract some pieces for general-purpose use. For the current proposal, we would prefer to defer that effort, since it requires a follow-up discussion with the community about the abstractions.
...
Regarding the 4 components listed: AppSignals users can combine existing processors, including attributesprocessor, spanprocessor, and k8sprocessor, and configure them in a particular way to process AppSignals data, but that fulfills only part of the ApplicationSignals requirements. New pieces such as the EKS/K8s Pod IP resolver and the MetricLimiter would still require this new component. So we built an inclusive, vendor-specific component that meets all of these requirements, so users won't need to worry about configuration.

I'm not sold on the idea that there's anything vendor-specific here other than configuration settings. Is it fair to say that this proposal could be separated into two parts?

  1. Opinionated configuration for three existing components.
  2. A new MetricLimiter processor, which could be generic, but would be easier to implement with the same opinionated assumptions used in a reference implementation.

@mxiamxia
Member Author

mxiamxia commented Jul 25, 2024

This component is not for ADOT only. The reason to add this component into upstream, is so that existing consumers of any OTel collector (non-ADOT) will be able to use ApplicationSignals with their current OTel setup.

Generally, hosting a component upstream isn't necessary in order to allow others to pull it into any other OTel collector. As long as the component's go module is publicly accessible it should be possible. If this is the only reason then is it worth having the community take this on as an obligation?

Thanks. Another reason is that we want these components to be more discoverable for OTel users. IMHO, the OTel contrib repo was designed for this purpose, allowing vendors to contribute their components. AWS has most of its OTel components in the contrib repo, so we would like to include this one there as well. :)

@mxiamxia
Member Author

mxiamxia commented Jul 25, 2024

I'm not sold on the idea that there's anything vendor-specific here other than configuration settings. Is it fair to say that this proposal could be separated into two parts?
Opinionated configuration for three existing components.

We can't simply replace this component with opinionated configuration for existing components, because it involves very vendor-specific implementations. For example, we mutate telemetry attributes based on the detected AWS platform where the applications are running. We also plan to implement centralized telemetry data filter/replace/drop rules that can be retrieved from AWS, eliminating the need for customers to update their local config. These are some of the reasons why we want to introduce this component.

Going back to your previous comment: for vendors introducing very business-specific processors, is it common to contribute them to the contrib repo, or should vendors just host them in their own publicly accessible repo?

A new MetricLimiter processor, which could be generic, but would be easier to implement with the same opinionated assumptions used in a reference implementation.

Yes, we can make the metric limiter part generic for all OTel users. Initially, we were thinking of contributing it as is and then collaborating with the community to optimize it for general use. However, I am fine with holding off on the MetricLimiter part for now and coming up with a more general design later.

@djaglowski
Member

It sounds like there may be a real need for such a component but I'm not going to take it on as an obligation. The concept of automatically accepting vendor-related components was never intended to apply to all vendor-specific use cases. It was intended to ensure no vendor is excluded at a basic level. We recently updated our guidelines to reflect this intention. Generally speaking, components should be accepted based on the capacity and judgement of the maintainers & approvers.

@mxiamxia
Member Author

mxiamxia commented Jul 27, 2024

Hi Daniel, we are committed to maintaining this component and addressing any issues that may come up. We will also follow the guidelines and ensure all listed criteria are met. We would appreciate your help reviewing our ongoing and upcoming PRs.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned on Nov 24, 2024
8 participants