Support AWS cloudwatch container insight metrics for EKS/ECS clusters #2307
Some notes from the meeting earlier today:

We have hosted the relevant code here: https://github.com/aws-observability/aws-otel-collector/tree/container-insight-backup/internal temporarily and will work to migrate the code to upstream.
tigrannajaryan pushed a commit that referenced this issue on Apr 30, 2021:
Add constants and utils functions for aws container insights

* define constants for all metrics
* define units for the metrics
* add utils functions to convert metrics to OpenTelemetry metrics

This PR is a part of our efforts to migrate the [code for aws container insights](https://github.com/aws-observability/aws-otel-collector/tree/container-insight-backup/internal) to upstream. More PRs will come along the way.

**Link to tracking Issue:** #2307
**Testing:** Unit tests
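For readers following along, here is a minimal Go sketch of what such constants and a conversion helper might look like. Everything below is invented for illustration: `ciMetric` and `otMetric` are simplified stand-ins for the collector's internal metric types, and the unit strings are placeholders rather than the actual upstreamed values.

```go
package awscontainerinsight

import "time"

// A small subset of Container Insights metric names, for illustration only;
// the real list covers node-, pod- and container-level metrics.
const (
	MetricNodeCPUUsageTotal = "node_cpu_usage_total"
	MetricPodNetworkRxBytes = "pod_network_rx_bytes"
	MetricPodNetworkTxBytes = "pod_network_tx_bytes"

	UnitCount       = "Count"
	UnitBytesPerSec = "Bytes/Second"
)

// metricToUnit maps each metric name to its unit (placeholder values).
var metricToUnit = map[string]string{
	MetricNodeCPUUsageTotal: UnitCount,
	MetricPodNetworkRxBytes: UnitBytesPerSec,
	MetricPodNetworkTxBytes: UnitBytesPerSec,
}

// ciMetric is a simplified stand-in for a raw Container Insights data point.
type ciMetric struct {
	Name      string
	Value     float64
	Timestamp time.Time
	Tags      map[string]string
}

// otMetric is a simplified stand-in for the collector's internal metric type.
type otMetric struct {
	Name      string
	Unit      string
	Value     float64
	Timestamp time.Time
	Labels    map[string]string
}

// convertToOTMetric attaches the unit for the metric and copies the value,
// timestamp, and dimensions into the OpenTelemetry-style representation.
func convertToOTMetric(m ciMetric) otMetric {
	return otMetric{
		Name:      m.Name,
		Unit:      metricToUnit[m.Name],
		Value:     m.Value,
		Timestamp: m.Timestamp,
		Labels:    m.Tags,
	}
}
```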
tigrannajaryan pushed a commit that referenced this issue on Jun 8, 2021:
Add `k8sapiserver` component to collect cluster-level metrics from the k8s API server:

* To guarantee that only one copy of cluster-level metrics is generated per cluster, we utilize the leader election API provided by `kubernetes/client-go`. A dedicated configmap is used as the lock resource. Multiple receivers will try to acquire the lock and only one will succeed and generate cluster-level metrics.

**Link to tracking Issue:** #2307
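A condensed sketch of the leader-election pattern this describes, using `k8s.io/client-go` with a ConfigMap lock. The namespace, lock name, and timing values below are illustrative assumptions (not the component's actual configuration), and `ConfigMapLock` reflects the client-go API of that era; newer client-go releases favor Lease-based locks.

```go
package k8sapiserver

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runLeaderElection blocks until ctx is cancelled. Only the instance that
// holds the ConfigMap lock invokes startCollecting, so at most one receiver
// per cluster emits cluster-level metrics.
func runLeaderElection(ctx context.Context, client kubernetes.Interface, nodeName string, startCollecting func(context.Context)) {
	lock := &resourcelock.ConfigMapLock{
		ConfigMapMeta: metav1.ObjectMeta{
			Namespace: "amazon-cloudwatch",                     // illustrative namespace
			Name:      "otel-container-insight-clusterleader", // illustrative lock name
		},
		Client: client.CoreV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: nodeName, // each candidate identifies itself, e.g. by node name
		},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: startCollecting,            // we won the lock: emit cluster-level metrics
			OnStoppedLeading: func() { /* stop emitting */ },
		},
	})
}
```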
@pxaws is this completed?

It is completed. Let's close it.
Background
CloudWatch Container Insights is an AWS monitoring solution for EKS and ECS clusters. It can collect, aggregate, and summarize metrics and logs from containerized applications and microservices. Currently the metrics and logs are collected by the CloudWatch agent running as a daemon set. We want to migrate to the OpenTelemetry collector instead and achieve feature parity.
Issues:
Container insights generates a list of metrics for both EKS and ECS clusters: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics.html. With the OpenTelemetry collector, we want to generate the same set of metrics with the same dimensions. So we examined the metrics provided by the existing receivers `kubeletstatsreceiver` and `k8sclusterreceiver` in OpenTelemetry and found that they don't satisfy our needs for container insight.

The major issue is that some metrics required by container insight are not available in the existing receivers. The CloudWatch agent embeds cAdvisor inside and can use it to collect a rich set of metrics. The `kubeletstatsreceiver` currently gets metrics from the kubelet stats endpoint rather than from the kubelet cAdvisor endpoint `/metrics/cadvisor`. This leaves some metrics required by container insight missing (e.g. `node_cpu_usage_total`, `pod_network_rx_bytes`, `pod_network_tx_bytes`, ...). Even if `kubeletstatsreceiver` begins to support the kubelet cAdvisor endpoint, this will not cover our use case. As far as I know, the metrics that the kubelet generates for the cAdvisor endpoint are taken from the cAdvisor Prometheus collector (see https://github.com/kubernetes/kubernetes/blob/release-1.18/pkg/kubelet/server/server.go#L334-L345 and https://github.com/kubernetes/kubernetes/blob/release-1.18/vendor/github.com/google/cadvisor/metrics/prometheus.go#L151-L1084). Unfortunately, container insight uses more metrics than those defined in the cAdvisor Prometheus collector (for example, `container_memory_hierarchical_pgfault`, `container_memory_hierarchical_pgmajfault`, `node_diskio_io_service_bytes_read`, ...).

A second issue is about ECS. ECS clusters don't provide an endpoint like the kubelet API endpoint, so the `kubeletstatsreceiver` won't work. Since we want to continue to support existing container insight users for ECS clusters, we have to develop our own receiver based on cAdvisor.

An additional concern with the existing receivers like `kubeletstatsreceiver` is that they all rely on kubelet endpoints, which could limit the extensibility of our container insight support. What if we want to use metrics that are not provided by the kubelet?
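To make the endpoint distinction above concrete, here is a small hypothetical Go sketch that fetches the kubelet cAdvisor endpoint and lists the Prometheus metric names it exposes (the `/stats/summary` endpoint that `kubeletstatsreceiver` reads returns JSON instead). The use of `localhost` and the read-only port 10255 is an assumption for illustration; it is not part of any receiver.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// listMetricNames fetches a Prometheus text-format endpoint and prints the
// distinct metric names it exposes. Assumes the kubelet read-only port
// (10255) is enabled; the secure port 10250 would also require auth.
func listMetricNames(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	seen := map[string]bool{}
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and blank lines in the Prometheus text format
		}
		// The metric name is everything before the first '{' or ' '.
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if !seen[name] {
			seen[name] = true
			fmt.Println(name)
		}
	}
	return scanner.Err()
}

func main() {
	// /metrics/cadvisor (Prometheus text) is where the cAdvisor metrics live.
	if err := listMetricNames("http://localhost:10255/metrics/cadvisor"); err != nil {
		fmt.Println("error:", err)
	}
}
```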
Proposal

So we (the AWS CloudWatch agent team) want to develop our own receiver, `awscontainerinsightreceiver`, by embedding the cAdvisor library inside (like what we did for the CloudWatch agent) and contribute it to the OpenTelemetry project so that existing container insight users can migrate smoothly. This receiver needs to be deployed as a daemon set, and each receiver instance is responsible for collecting the relevant metrics for its node. We might also need to develop a processor to decorate existing metrics and do some computation to generate new metrics (if that logic is not suitable to put into the `k8sprocessor`).
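For illustration only, a minimal collector configuration of the kind this proposal would enable might look like the sketch below. It assumes the receiver is exposed under the name `awscontainerinsightreceiver` and pairs it with the existing `awsemf` exporter to ship the metrics to CloudWatch; the exact configuration keys are not part of this proposal.

```yaml
receivers:
  awscontainerinsightreceiver:   # proposed receiver, one instance per node (daemon set)

processors:
  batch:

exporters:
  awsemf:                        # embedded metric format exporter for CloudWatch

service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch]
      exporters: [awsemf]
```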
Please comment if you have any suggestions. Thank you!