Prometheus errors in EKS with default configuration #401
@jvoravong can you please take a look? It looks like this is caused by the latest changes for the control plane. Maybe we need to disable it.
@dmitryax Thanks for reporting this. I originally assumed that pod labels were unique enough, or that managed Kubernetes clusters didn't allow the k8s_observer to pick up on private control plane pods.
It seems managed Kubernetes clusters are trying to expose more control plane metrics in the long run, which could cause more issues like this. The EKS team is working on exposing proxy metrics (aws/containers-roadmap#657). Proposal:
Thoughts?
I think we should keep it. Why can't we figure out the pod label based on the distribution?
We explicitly don't support EKS, GKE, or AKS control plane metrics at this time. Our control plane integrations fall back to a default discovery rule whenever the distribution is not openshift. The EKS proxy pods and GKE coredns pods happen to match our default discovery rules and are exposed enough for our k8s_observer to pick them up, but these pods don't actually serve metrics (at this time). We could add discovery rules that will never match a pod, specifically for the distributions that don't support control plane metrics, but I think this clutters the agent config file.
After:
We could also simply not include the control plane receivers when using an unsupported distribution; that would probably be cleaner.
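The cleaner alternative described above could be sketched as a guard in the chart's agent template. This is a hypothetical illustration, not the chart's actual template: the receiver entry, label, and the exact list of distributions are assumptions for the sake of the example.

```yaml
{{- /* Hypothetical sketch: skip rendering control plane receivers entirely
       when the distribution does not expose its control plane.
       The receiver name and discovery rule below are illustrative. */}}
{{- if not (has .Values.distribution (list "aks" "eks" "eks/fargate" "gke" "gke/autopilot")) }}
receiver_creator:
  receivers:
    smartagent/kube-controller-manager:
      rule: type == "pod" && labels["k8s-app"] == "kube-controller-manager"
{{- end }}
```

With this shape, unsupported distributions simply never get the receivers rendered, so no never-matching discovery rules are needed in the agent config.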
I would recommend just not setting up the control plane receivers for unsupported distributions, as you suggested in the last snippet.
It says here that AKS is a supported distribution: https://github.com/signalfx/splunk-otel-collector-chart/blob/splunk-otel-collector-0.45.0/docs/advanced-configuration.md

It says here that AKS is not a supported distribution: https://github.com/signalfx/splunk-otel-collector-chart/blob/splunk-otel-collector-0.45.0/helm-charts/splunk-otel-collector/templates/config/_otel-agent.tpl#L79

What's up? :)
@lindhe This helm chart does support collecting many metrics from AKS, but it specifically does not support collecting metrics from the AKS control plane, which is what _otel-agent.tpl#L79 refers to. Managed Kubernetes services such as AKS do not allow the user to access the control plane for metric collection.
Alright, thanks for clarifying! Is the first documentation page I linked to out of date, then?
Documentation was added for these changes, see advanced-configuration.md under the Control plane metrics section |
Hm... I'm sure it's just me missing the point here, but to me it looks like the documentation contradicts itself. Here it says:

* Supported Distributions:
  * kubernetes 1.22 (kops created)
  * openshift v4.9
* Unsupported Distributions:
  * aks
  * eks
  * eks/fargate
  * gke
  * gke/autopilot

And here it says:

> Use the `distribution` parameter to provide information about underlying
> Kubernetes deployment. This parameter allows the connector to automatically
> scrape additional metadata. The supported options are:
>
> - `aks` - Azure AKS
> - `eks` - Amazon EKS
> - `eks/fargate` - Amazon EKS with Fargate profiles
> - `gke` - Google GKE / Standard mode
> - `gke/autopilot` - Google GKE / Autopilot mode
> - `openshift` - Red Hat OpenShift

Is "the
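For context on the second quote, the `distribution` parameter is set in the chart's values. A minimal, hypothetical excerpt (assuming an EKS cluster; the key name comes from the documentation quoted above):

```yaml
# values.yaml (excerpt) - declare the underlying Kubernetes distribution
# so the chart can adjust its defaults. Must be one of the options listed
# above; here we assume Amazon EKS.
distribution: eks
```

The apparent contradiction is that this list covers distributions the chart recognizes for general metadata scraping, while the first list covers control plane metric support specifically.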
A recent version of the helm chart installed with the default configuration in EKS throws the following errors:
k8s version: v1.21.5-eks-bc4871b