EKS Metrics Server can't scrape pod/node metrics - Unauthorized 401 #963
Comments
I'm not familiar with EKS, but I remember a similar issue where the problem was solved by adding some EKS-specific configuration. I couldn't find it just now; I will keep looking. |
@mokhirashakira have you patched your kube-proxy config so that metrics are bound to |
@stevehipwell I think that issue you linked is about kube-proxy which is unrelated to Metrics Server. |
@serathius it's been a long time since I worked on this, so I might have misremembered which metrics component was affected. @mokhirashakira are you actually using Calico as your CNI? |
Sorry, no it's not Calico. It's the EKS CNI - https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html |
@mokhirashakira it looks like you're running v0.3.6 on an EKS v1.21 cluster? How are you installing MS? I know that the latest Helm chart works correctly on EKS v1.21. |
Not through Helm, it was installed using the following https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html |
@mokhirashakira have you tried re-running the apply step with an up-to-date manifest? The file linked to is dynamic and needs to be kept up to date, the MS version in your report is
@serathius are the compatibilities on the README correct? Does MS just use a well-defined subset of client-go? Even if the binary is compatible with K8s v1.21, the manifests for v0.3.6 might not be? |
Yes, to my knowledge, however they are based on deprecation notices and we didn't do much testing.
Not sure what you mean by "well defined subset". Like any binary, we need to build with specific versions of our dependencies. Before each release I try to make sure we pick up the latest client-go version, but we cannot guarantee that it will be forward compatible forever.
True, one example is that both the manifests and the binary pick some specific API versions like |
Thanks @serathius, I think we can say that MS is fairly compatible both forwards and backwards, so the README compatibility matrix is correct on a best-effort basis.
I think there could also be other considerations in the manifests that mean that they might apply correctly to an EKS v1.21 cluster but not work or break when an older cluster is upgraded. |
Hi @mokhirashakira, does the solution in the known issues solve your problem? |
I experienced a similar problem on EKS v1.21: the kube-controller-manager logs in CloudWatch showed lines like:
In this case these were new clusters, where some things were different compared to the clusters we already have running, where everything works fine:
By default, the Terraform module gives the security group attached to the nodes rules like this (among others):
After adding an SG rule that matches the container port configured for Metrics Server when using the latest Helm chart, everything worked.
This does align with the endpoints of the metrics-server service:
Still wrapping my head around whether this makes sense; VPC CNI networking is not the easiest part of EKS. Update: Reading the OP again, which really does mention a 401, my problem was obviously a different one. Comparing the cluster rules in the OP, I notice a subtle difference between those and the ones installed via the Helm chart on EKS 1.21:
|
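Editorial sketch of the security-group check described in the comment above: does any ingress rule on the node security group cover the Metrics Server container port? Everything here is an assumption for illustration. The `IngressRule` shape is a hypothetical simplification, the rules are made up rather than the Terraform module's real defaults, and the port value must be read from your own metrics-server Deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one security-group ingress rule."""
    protocol: str   # "tcp", "udp", or "-1" for all protocols
    from_port: int  # inclusive lower bound of the port range
    to_port: int    # inclusive upper bound of the port range


def port_is_allowed(rules, port, protocol="tcp"):
    """Return True if any rule admits `protocol` traffic on `port`."""
    for rule in rules:
        if rule.protocol == "-1":
            # In EC2 security groups, "-1" means all protocols and all ports.
            return True
        if rule.protocol == protocol and rule.from_port <= port <= rule.to_port:
            return True
    return False


# Assumed value for illustration only; check the containerPort in your own
# metrics-server Deployment before relying on it.
METRICS_SERVER_PORT = 4443

# Illustrative rules, not the module's actual defaults.
node_rules = [
    IngressRule("tcp", 443, 443),
    IngressRule("tcp", 10250, 10250),
]

# No rule covers the assumed port yet.
print(port_is_allowed(node_rules, METRICS_SERVER_PORT))

# Add a rule matching the container port, as described in the comment.
node_rules.append(IngressRule("tcp", METRICS_SERVER_PORT, METRICS_SERVER_PORT))
print(port_is_allowed(node_rules, METRICS_SERVER_PORT))
```

The check only models port coverage. Real rules also restrict the traffic source, which in this scenario would need to include the cluster (control plane) security group.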
@TBeijen your issue is/was separate and is specifically about the changes that were made in the v18 release of the EKS module dropping almost all SG rules. I'm not sure if the module docs have been updated but it's covered in a number of issues. As an aside, and I'm sure you're aware of this, when using the AWS VPC CNI you don't need to use host network for MS as long as your SGs are configured correctly. |
@stevehipwell Thanks for the confirmation. I was aware of the dropped SG rules from reading the v18 docs, so it is documented. I just failed to grasp the impact on extension API servers straight away. This summarizes quite well how the node SG determines to what extent the EKS API can access pods (https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html) and clarifies why the added SG rule indeed fixes things:
Sorry for the noise and for distracting from the OP's 401 issue. |
Upgrading the metrics server to the latest version helped fix the issue. Thank you! |
What happened:
Metrics Server is not able to read metrics and fails with: error: metrics not available yet
Running kubectl top pod/nodes returns: error: metrics not available yet
What you expected to happen:
Metrics server to scrape all pods and nodes.
Anything else we need to know?:
Everything is in the details section.
Environment:
Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): EKS
Container Network Setup (flannel, calico, etc.): EKS VPC CNI
Kubernetes version (use kubectl version): 1.21.5-eks-bc4871b
Metrics Server manifest
spoiler for Metrics Server manifest:
spoiler for Metrics Server logs:
spoiler for Status of Metrics API:
/kind bug
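Editorial sketch for readers who want to collect the "Status of Metrics API" detail requested above. It reads the `Available` condition of the `v1beta1.metrics.k8s.io` APIService. The helper names are hypothetical, and the sample object is hand-written for illustration; it is not output from the reporter's cluster.

```python
import json
import subprocess


def apiservice_available(apiservice):
    """Return (available, detail) from a parsed APIService object."""
    for cond in apiservice.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            detail = f"{cond.get('reason', '')}: {cond.get('message', '')}"
            return cond.get("status") == "True", detail
    return False, "no Available condition reported"


def fetch_metrics_apiservice():
    """Fetch the Metrics API registration via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "apiservice", "v1beta1.metrics.k8s.io", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Hand-written sample for illustration only.
sample = {
    "status": {
        "conditions": [
            {
                "type": "Available",
                "status": "False",
                "reason": "FailedDiscoveryCheck",
                "message": "illustrative message",
            }
        ]
    }
}

print(apiservice_available(sample))
```

Against a live cluster, `apiservice_available(fetch_metrics_apiservice())` would report the same condition that `kubectl get apiservice` summarizes in its AVAILABLE column.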