[eks] [bug]: getting alerts for v1.metrics.eks.amazonaws.com/default #2479
Comments
I have had the same alert since I upgraded EKS from 1.30 to 1.31. By the way, I have installed the latest Prometheus components.
We are facing the same issue, but with EKS 1.29.
Hello, we have the same alerts with EKS 1.28.
Hi folks! This is a known issue and EKS is currently working on a fix. The fix is already effective for new clusters, which means new clusters should no longer see this error during a cluster upgrade (or during the instance refresh EKS performs regularly, which is invisible to you). For existing clusters, the ECD to receive the fix is 01/2025.

More details on the issue: EKS recently launched a new feature to fetch additional control plane metrics in Prometheus-compatible format from Kubernetes controllers like kube-controller-manager and kube-scheduler. As part of this feature, EKS introduced a new APIService object.
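For reference, here is a minimal sketch of how to inspect that object and its availability condition, assuming a working kubeconfig and the official Kubernetes Python client (the APIService name is taken from the error messages in this thread):

```python
# Sketch: read the EKS-managed APIService and print its conditions.
# Assumes `pip install kubernetes` and a kubeconfig for the cluster.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside a pod
apireg = client.ApiregistrationV1Api()

svc = apireg.read_api_service("v1.metrics.eks.amazonaws.com")
for cond in svc.status.conditions or []:
    # The "Available" condition is the one the kube-apiserver flips
    # to False while new control plane instances are warming up.
    print(f"{cond.type}={cond.status} reason={cond.reason}: {cond.message}")
```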
EKS clusters can occasionally see error messages in the kube-apiserver log related to the unavailability of this APIService.
These log messages are false positives generated during an EKS Kubernetes control plane update. As the cluster update process creates new control plane instances, the API server regularly checks whether this component is available. When the metrics server component is not yet ready, the API server generates these log messages. No action needs to be taken on your end; this should not have any functional impact, and these error messages can safely be ignored. If you notice any availability drop for requests scraping the control plane metrics, please don't hesitate to reach out to EKS support.

Action Items EKS is taking to avoid confusion:

More context on the APIService availability check: each kube-apiserver instance reports the availability of the APIService to a Kubernetes object that is shared among all kube-apiservers. As a result, the APIService is marked unavailable if any one of the kube-apiservers marks it unavailable. The functionality of the APIService is not impacted, because newly launched instances and instances being terminated are not placed behind the cluster load balancer, so they won't actually take any traffic from your requests.
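If you want to verify on your side that the aggregated API is still serving traffic, a hedged sketch using the same Python client as above (the group/version path is derived from the APIService name; this is an illustrative check, not an official one):

```python
# Sketch: raw discovery GET through the kube-apiserver aggregation
# layer. HTTP 200 means the backing service is reachable even while
# the Available condition flaps on individual apiserver instances.
from kubernetes import client, config

config.load_kube_config()
data, status, headers = client.ApiClient().call_api(
    "/apis/metrics.eks.amazonaws.com/v1",  # group/version behind the APIService
    "GET",
    auth_settings=["BearerToken"],
    response_type="object",
    _return_http_data_only=False,
)
print("HTTP", status)
```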
Community Note
Tell us about your request
What do you want us to build?
---> We are getting a consistent alert for v1.metrics.eks.amazonaws.com; the alerts come in by the thousands, which creates a lot of confusion.
e.g. "Kubernetes aggregated API v1.metrics.eks.amazonaws.com/default has reported errors. It has appeared unavailable 12.28k times averaged over the past 10m." The issue surfaces randomly, both during upgrades and without any activity.
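For what it's worth, that alert text matches the KubeAggregatedAPIErrors rule shipped with kube-prometheus, which fires on the aggregator_unavailable_apiservice_total counter exposed by the kube-apiserver. A minimal sketch to see what that counter reports for this one APIService (the Prometheus address is an assumption about a typical setup):

```python
# Sketch: query Prometheus for the counter behind the alert.
import requests

PROM = "http://localhost:9090"  # assumed; e.g. a port-forward to Prometheus
query = (
    'sum by (name) (increase('
    'aggregator_unavailable_apiservice_total'
    '{name="v1.metrics.eks.amazonaws.com"}[10m]))'
)
resp = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("name"), "->", result["value"][1])
```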
Which service(s) is this request for?
This could be Fargate, ECS, EKS, ECR
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
Are you currently working around this issue?
How are you currently solving this problem?
As this does not come from the application side, is there any way to suppress this alarm? One option we are considering is sketched below.
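A hedged sketch of that workaround, not an official recommendation: create an Alertmanager silence scoped to just this APIService, so the rule stays active for other aggregated APIs. The Alertmanager address and the alert name are assumptions about a typical kube-prometheus setup:

```python
# Sketch: create a 30-day Alertmanager silence via its v2 API.
from datetime import datetime, timedelta, timezone
import requests

AM = "http://localhost:9093"  # assumed; e.g. a port-forward to Alertmanager
now = datetime.now(timezone.utc)
silence = {
    "matchers": [
        {"name": "alertname", "value": "KubeAggregatedAPIErrors", "isRegex": False},
        {"name": "name", "value": "v1.metrics.eks.amazonaws.com", "isRegex": False},
    ],
    "startsAt": now.isoformat(),
    "endsAt": (now + timedelta(days=30)).isoformat(),
    "createdBy": "platform-team",  # hypothetical author
    "comment": "Known EKS false positive during control plane updates (see this issue)",
}
resp = requests.post(f"{AM}/api/v2/silences", json=silence, timeout=10)
resp.raise_for_status()
print(resp.json())  # {"silenceID": "..."}
```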
Additional context
Anything else we should know?
This is creating a lot of confusion about whether there is an issue at the control plane or at the customer application level, and whether it will have any impact.
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)