High-cardinality Prometheus metric ingress_controller_ssl_expire_time_seconds #2773
Comments
We have a similar issue with the
Thanks. I did read through #2726, but I did not see anything in it that avoids reporting the same metric from every ingress controller. I can confirm that there exists an installation with >50,000
I am sorry, I didn't address this in my comment. You are right, this is not fixed in #2726, because it is impossible in the current deployment model. Right now we have multiple components in the same k8s deployment: the k8s controller and nginx, which is launched from Go. The metric collector lives in the Go binary and collects information about ingress, the nginx process, nginx status, and the Go runtime. This means that when you scale the deployment to, let's say, 5 replicas, you will see the same metrics * 5 with one difference, the
I plan to refactor the way the controller works, splitting the Go controller and nginx into different deployments. This provides several improvements:
Edit: all that said, you will see a drastic reduction of the metric count after #2726.
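To illustrate the multiplication described above, here is a minimal Python sketch. It is not taken from the controller's code: the label names `host` and `replica`, the helper `series_label_sets`, and the sample values are all hypothetical, standing in for whatever label actually differs between replicas.

```python
from itertools import product


def series_label_sets(hosts, replicas):
    """Return one label set per time series when every replica reports every host.

    `host` and `replica` are hypothetical label names used only for illustration.
    """
    return [
        {"host": host, "replica": replica}
        for host, replica in product(hosts, replicas)
    ]


# Hypothetical deployment: 3 hosts served by a deployment scaled to 5 replicas.
hosts = [f"site-{i}.example.com" for i in range(3)]
replicas = [f"controller-{i}" for i in range(5)]

series = series_label_sets(hosts, replicas)
print(len(series))  # 3 hosts x 5 replicas = 15 series
```

Every series carries the same information per host; only the replica label differs, which is why the count grows linearly with the replica count.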
I'm experiencing a similar issue in my multi-tenant environment, where each nginx-ingress is reporting many different I understand that
If I understand the code correctly, you'd make a matching change to the labels passed to
@hairyhenderson you should probably file your request as a separate issue, because it isn't really the same as my original point here. I just coded up your suggestion; PR coming RSN.
Hi, we don't run the nginx ingress ourselves, but we do run a hosted Prometheus service, which brings us into contact.
We observe that a user with hundreds of sites and dozens of ingress controllers will have tens of thousands of
ingress_controller_ssl_expire_time_seconds
metrics, because all the ingress controllers report all the sites. Prometheus guidance is to avoid high-cardinality metrics.
Reading the code, the intention of this metric is to alert humans that a certificate renewal is necessary, so duplicating the same information from every controller is unnecessary.
I see various related work ongoing in issues and PRs, so I posted this to see what you think.
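As a rough sketch of why the duplication is unnecessary for alerting, the Python below collapses the per-controller reports to a single expiry per site. The data, the tuple layout, and the helper name `earliest_expiry_per_host` are assumptions made for illustration; this is not how the controller or Prometheus implements anything.

```python
def earliest_expiry_per_host(samples):
    """Collapse duplicate reports to one expiry timestamp per host.

    `samples` is an iterable of (host, controller, expire_time_seconds) tuples.
    The minimum is kept so that an alert would act on the earliest reported expiry.
    """
    earliest = {}
    for host, _controller, expires in samples:
        if host not in earliest or expires < earliest[host]:
            earliest[host] = expires
    return earliest


# Hypothetical data: 500 sites, each reported identically by 40 controllers.
samples = [
    (f"site-{s}.example.com", f"controller-{c}", 1_700_000_000 + s)
    for s in range(500)
    for c in range(40)
]

deduplicated = earliest_expiry_per_host(samples)
print(len(samples))       # 20000 reported series
print(len(deduplicated))  # 500 distinct certificates
```

A human only needs the 500 values to know which certificates to renew; the other 19,500 series add storage and query cost without adding information.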