promtail: fix high CPU usage on large kubernetes clusters. #1118
Looks very good! I don't think we can accept this until the upstream change is merged, as we don't want to fork Prometheus, so let's wait for that. In the meantime, we do have a vendor folder in Loki; could you update it along with the go.mod changes you've made? That helps us see the blast radius of the update. Again, thank you for this!
@cyriltovena thanks for the feedback, I've updated the PR according to your comments. The upstream Prometheus PR is still being reviewed, but I guess (and hope) that it will be merged soon.
I wish we could have this move faster.
This change looks really promising, so I just want to do a poke here and tell you that the upstream PR is merged now 😄
How's it going?
This is up for grabs. I think the original author is busy. Let me know if you need help.
Not clear why drone-ci complains about code that is not gofmt-ed. cc @cyriltovena
Looking.
Alright, just merge master in and you should be good.
Seems to be in sync with master, and no complaints from the linter.
Codecov Report
@@            Coverage Diff             @@
##           master    #1118      +/-   ##
==========================================
- Coverage   62.45%   62.41%   -0.05%
==========================================
  Files         158      158
  Lines       12761    12766       +5
==========================================
- Hits         7970     7968       -2
- Misses       4177     4183       +6
- Partials      614      615       +1
Really happy about this. Thank you so much for all the effort!
LGTM
What this PR does / why we need it:
This PR is a set of optimizations that make promtail usable on large Kubernetes clusters. The original problem is discussed here: #1102
The fix does three things:
A spec.hostName=$HOSTNAME field selector is passed to the Kubernetes API, so that in a large Kubernetes cluster there is no need to pull thousands of pod specs that do not belong to the current node.
After this fix was applied, we observed a significant drop in promtail CPU usage, from 100% to 3%, without any spikes.
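For illustration, here is a minimal sketch of what a node-scoped pod list request looks like at the HTTP level. It is not the code from this PR: `podListPath` is a hypothetical helper, and the field name is left as a parameter. The pod field selector I know the Kubernetes API to support for this purpose is `spec.nodeName`, so the example uses that name as an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// podListPath is a hypothetical helper (not part of promtail). It builds the
// path and query of a pod list request that is restricted on the server side
// to pods whose field `field` equals `node`.
func podListPath(field, node string) string {
	q := url.Values{}
	// fieldSelector is the query parameter the Kubernetes API uses for
	// server-side filtering on resource fields.
	q.Set("fieldSelector", field+"="+node)
	return "/api/v1/pods?" + q.Encode()
}

func main() {
	// Assumption: HOSTNAME holds the name of the node promtail runs on.
	node := os.Getenv("HOSTNAME")
	fmt.Println(podListPath("spec.nodeName", node))
}
```

With the selector applied by the API server, each promtail instance only receives and processes pods scheduled on its own node instead of every pod in the cluster.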
Current issues:
__host__ label in relabel config? Because filtering happens before the processed labels are built, it is possible that we break target discovery.
Which issue(s) this PR fixes:
Fixes #1102
Checklist