Add rate limiting for scrape config updates #2189

Merged: 2 commits, Oct 18, 2023

@swiatekm swiatekm commented Oct 3, 2023

Limit the rate of updates to scrape configs. The idea and code are very similar to what Prometheus' Service Discovery manager does: https://github.com/prometheus/prometheus/blob/79be1b835789d7c3fde2a907003a8799c308733f/discovery/manager.go#L341. As a result, we emit events about changes to Prometheus CRs at most once every 5 seconds.

This should help with #1544

@swiatekm swiatekm force-pushed the feat/ta/update-rate-limit branch 4 times, most recently from 6a998ce to fce41e0 Compare October 7, 2023 13:40
@swiatekm swiatekm marked this pull request as ready for review October 7, 2023 14:25
@swiatekm swiatekm requested review from a team October 7, 2023 14:25
@jaronoff97

@swiatekm-sumo were you able to test this in a cluster? I'd love to see some metrics about how this change affected usage.

@swiatekm

> @swiatekm-sumo were you able to test this in a cluster? I'd love to see some metrics about how this change affected usage.

I did a basic smoke test, but I didn't try any synthetic stress test where I'd constantly update a bunch of ServiceMonitors and look at TA CPU usage. I can do that and post some numbers if you're interested.

@jaronoff97

yeah i'd appreciate that if you don't mind :)

@swiatekm swiatekm force-pushed the feat/ta/update-rate-limit branch from fce41e0 to 173ad39 Compare October 15, 2023 11:37
@swiatekm

@jaronoff97 I did a very simple benchmark where I updated the labels on a particular ServiceMonitor as fast as I could via kubectl. Before the change this used 400m (millicores) of CPU; after the change, less than 2m. See the attached Prometheus graph (Screenshot_20231015_133847).

@swiatekm swiatekm requested a review from frzifus October 17, 2023 10:35
swiatekm and others added 2 commits October 17, 2023 12:35
@swiatekm swiatekm force-pushed the feat/ta/update-rate-limit branch from 82dd785 to ee8b2cb Compare October 17, 2023 10:35
@jaronoff97 jaronoff97 merged commit 19f05f2 into open-telemetry:main Oct 18, 2023
24 checks passed
@swiatekm swiatekm deleted the feat/ta/update-rate-limit branch October 18, 2023 09:20
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
* Add rate limiting for scrape config updates

* Rename constant to lowercase

Co-authored-by: Ben B. <bongartz@klimlive.de>

---------

Co-authored-by: Ben B. <bongartz@klimlive.de>