Receive: Allow remote write request limits to be defined per file and tenant #5565
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
xref: #5527 (comment)
@yeya24 based on the conversation we had at #5527 (comment): I tried adding CLI args to configure the default limits in this PR while also keeping the file, and my opinion on this has changed for the following reasons:
I don't think adding extra CLI args is worth the price in extra complexity given the features and simplicity that we get from
The next sections explain what each configuration value means.

```yaml mdox-exec="cat pkg/receive/testdata/limits_config/good_limits.yaml"
write:
```
Idea from Community Hours: generalize on any label - not only scope to tenant labels.
This allows us to keep tenancy topic separate.
Perhaps this can help with other use cases 🤔
To make it clear for potential reviewers: from what I understood in the community hours, this PR can be reviewed and merged as is. The feature is documented as experimental, so we can change the configuration file and behavior as we see fit, especially after the planned discussion regarding tenancy in Thanos.
I found out during verification tests that the default values for limits were not being exported in Receive's metrics. Will fix this. edit: Fixed!
Would changing the limits require receivers to be redeployed?
@fpetkovski yes. We might want to improve this in a follow-up PR with some mechanism to reload this configuration, like the config reload endpoint other components have. During the reload the limits would probably be briefly disabled to avoid synchronization issues under high load. WDYT?
Yeah, I think config reloading will be more than a nice-to-have for this feature. Large receiver deployments can take hours to roll out, so we should try to speed up configuration changes if we can.
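As a rough illustration of the reload idea discussed above, here is a minimal sketch of one possible approach: keeping the active limits behind an atomic pointer so they can be swapped without being disabled. This is not what the PR implements; `requestLimits`, `reloadableLimits`, and the "zero or less means unlimited" rule are all hypothetical names and assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// requestLimits is a hypothetical, simplified stand-in for the limits
// that Receive would load from its limits configuration file.
type requestLimits struct {
	SizeBytesLimit int64
	SeriesLimit    int64
	SamplesLimit   int64
}

// reloadableLimits keeps the active limits behind an atomic pointer so
// that concurrent readers never observe a half-updated configuration.
type reloadableLimits struct {
	current atomic.Pointer[requestLimits]
}

// newReloadableLimits stores the initial limits.
func newReloadableLimits(initial requestLimits) *reloadableLimits {
	r := &reloadableLimits{}
	r.current.Store(&initial)
	return r
}

// Reload swaps in a new set of limits in a single atomic step.
func (r *reloadableLimits) Reload(next requestLimits) {
	r.current.Store(&next)
}

// AllowSeries reports whether a request carrying the given number of
// series fits within the active limit. Assumption for this sketch:
// a limit of zero or less means "unlimited".
func (r *reloadableLimits) AllowSeries(series int64) bool {
	limit := r.current.Load().SeriesLimit
	return limit <= 0 || series <= limit
}

func main() {
	limits := newReloadableLimits(requestLimits{SeriesLimit: 100})
	fmt.Println(limits.AllowSeries(150)) // false: 150 exceeds the limit of 100

	// Simulate a configuration reload with a higher limit.
	limits.Reload(requestLimits{SeriesLimit: 200})
	fmt.Println(limits.AllowSeries(150)) // true: 150 fits within 200
}
```

With this shape, in-flight requests keep using whichever snapshot they loaded, so a reload would not need to pause limiting. Whether that trade-off suits Receive is a design question for the follow-up PR.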
Thanks for awesome work! 💪🏻
LGTM minus some comments!
@@ -194,6 +194,18 @@ func runReceive(
		return errors.Wrap(err, "parse relabel configuration")
	}

	var limitsConfig *receive.RootLimitsConfig
	if conf.limitsConfig != nil {
		limitsContentYaml, err := conf.limitsConfig.Content()
As @fpetkovski mentioned, having a /-/reload endpoint, like the one in Thanos Ruler, to reload this configuration would be amazing, since adding tenant limits now means redeploying.
Yup, leaving this for another PR to avoid this one getting too big.
pkg/receive/limiter.go
Outdated
type limiter struct {
	requestLimiter requestLimiter
	writeGate      gate.Gate
	// activeSeriesLimiter *activeSeriesLimiter `yaml:"active_series"`
Do we need these commented struct fields?
I can add an active series limit to this config in a follow-up PR, maybe a TODO would be better here? 🙂
Fixed in e5c171d. 👍
)

// RootLimitsConfig is the root configuration for limits.
type RootLimitsConfig struct {
Why not just LimitsConfig? 🙂
Too many things end up being called by almost the same name and it gets confusing over time. This is already a result of many iterations on the naming. 😅
// Copyright (c) The Thanos Authors.
// Licensed under the Apache License 2.0.

package receive
Is there a particular reason for having separate limiter and request_limiter files?
Limiter is meant to hold a bunch of different "limiters" (i.e. limits per request, the concurrency gate for writing data, the active series limiter, and possibly a rate limiter in the future), parse the limits configuration, and initialize every limiter. This avoids clutter around the code that creates and sets up the handler, which also helps with reusability (the don't-repeat-yourself principle) and encapsulation.
I also have an eye on the future, where we would be able to partially reuse a bunch of the code in these files in the Query component. It already has a similar gate and I plan to add similar per-request limits to it.
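The separation described above can be pictured with a small sketch: one `limiter` value that owns a per-request limiter and a write gate, so the handler only talks to a single object. Everything here is hypothetical and simplified for illustration. In particular, `writeGate` is a non-blocking stand-in and not the `gate.Gate` type from the diff, and `maxSeries` and `handleWrite` are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// requestLimiter is a hypothetical interface for per-request checks.
type requestLimiter interface {
	AllowSeries(tenant string, series int64) bool
}

// maxSeries is a trivial requestLimiter applying one limit to all tenants.
type maxSeries int64

func (m maxSeries) AllowSeries(_ string, series int64) bool {
	return series <= int64(m)
}

// writeGate is a hypothetical, non-blocking concurrency gate backed by a
// buffered channel. Its capacity is the maximum number of concurrent writes.
type writeGate chan struct{}

// Start takes a slot in the gate or fails immediately when it is full.
func (g writeGate) Start() error {
	select {
	case g <- struct{}{}:
		return nil
	default:
		return errors.New("too many concurrent writes")
	}
}

// Done releases a slot previously taken with Start.
func (g writeGate) Done() { <-g }

// limiter groups the individual limiters so the handler needs only one
// dependency. More limiters could be added as extra fields later.
type limiter struct {
	requests requestLimiter
	gate     writeGate
}

// handleWrite applies the gate first, then the per-request limit.
func (l *limiter) handleWrite(tenant string, series int64) error {
	if err := l.gate.Start(); err != nil {
		return err
	}
	defer l.gate.Done()

	if !l.requests.AllowSeries(tenant, series) {
		return fmt.Errorf("tenant %q exceeded the series limit", tenant)
	}
	return nil
}

func main() {
	l := &limiter{requests: maxSeries(10), gate: make(writeGate, 1)}
	fmt.Println(l.handleWrite("tenant-a", 5))  // within the limit
	fmt.Println(l.handleWrite("tenant-a", 50)) // rejected by the request limiter
}
```

Because both limiters sit behind small interfaces or types, the same grouping could in principle be reused by another component, which is the reuse the comment above has in mind.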
This is clean, however, it still is tenant aware, even though it could be just "per label". I assume the conclusion is to use per tenant limits at the end?
LGTM otherwise, great job! (:
I can refactor this to be purely label-aware, which would take a few more days (or weeks) to get ready, tested, and documented, with the feature not landing in the meantime. Or we can land the feature as experimental (very likely to change) and then work on extending and changing it to be just label-aware afterwards. I thought this was our conclusion from the community meeting. Wasn't it, @bwplotka?
I can even fill the code, example configuration file, and logs with warnings that the configuration structure will soon change to work on a per-label basis.
A point for the discussion: being straightforward that this feature supports tenancy is a plus for people operating Thanos clusters. Changing it to be label-aware is friendly to maintainers/contributors (it keeps tenancy logic away and "hidden") but not to users (tenancy logic is hidden even though Receive is tenant-aware).
There might be other points to consider before committing to a purely label-aware solution, too:
So I prefer not to rush directly into it.
Maybe this can be merged now (once conflicts are resolved) while the discussion is still in progress, as this feature is experimental and would likely go through some iterations to get prod-ready? 🙂
Not much progress on the long-term goals specific to tenancy, so merging, since those are hidden flags anyway.
Thanks!
… tenant (thanos-io#5565)
* Allow per-tenant limits to be configured via file
* Refactor Receive's limiting logic
* Fix some methods that were in plural
* Improve metric description
* Add a TODO for later
* Do some cleanup after moving limits to config file
* Isolate rest of limiting logic from the handler
* Small refactor to the request limiter
* Rename MergeWith -> OverlayWith
* Update changelog
* Update documentation
* Add missing copyright notice to few files
* Fix test after change in config file tenants
* Retrigger CI because of bundled-Cortex failing test
* Expose default limits as metrics
* Retrigger CI
* Replace comment with a TODOs
* Fix changelog after bad merge
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Prakul Jain <prakul.jain@udaan.com>
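The commit list mentions an `OverlayWith` method and limits that have defaults plus per-tenant values. A minimal sketch of how such an overlay could work is shown here, assuming that unset per-tenant values fall back to the defaults. The struct, its field names, and the `ptr` helper are hypothetical and only illustrate the idea; the actual types in the PR may differ.

```go
package main

import "fmt"

// limits is a hypothetical set of per-request limits. A nil field means
// "not configured", which lets an overlay tell "unset" apart from zero.
type limits struct {
	SizeBytesLimit *int64
	SeriesLimit    *int64
	SamplesLimit   *int64
}

// OverlayWith returns a copy of l where every field that is set in
// other replaces the corresponding field of l.
func (l limits) OverlayWith(other limits) limits {
	out := l
	if other.SizeBytesLimit != nil {
		out.SizeBytesLimit = other.SizeBytesLimit
	}
	if other.SeriesLimit != nil {
		out.SeriesLimit = other.SeriesLimit
	}
	if other.SamplesLimit != nil {
		out.SamplesLimit = other.SamplesLimit
	}
	return out
}

// ptr is a small helper to take the address of a literal value.
func ptr(v int64) *int64 { return &v }

func main() {
	defaults := limits{SizeBytesLimit: ptr(1024), SeriesLimit: ptr(1000)}
	tenant := limits{SeriesLimit: ptr(50)} // overrides only the series limit

	effective := defaults.OverlayWith(tenant)
	fmt.Println(*effective.SizeBytesLimit, *effective.SeriesLimit, effective.SamplesLimit == nil)
	// prints: 1024 50 true
}
```

Using pointers for the fields is one way to keep "tenant did not set this" distinguishable from an explicit zero, at the cost of nil checks wherever the values are read.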
Changes
This is part of the broader work outlined by #5404.
receive.limits-config-file or pass the file content inline to receive.limits-config.
Follow ups
Verification