-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruler: per-tenant WAL #4344
Ruler: per-tenant WAL #4344
Conversation
e161d53
to
104b810
Compare
@@ -154,7 +173,6 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) { | |||
|
|||
f.IntVar(&l.RulerMaxRulesPerRuleGroup, "ruler.max-rules-per-rule-group", 0, "Maximum number of rules per rule group per-tenant. 0 to disable.") | |||
f.IntVar(&l.RulerMaxRuleGroupsPerTenant, "ruler.max-rule-groups-per-tenant", 0, "Maximum number of rule groups per-tenant. 0 to disable.") | |||
f.IntVar(&l.RulerRemoteWriteQueueCapacity, "ruler.remote-write.queue-capacity", 10000, "Capacity of remote-write queues; if a queue exceeds its capacity it will evict oldest samples.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: add notice in changelog about this flag being removed
6652e52
to
8eae867
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
8eae867
to
f5d17b1
Compare
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Setting WAL dir name as tenant ID Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Adding appender_ready metric Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…rule evaluation until WAL is ready Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…p was removed in Prometheus Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
No reason to keep a per-tenant WAL without remote-write Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
We need to prevent a package import cycle Also add Clone functions for deep copy Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Clean up WAL dir after test Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…t be manipulated Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
f5d17b1
to
363fd7f
Compare
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
8dadf6c
to
f48dd3b
Compare
It will only clean abandoned WALs on startup, for now Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but LGTM
|
||
return ruler.ManagerFactory(func( | ||
var registry storageRegistry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love having this globally defined, perhaps we can alter it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree, but I don't see an easy way to do so.
I'll try address this in a follow-up PR.
|
||
mgr := rules.NewManager(&rules.ManagerOptions{ | ||
Appendable: newAppendable(cfg, overrides, logger, userID, rwMetrics), | ||
Appendable: registry, | ||
Queryable: memStore, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still using the memStore
here as queryable, which is fine, but we are now able to read from Prometheus to determine this with less hacks and remove the memstore implementation altogether. This can be in a later PR though. I opened #4362
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
What this PR does / why we need it:
Adds durability and isolation to the samples generated by Loki recording rules.
Each tenant will have their own Write-Ahead Log (WAL). Samples appended to the WAL will be sent to the configured remote-write endpoint separately for each tenant to prevent noisy neighbours.
Almost all remote-write configurations are configurable on a per-tenant basis, with each tenant defaulting to the global remote-write config. Remote-write can be reconfigured at runtime, too.
Which issue(s) this PR fixes:
Fixes #4241
Special notes for your reviewer:
We reuse a lot of Prometheus' WAL and remote-write functionality, as well as some code from the Grafana Agent. The Agent's WAL handling does away with the TSDB head appender; by skipping the head appender, we only need to keep a WAL per tenant, and not an additional tsdb.
The code from the Agent is heavily modified for our purposes.
Most of the changes I made in
pkg/ruler/storage
(copied from grafana/agent in #4354) was around making metrics injectable, removing features we didn't need and minor refactoring.Checklist