Smooth out spikes in rate of chunk flush ops #3191
LGTM, this effectively evens out the pressure on the DB.
a78b2f2 to dd1f3d3
I think this interacts badly with flush on shutdown or via /flush handler. In these cases we want to flush as fast as possible. Perhaps the call to (Alternative – setting infinite rate for sweep with immediate flag wouldn't work, as rate would be recomputed on next call to |
As I first wrote it (wait before dequeuing), there were some surprising behaviours.
This cures the above scenario. I initially found it odd to think we would pull a series from the queue then wait with it, but I don't think this does any harm - we always re-examine the series before sending to the DB, so won't double-flush it.
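The ordering described here (dequeue first, then wait, then re-examine the series) can be sketched roughly as follows. This is only an illustration, not the code from this PR: `flushOp`, `processQueue` and the `needsFlush` callback are hypothetical names, and a plain `time.Sleep` stands in for a real rate limiter's wait. It also assumes that immediate flushes (shutdown or the /flush handler) skip the wait entirely.

```go
package main

import (
	"fmt"
	"time"
)

// flushOp is a hypothetical queue entry; immediate marks ops that come
// from shutdown or the /flush handler and should not be rate-limited.
type flushOp struct {
	series    string
	immediate bool
}

// processQueue takes each op off the queue first, then waits (unless the
// op is immediate), and only then re-checks whether the series still
// needs flushing. Because of the re-check, holding an op while waiting
// cannot cause a double flush.
func processQueue(ops []flushOp, interval time.Duration, needsFlush func(string) bool) []string {
	var flushed []string
	for _, op := range ops {
		if !op.immediate {
			time.Sleep(interval) // stand-in for a rate limiter's wait
		}
		if needsFlush(op.series) { // re-examine after waiting
			flushed = append(flushed, op.series)
		}
	}
	return flushed
}

func main() {
	ops := []flushOp{{"a", false}, {"b", true}, {"c", false}}
	// Pretend series "c" was already flushed by someone else while we waited.
	stillDirty := func(s string) bool { return s != "c" }
	fmt.Println(processQueue(ops, time.Millisecond, stillDirty))
}
```

Because the immediate flag travels with the dequeued op, the decision to skip the wait is made per op rather than by changing a shared rate that might be recomputed later.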
Excellent idea.
LGTM. Left some non-blocking comments.
pkg/ingester/ingester_test.go (outdated)
@@ -424,7 +424,7 @@ func TestIngesterSpreadFlush(t *testing.T) {
 	_, _ = pushTestSamples(t, ing, 4, 1, int(cfg.MaxChunkAge.Seconds()-1)*1000)

 	// wait beyond flush time so first set of samples should be sent to store
-	time.Sleep(cfg.FlushCheckPeriod * 2)
+	time.Sleep(cfg.FlushCheckPeriod * 4)
I don't quite understand why this was needed.
Because CI was failing; not all chunks had been flushed by the time it checked. I could understand that this PR makes the flush take longer, though the queue should clear within one flush period. On my laptop the failure was intermittent; after several hours of probing I found that Go timers just aren't that reliable - I would see the 20ms timer fire much later, sometimes by 100ms (and if that happens the test still fails after this change). Perhaps related to golang/go#38860 though I can't see what would make any goroutines CPU-bound at the point we do this Sleep().
I could take that change back out and see how we get on.
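For anyone wanting to reproduce the timer observation, a minimal sketch along these lines measures how late a 20ms sleep returns. The helper name `overshoot` is made up for this example, and the numbers it prints will vary by machine and load, so no expected output is given.

```go
package main

import (
	"fmt"
	"time"
)

// overshoot sleeps for d and returns how much longer than d the sleep
// actually took. time.Sleep only guarantees "at least d", so the result
// is normally non-negative and can be large on a busy machine.
func overshoot(d time.Duration) time.Duration {
	start := time.Now()
	time.Sleep(d)
	return time.Since(start) - d
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Printf("20ms sleep overshot by %v\n", overshoot(20*time.Millisecond))
	}
}
```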
> I could take that change back out and see how we get on.
No need on my account. Thanks for the explanation.
Ingester chunk flushes run periodically, by default every minute. Add a rate-limiter so we spread out calls to the DB across the period, avoiding spikes of intense activity which can slow down other operations such as incoming Push() calls. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This means we can check if it's an immediate flush, and also has better behaviour when transitioning from a fast to a slow rate, or vice-versa. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
So when you have a short flush period it will go faster. Also initialize the rate-limit to "Inf" or no limit, so we start out fast and slow down once we know what the queue is like. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
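As a rough illustration of the arithmetic behind these commits, the sketch below derives a flush rate from the queue length and the flush period, starting from an unlimited rate. The function `flushRate` is hypothetical and not taken from the PR; treating an empty queue as "no limit" is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// flushRate returns the ops-per-second needed to drain queueLen ops
// evenly across one flush period. Assumption for this sketch: with
// nothing queued (or a non-positive period) there is no limit, which is
// represented as +Inf.
func flushRate(queueLen int, period time.Duration) float64 {
	if queueLen <= 0 || period <= 0 {
		return math.Inf(1)
	}
	return float64(queueLen) / period.Seconds()
}

func main() {
	rate := math.Inf(1) // start out unlimited, before the queue size is known
	fmt.Println("initial rate:", rate)

	// 600 queued ops over a one-minute period works out to 10 ops/second.
	rate = flushRate(600, time.Minute)
	fmt.Println("rate for 600 ops per minute:", rate)

	// The same queue with a 10-second period needs a higher rate.
	fmt.Println("rate for 600 ops per 10s:", flushRate(600, 10*time.Second))
}
```

A shorter flush period with the same queue length yields a proportionally higher rate, which matches the "short flush period goes faster" behaviour described in the commit message.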
dcfb336 to 2f1812f
I have rebased to get rid of the merge conflict on CHANGELOG, fixed the 'DB' comment, and dropped the change to wait longer in the test. Let's see if it passes...
Ingester chunk flushes run periodically, by default every minute.
Add a rate-limiter so we spread out calls to the DB across the period, avoiding spikes of intense activity which can slow down other operations such as incoming Push() calls.

Fixes #3171
Checklist: CHANGELOG.md updated