Don't fail writes due to full WAL disk #3136
Codecov Report
@@             Coverage Diff              @@
##           master    #3136      +/-   ##
==========================================
+ Coverage   63.05%   63.08%   +0.02%
==========================================
  Files         188      188
  Lines       16210    16239      +29
==========================================
+ Hits        10221    10244      +23
- Misses       5049     5060      +11
+ Partials      940      935       -5
1) No space left on disk
If the underlying WAL disk is full, Loki will not fail incoming writes, but neither will it log them to the WAL. In this case, the persistence guarantees across process restarts will not hold: entries accepted while the disk was full cannot be recovered by replaying the WAL.
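The documented behavior can be sketched as follows. This is a minimal, hypothetical model, not Loki's code: `walWriter`, `fakeWAL`, `instance`, `memory`, `diskFullFailures`, `errCorrupt` and `newDiskFullError` are illustrative names, and the counter only plays the role of the PR's `walDiskFullFailures` metric. It detects ENOSPC with `errors.Is`, whereas the PR's diff uses a type assertion on `*os.PathError`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// walWriter is a hypothetical stand-in for the ingester's WAL.
type walWriter interface {
	Log(record string) error
}

// fakeWAL returns the same error (possibly nil) from every Log call.
type fakeWAL struct{ err error }

func (f fakeWAL) Log(string) error { return f.err }

// errCorrupt is a hypothetical WAL failure that is not "disk full".
var errCorrupt = errors.New("corrupt segment")

// newDiskFullError builds the kind of error os.File.Write returns on a full disk.
func newDiskFullError() error {
	return &os.PathError{Op: "write", Path: "wal/00000001", Err: syscall.ENOSPC}
}

// instance is a hypothetical, heavily simplified ingester instance.
type instance struct {
	wal              walWriter
	memory           []string // entries accepted into in-memory chunks
	diskFullFailures int      // plays the role of walDiskFullFailures
}

// Push keeps the entry in memory and then tries to log it to the WAL.
// A full disk is counted and tolerated, so the write succeeds without being
// durable; any other WAL error still fails the write.
func (i *instance) Push(record string) error {
	i.memory = append(i.memory, record)
	if err := i.wal.Log(record); err != nil {
		if errors.Is(err, syscall.ENOSPC) {
			i.diskFullFailures++
			return nil
		}
		return err
	}
	return nil
}

func main() {
	full := &instance{wal: fakeWAL{err: newDiskFullError()}}
	fmt.Println(full.Push("line 1"), full.diskFullFailures) // <nil> 1

	broken := &instance{wal: fakeWAL{err: errCorrupt}}
	fmt.Println(broken.Push("line 1")) // corrupt segment
}
```

The entry stays in memory in both cases; the only difference is whether the caller sees an error, which is exactly why a full disk silently weakens the restart guarantee.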
Interesting thought; feel free to tell me this is too much scope.
If we know writing to the WAL is failing, can we force a flush on shutdown?
I think this is a great idea. Otherwise, we'd end up having to remove an ingester from traffic and wait for chunk_idle to elapse before shutting it down.
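The flush-on-shutdown idea could look roughly like this. It is a sketch under assumptions: `shutdownPolicy`, `walWriteSkipped` and `onShutdown` are hypothetical names, and the strings merely stand in for the two shutdown paths. It uses `atomic.Bool` (Go 1.19+) because pushes run concurrently, and it never clears the flag, on the reasoning that an entry missing from the WAL stays missing until its chunk is flushed, even if later WAL writes succeed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// shutdownPolicy is a hypothetical holder for the "flush on shutdown" decision.
type shutdownPolicy struct {
	// flushOnShutdown becomes true once any entry was not written to the WAL.
	flushOnShutdown atomic.Bool
}

// walWriteSkipped records that at least one entry is absent from the WAL.
// It is safe to call from concurrent pushes.
func (p *shutdownPolicy) walWriteSkipped() { p.flushOnShutdown.Store(true) }

// onShutdown reports which (hypothetical) shutdown path would be taken.
func (p *shutdownPolicy) onShutdown() string {
	if p.flushOnShutdown.Load() {
		// WAL replay would lose the skipped entries, so flush chunks instead.
		return "flush all chunks to storage"
	}
	return "rely on WAL replay after restart"
}

func main() {
	var p shutdownPolicy
	fmt.Println(p.onShutdown()) // rely on WAL replay after restart
	p.walWriteSkipped()
	fmt.Println(p.onShutdown()) // flush all chunks to storage
}
```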
@@ -161,8 +163,13 @@ func (i *instance) Push(ctx context.Context, req *logproto.PushRequest) error {
     if !record.IsEmpty() {
         if err := i.wal.Log(record); err != nil {
-            return err
+            if e, ok := err.(*os.PathError); ok && e.Err == syscall.ENOSPC {
+                i.metrics.walDiskFullFailures.Inc()
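The type assertion in this hunk matches only when the WAL returns a bare `*os.PathError`. If the error were ever wrapped on its way up (an assumption about possible future code, not something this PR shows), the check would miss it, while `errors.Is` walks the `Unwrap` chain. The sketch below compares the two; `isDiskFullAssert`, `isDiskFull` and `sampleErrors` are hypothetical helpers, and the WAL segment path is made up.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isDiskFullAssert mirrors the check in the diff: it only matches when err is
// itself an *os.PathError whose underlying errno is ENOSPC.
func isDiskFullAssert(err error) bool {
	e, ok := err.(*os.PathError)
	return ok && e.Err == syscall.ENOSPC
}

// isDiskFull follows the Unwrap chain, so it also matches ENOSPC inside an
// *os.PathError that was itself wrapped with fmt.Errorf("...: %w", err).
func isDiskFull(err error) bool {
	return errors.Is(err, syscall.ENOSPC)
}

// sampleErrors builds a bare and a wrapped disk-full error for comparison.
func sampleErrors() (direct, wrapped error) {
	direct = &os.PathError{Op: "write", Path: "wal/00000003", Err: syscall.ENOSPC}
	wrapped = fmt.Errorf("log record: %w", direct)
	return direct, wrapped
}

func main() {
	direct, wrapped := sampleErrors()
	fmt.Println(isDiskFullAssert(direct), isDiskFull(direct))   // true true
	fmt.Println(isDiskFullAssert(wrapped), isDiskFull(wrapped)) // false true
}
```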
It would be nice if we could log that this was happening, but we don't want to spam the logs.
Thoughts on using a boolean kept in the instance to log something one time, and then perhaps clear the bool and log that the error is cleared if writes start succeeding again?
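One way to implement the log-once suggestion is a state flag that logs only on transitions: once when WAL writes start failing and once when they recover. This is a hypothetical sketch, not code from the PR: `walErrorLogger`, `observe`, `logf` and `errDiskFull` are illustrative names. It uses `atomic.Bool.CompareAndSwap` (Go 1.19+) rather than a plain `bool`, because concurrent pushes updating a plain field would be a data race and could log the same transition more than once.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errDiskFull is a hypothetical stand-in for a WAL write failure.
var errDiskFull = errors.New("no space left on device")

// walErrorLogger logs once when WAL writes start failing and once when
// they start succeeding again, instead of logging on every push.
type walErrorLogger struct {
	failing atomic.Bool
	logf    func(format string, args ...any)
}

// observe is called after every WAL write with that write's result.
func (l *walErrorLogger) observe(err error) {
	if err != nil {
		// Only the caller that flips false -> true logs the failure.
		if l.failing.CompareAndSwap(false, true) {
			l.logf("WAL writes failing, entries are not being persisted: %v", err)
		}
		return
	}
	// Only the caller that flips true -> false logs the recovery.
	if l.failing.CompareAndSwap(true, false) {
		l.logf("WAL writes succeeding again")
	}
}

func main() {
	var lines []string
	l := &walErrorLogger{logf: func(format string, args ...any) {
		lines = append(lines, fmt.Sprintf(format, args...))
	}}
	for _, err := range []error{nil, errDiskFull, errDiskFull, errDiskFull, nil, nil} {
		l.observe(err)
	}
	// Prints the failure line once, then the recovery line once.
	for _, line := range lines {
		fmt.Println(line)
	}
}
```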
LGTM!
No description provided.