Don't fail writes due to full WAL disk #3136

owen-d · 2021-01-07T15:17:23Z

No description provided.

codecov-io · 2021-01-07T15:30:02Z

Codecov Report

Merging #3136 (c1cb2e7) into master (fcabfec) will increase coverage by 0.02%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##           master    #3136      +/-   ##
==========================================
+ Coverage   63.05%   63.08%   +0.02%     
==========================================
  Files         188      188              
  Lines       16210    16239      +29     
==========================================
+ Hits        10221    10244      +23     
- Misses       5049     5060      +11     
+ Partials      940      935       -5

| Impacted Files | Coverage Δ |
|---|---|
| pkg/ingester/instance.go | `61.59% <92.00%> (+3.16%)` ⬆️ |
| pkg/ingester/ingester.go | `48.61% <100.00%> (+0.47%)` ⬆️ |
| pkg/ingester/metrics.go | `100.00% <100.00%> (ø)` |
| pkg/promtail/positions/positions.go | `46.80% <0.00%> (-11.71%)` ⬇️ |
| pkg/promtail/targets/file/filetarget.go | `66.43% <0.00%> (+2.09%)` ⬆️ |
| pkg/querier/queryrange/limits.go | `95.83% <0.00%> (+4.16%)` ⬆️ |

slim-bean · 2021-01-07T15:55:48Z

docs/sources/operations/storage/wal.md

+
+1) No space left on disk
+
+In the event the underlying WAL disk is full, Loki will not fail incoming writes, but neither will it log them to the WAL. In this case, the persistence guarantees across process restarts will not hold.


Interesting thought, feel free to tell me this is too much scope.

If we know writing to the WAL is failing, can we force a flush on shutdown?

I think this is a great idea. Otherwise, we'd end up having to remove an ingester from traffic and wait for chunk_idle to elapse before shutting it down.
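To make the flush-on-shutdown idea concrete, here is a minimal sketch, not the code from this PR. The `ingester` type, its fields, and the `noteWALDiskFull` and `shutdown` methods are all hypothetical stand-ins: a failed WAL write sets a flag, and shutdown flushes in-memory data only when that flag is set.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ingester is a hypothetical, stripped-down stand-in for illustration only.
// It is not Loki's ingester type.
type ingester struct {
	// flushOnShutdown is set once a WAL write has failed because the disk is full.
	flushOnShutdown atomic.Bool
	inMemory        []string // chunks that exist only in memory
	flushed         []string // chunks written to long-term storage
}

// noteWALDiskFull records that at least one write skipped the WAL.
func (i *ingester) noteWALDiskFull() {
	i.flushOnShutdown.Store(true)
}

// shutdown flushes in-memory chunks only if the WAL can no longer be
// trusted to replay them after a restart.
func (i *ingester) shutdown() {
	if !i.flushOnShutdown.Load() {
		return // the WAL holds everything; replay on restart is enough
	}
	i.flushed = append(i.flushed, i.inMemory...)
	i.inMemory = nil
}

func main() {
	ing := &ingester{inMemory: []string{"chunk-a", "chunk-b"}}
	ing.noteWALDiskFull()
	ing.shutdown()
	fmt.Println(len(ing.flushed), len(ing.inMemory))
}
```

With a flag like this, an operator would not need to drain the ingester and wait for idle chunks to flush before stopping it.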

slim-bean · 2021-01-07T16:06:56Z

pkg/ingester/instance.go

@@ -161,8 +163,13 @@ func (i *instance) Push(ctx context.Context, req *logproto.PushRequest) error {

 	if !record.IsEmpty() {
 		if err := i.wal.Log(record); err != nil {
-			return err
+			if e, ok := err.(*os.PathError); ok && e.Err == syscall.ENOSPC {
+				i.metrics.walDiskFullFailures.Inc()
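The hunk above matches the error with a type assertion on `*os.PathError`. As a sketch of an alternative, and assuming the WAL error may arrive wrapped, `errors.Is` walks the wrap chain and finds `syscall.ENOSPC` either way. The helper names `isDiskFull`, `tolerateDiskFull`, and the `sample...` constructors are hypothetical, and the path in the sample error is made up.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isDiskFull reports whether err, or any error it wraps, is ENOSPC.
func isDiskFull(err error) bool {
	return errors.Is(err, syscall.ENOSPC)
}

// tolerateDiskFull swallows disk-full errors and counts them, and returns
// every other error unchanged. The counter stands in for a metric.
func tolerateDiskFull(err error, diskFullFailures *int) error {
	if err == nil {
		return nil
	}
	if isDiskFull(err) {
		*diskFullFailures++
		return nil
	}
	return err
}

// sampleDiskFullErr builds the kind of error a failed file write returns.
func sampleDiskFullErr() error {
	return &os.PathError{Op: "write", Path: "wal/segment", Err: syscall.ENOSPC}
}

// sampleWrappedErr wraps the disk-full error one level deeper.
func sampleWrappedErr() error {
	return fmt.Errorf("log record: %w", sampleDiskFullErr())
}

// sampleOtherErr is an unrelated failure that must still be returned.
func sampleOtherErr() error {
	return errors.New("corrupt record")
}

func main() {
	var failures int
	fmt.Println(tolerateDiskFull(sampleDiskFullErr(), &failures))
	fmt.Println(tolerateDiskFull(sampleWrappedErr(), &failures))
	fmt.Println(tolerateDiskFull(sampleOtherErr(), &failures))
	fmt.Println("disk-full failures:", failures)
}
```

The type assertion in the diff only matches when the `*os.PathError` is the outermost error, so the two checks differ only if some layer wraps the error before it reaches `Push`.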


It would be nice if we could log that this was happening, but we don't want to spam the logs.

Thoughts on using a boolean kept in the instance to log something one time, and then perhaps clear the bool and log that the error is cleared if writes start succeeding again?
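One way to sketch the suggestion above: keep a boolean that tracks the current state and log only on transitions. The `walHealth` type and its `observe` method are hypothetical names, and this is not necessarily how the PR ended up implementing it.

```go
package main

import (
	"fmt"
	"sync"
)

// walHealth is a hypothetical helper that logs only when the WAL moves
// between the healthy and disk-full states.
type walHealth struct {
	mu       sync.Mutex
	diskFull bool
	logf     func(msg string) // injected so callers choose the logger
}

// observe is called after every WAL write attempt; full is true when the
// attempt failed because the disk had no space left.
func (h *walHealth) observe(full bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if full == h.diskFull {
		return // no state change, so stay quiet
	}
	h.diskFull = full
	if full {
		h.logf("WAL disk full: writes are accepted but not logged to the WAL")
	} else {
		h.logf("WAL writes are succeeding again")
	}
}

func main() {
	h := &walHealth{logf: func(msg string) { fmt.Println(msg) }}
	for _, full := range []bool{true, true, true, false, false, true} {
		h.observe(full)
	}
}
```

The mutex is there because pushes can arrive concurrently; the metric from the diff would still count every failure, while the log line appears once per state change.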

slim-bean

LGTM!

dont fail writes on full wal disk

4ad6a57

pull-request-size bot added the size/M label Jan 7, 2021

owen-d requested review from slim-bean and cyriltovena January 7, 2021 15:17

slim-bean reviewed Jan 7, 2021

wal full failure will cause flush on shutdown

b4fbae6

pull-request-size bot added size/L and removed size/M labels Jan 7, 2021

logs the first full WAL failure

c1cb2e7

slim-bean approved these changes Jan 7, 2021

owen-d merged commit ed649ee into grafana:master Jan 7, 2021

dannykopping mentioned this pull request Mar 30, 2022

Allow to limit the "in-memory" chunks for Ingester #5721 (Open)
