Custom Retention #3642
Conversation
I have added some nits and suggestions. Overall the code looks good to me.
	c.sweeper.Stop()
	wg.Done()
}()
c.sweeper.Start()
I think we need some way to lock a table while the mark operation is running for it, to avoid running compaction on it at the same time and hence prevent deleted index entries from reappearing.
Currently I don't touch tables that are not compacted.
}

if len(objects) != 1 {
	// todo(1): in the future we would want to support more tables so that we can apply retention below 1d.
I think it would be hard to find a window in a large cluster where there is just one file. I think we should run the mark phase immediately after compaction, which also avoids conflicting updates between compaction and retention, since they run in independent goroutines. In our ops cluster we run compaction every 5 minutes, which increases the chance of conflicts.
This comment is just food for thought. But I thought that maybe in the future we would want to apply this where compaction is not happening.
I would still suggest changing this now, for a couple of reasons:
- One less way to modify the index.
- We compact a table only when it has 4 or more files, which means retention won't run if we stop getting writes while the table has more than 1 but fewer than 4 files.
- With a large cluster we run 30+ ingesters, and a file could get uploaded as soon as compaction ran, which means it would be hard to find a window where there is only one file in an active table.

We can just add a map of tableName -> mutex, and both the compaction and retention code would have to acquire the lock to avoid a race (see the sketch below). What do you think?
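For reference, a minimal sketch of that per-table lock map; the tableLocks type and the lockFor helper are hypothetical names for illustration, not code from the Loki codebase:

package compactor

import "sync"

// tableLocks serializes compaction and retention (mark) work per table.
// Hypothetical sketch only; names do not come from the PR.
type tableLocks struct {
	mtx   sync.Mutex
	locks map[string]*sync.Mutex
}

// lockFor returns the mutex for a table, creating it on first use.
func (t *tableLocks) lockFor(table string) *sync.Mutex {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	if t.locks == nil {
		t.locks = map[string]*sync.Mutex{}
	}
	if _, ok := t.locks[table]; !ok {
		t.locks[table] = &sync.Mutex{}
	}
	return t.locks[table]
}

Both the compaction loop and the retention marker would then call lockFor(table).Lock() (with a deferred Unlock) around their per-table work, so a table is never compacted and marked at the same time.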
Awesome work Cyril! It looks good to me!
retentionWorkDir := filepath.Join(cfg.WorkingDirectory, "retention")

sweeper, err := retention.NewSweeper(retentionWorkDir, retention.NewDeleteClient(objectClient), cfg.RetentionDeleteWorkCount, cfg.RetentionDeleteDelay, r)
Let's add a comment noting that we assume chunks are always stored in the same object store as the index files? People who are migrating between stores might face an issue here.
As per our Slack discussion, I'll create a follow-up PR to trigger compaction if required.
This PR adds custom retention. You can set a retention per stream or a global one per user. The retention is currently applied by the compactor using the limits/overrides. This work currently targets only the boltdb-shipper store.
The design doc (https://docs.google.com/document/d/1xyDNai5MZI2DWnoFm8vrSnS372fx0HohDiULbXkJosQ/edit?usp=sharing) is a bit obsolete but still gives a general idea.
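As a rough illustration of the per-stream / per-tenant lookup described above, here is a stdlib-only sketch; StreamRule, retentionFor and the matcher shape are hypothetical names, while the actual PR resolves retention from the limits/overrides:

package retention

import "time"

// StreamRule is a hypothetical per-stream retention rule: label equality
// matchers plus the retention period to apply when they all match.
type StreamRule struct {
	Matchers map[string]string
	Period   time.Duration
}

// retentionFor resolves the retention for a series: the first matching
// per-stream rule wins, otherwise the tenant-wide period applies.
func retentionFor(seriesLabels map[string]string, rules []StreamRule, tenantPeriod time.Duration) time.Duration {
	for _, rule := range rules {
		matches := true
		for name, value := range rule.Matchers {
			if seriesLabels[name] != value {
				matches = false
				break
			}
		}
		if matches {
			return rule.Period
		}
	}
	return tenantPeriod
}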
The compactor now has two new components: a Marker and a Sweeper.
Marker
The marker goes through all tables one by one. Only tables that are already compacted are considered for now (for simplicity).
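A minimal sketch of that marking pass; the Marker interface, markTables and isCompacted are illustrative stand-ins, not the PR's exact API:

package retention

import "context"

// Marker applies the retention rules to one table's index and records the
// chunks to delete in a local mark file (stand-in for the PR's marker).
type Marker interface {
	MarkForDelete(ctx context.Context, tableName string) error
}

// markTables walks the tables one by one and only processes the ones that
// are already compacted, matching the "simplicity" note above.
func markTables(ctx context.Context, tables []string, isCompacted func(string) bool, m Marker) error {
	for _, table := range tables {
		if !isCompacted(table) {
			continue // not compacted yet: skip for now
		}
		if err := m.MarkForDelete(ctx, table); err != nil {
			return err
		}
	}
	return nil
}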
Sweeper
The sweeper reads all available marks, takes only those that have reached a minimum age (currently defaulting to two hours), and then deletes the corresponding chunks. Once a chunk is deleted, the marks file is updated. In case of failure, the mark is retried until it succeeds (404s are not treated as failures).
The minimum age gives all Loki components time to download the new indexes, so that nothing still references a chunk that no longer exists.
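A minimal sketch of one sweeper pass under the assumptions above; the mark type, deleteChunk callback and errNotFound are illustrative names, not the PR's types:

package retention

import (
	"context"
	"errors"
	"time"
)

// mark records one chunk scheduled for deletion (hypothetical shape).
type mark struct {
	ChunkID   string
	CreatedAt time.Time
}

var errNotFound = errors.New("object not found") // stand-in for the store's 404

// sweep deletes chunks for marks older than minAge (two hours by default)
// and returns the marks that must be retried on a later pass. A 404 from
// the object store is not treated as a failure.
func sweep(ctx context.Context, marks []mark, minAge time.Duration,
	deleteChunk func(context.Context, string) error) []mark {
	var remaining []mark
	for _, m := range marks {
		if time.Since(m.CreatedAt) < minAge {
			remaining = append(remaining, m) // too recent: other components may not have synced the index yet
			continue
		}
		if err := deleteChunk(ctx, m.ChunkID); err != nil && !errors.Is(err, errNotFound) {
			remaining = append(remaining, m) // real failure: retried until it works
			continue
		}
		// deleted (or already gone): drop the mark, i.e. update the mark file
	}
	return remaining
}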
The retention as seen by the user, though, should be pretty precise, i.e. time to sync + time to process all tables.
Done:
TODO:
Follow-up work (other PRs):