-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store: Add Time & duration based partitioning #1408
Conversation
23d12ff
to
98156da
Compare
98156da
to
01389e8
Compare
01524da
to
9a8a2ae
Compare
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
9a8a2ae
to
bed5771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good generally.
Some naming suggestions, plus one high level question:
Also even with time slicing looks like we still will download all meta.json files to disk. Is that what we want? Essentially time slicing will not include meta.json files on disk.
@@ -309,6 +319,17 @@ func (s *BucketStore) SyncBlocks(ctx context.Context) error { | |||
if err != nil { | |||
return nil | |||
} | |||
|
|||
inRange, err := s.isBlockInMinMaxRange(ctx, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is that we hit the issue of partial blocks... Something that #1394 touches.
I thought we actually handle this in store gw properly (blocks without meta are excluded), seems not so not a regression.
In any case we will just warn so 👍
pkg/store/bucket.go
Outdated
func (s *BucketStore) isBlockInMinMaxRange(ctx context.Context, id ulid.ULID) (bool, error) { | ||
dir := filepath.Join(s.dir, id.String()) | ||
|
||
b := &bucketBlock{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating the whole block for quick check is not great. I think we should just extract loadMeta
method to separate function and user.
Also at some point we need some meta syncer/ matcher that deals with all of this, something similar to #1394 ):
Also even with time slicing looks like we will download all meta.json files to disk. Is that what we want? Essentially time slices will not include meta.json files on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function extracted.
Good question, IMO those meta.json files are small in size, so why not? It will speed up our next meta.json lookup.
defer func() { testutil.Ok(t, os.RemoveAll(dir)) }() | ||
|
||
series := []labels.Labels{ | ||
labels.FromStrings("a", "1", "b", "1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse prepareStoreWithTestBlocks
? Series looks exactly the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I splitted the prepareTestBlocks
from prepareStoreWithTestBlocks
so that we can easily configure Store and generate test blocks.
I tried to adjust prepareStoreWithTestBlocks
, but I didn't like the newly added parameter for filtering and it would need quite a lot of changes due to changes in sync blocks https://github.com/thanos-io/thanos/blob/master/pkg/store/bucket_e2e_test.go#L155-L159
IMO that func is awesome for https://github.com/thanos-io/thanos/blob/master/pkg/store/bucket_e2e_test.go#L365, but doesn't really fit my use case, as I need custom checks in Store gw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair
pkg/store/bucket.go
Outdated
@@ -468,6 +514,10 @@ func (s *BucketStore) TimeRange() (mint, maxt int64) { | |||
maxt = b.meta.MaxTime | |||
} | |||
} | |||
|
|||
mint = s.normalizeMinTime(mint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would be better to put "normalization" code right here, it would be more easy to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we do it in 2 places.. I think we just need to rename to something clear.
minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y."). | ||
Default("0000-01-01T00:00:00Z")) | ||
|
||
maxTime := model.TimeOrDuration(cmd.Flag("max-time", "End of time range limit to serve. Thanos Store serves only blocks, which happened eariler than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need the same logic for Prometheus sidecar.
My point is that we should avoid fetching duplicated data from different stores when it's possible to decrease merging time and memory usage.
- if Prom retention = 2h it doesn't mean that 2h is all the data that Prom will return all the time. Usually, you will see more than 2h in response (up to retention x 2). We can avoid data intersection just adding max-time to queries
- Second important thing: we would like to have alerts that use more than 2h of data. There are not a lot of such queries, but we have them. It's a good idea to run these queries on local Prom data, so we need to increase retention up to 7d. But increasing retention we're also increasing memory usage (query cache). Also, increasing retention we're increasing network traffic and finally Thanos Query will merge more data (and we're sure that it's the same data).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guys, what do you think about implementing the same in the sidecar?
Sure, but not for this PR (: |
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
For Sidecar I think we have a similar PR open #1267 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Minor questions only.
defer func() { testutil.Ok(t, os.RemoveAll(dir)) }() | ||
|
||
series := []labels.Labels{ | ||
labels.FromStrings("a", "1", "b", "1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair
@@ -49,7 +50,18 @@ func registerStore(m map[string]setupFunc, app *kingpin.Application, name string | |||
blockSyncConcurrency := cmd.Flag("block-sync-concurrency", "Number of goroutines to use when syncing blocks from object storage."). | |||
Default("20").Int() | |||
|
|||
minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default("0000-01-01T00:00:00Z"))
Is that lowest min time? What about -9999-12-31T23:59:59Z
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t, err = time.Parse(time.RFC3339, "-9999-12-31T23:59:59Z")
if err != nil {
fmt.Println(err)
}
fmt.Println(t)
gives:
parsing time "-9999-12-31T23:59:59Z" as "2006-01-02T15:04:05Z07:00": cannot parse "-9999-12-31T23:59:59Z" as "2006"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would set some explicit default value here (for ""
value), but we can do it in next PRs (:
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @povilasv
@@ -49,7 +50,18 @@ func registerStore(m map[string]setupFunc, app *kingpin.Application, name string | |||
blockSyncConcurrency := cmd.Flag("block-sync-concurrency", "Number of goroutines to use when syncing blocks from object storage."). | |||
Default("20").Int() | |||
|
|||
minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would set some explicit default value here (for ""
value), but we can do it in next PRs (:
* Store: Add Time & duration based partitioning Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
Changes
New start of #1077
Verification
Multiple tests added.