Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store: Add Time & duration based partitioning #1408

Merged
merged 4 commits into from
Aug 28, 2019
Merged

Conversation

povilasv
Copy link
Member

@povilasv povilasv commented Aug 13, 2019

Changes

New start of #1077

Verification

Multiple tests added.

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
@povilasv povilasv marked this pull request as ready for review August 27, 2019 10:27
@bwplotka bwplotka self-requested a review August 27, 2019 16:26
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good generally.

Some naming suggestions, plus one high level question:

Also even with time slicing looks like we still will download all meta.json files to disk. Is that what we want? Essentially time slicing will not include meta.json files on disk.

cmd/thanos/store.go Outdated Show resolved Hide resolved
@@ -309,6 +319,17 @@ func (s *BucketStore) SyncBlocks(ctx context.Context) error {
if err != nil {
return nil
}

inRange, err := s.isBlockInMinMaxRange(ctx, id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem here is that we hit the issue of partial blocks... Something that #1394 touches.

I thought we actually handle this in store gw properly (blocks without meta are excluded), seems not so not a regression.

In any case we will just warn so 👍

func (s *BucketStore) isBlockInMinMaxRange(ctx context.Context, id ulid.ULID) (bool, error) {
dir := filepath.Join(s.dir, id.String())

b := &bucketBlock{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating the whole block for quick check is not great. I think we should just extract loadMeta method to separate function and user.

Also at some point we need some meta syncer/ matcher that deals with all of this, something similar to #1394 ):

Also even with time slicing looks like we will download all meta.json files to disk. Is that what we want? Essentially time slices will not include meta.json files on disk.

Copy link
Member Author

@povilasv povilasv Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function extracted.

Good question, IMO those meta.json files are small in size, so why not? It will speed up our next meta.json lookup.

pkg/store/bucket.go Outdated Show resolved Hide resolved
defer func() { testutil.Ok(t, os.RemoveAll(dir)) }()

series := []labels.Labels{
labels.FromStrings("a", "1", "b", "1"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse prepareStoreWithTestBlocks? Series looks exactly the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I splitted the prepareTestBlocks from prepareStoreWithTestBlocks so that we can easily configure Store and generate test blocks.

I tried to adjust prepareStoreWithTestBlocks, but I didn't like the newly added parameter for filtering and it would need quite a lot of changes due to changes in sync blocks https://github.com/thanos-io/thanos/blob/master/pkg/store/bucket_e2e_test.go#L155-L159

IMO that func is awesome for https://github.com/thanos-io/thanos/blob/master/pkg/store/bucket_e2e_test.go#L365, but doesn't really fit my use case, as I need custom checks in Store gw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair

@@ -468,6 +514,10 @@ func (s *BucketStore) TimeRange() (mint, maxt int64) {
maxt = b.meta.MaxTime
}
}

mint = s.normalizeMinTime(mint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be better to put "normalization" code right here, it would be more easy to read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we do it in 2 places.. I think we just need to rename to something clear.

minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Default("0000-01-01T00:00:00Z"))

maxTime := model.TimeOrDuration(cmd.Flag("max-time", "End of time range limit to serve. Thanos Store serves only blocks, which happened eariler than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need the same logic for Prometheus sidecar.

My point is that we should avoid fetching duplicated data from different stores when it's possible to decrease merging time and memory usage.

  1. if Prom retention = 2h it doesn't mean that 2h is all the data that Prom will return all the time. Usually, you will see more than 2h in response (up to retention x 2). We can avoid data intersection just adding max-time to queries
  2. Second important thing: we would like to have alerts that use more than 2h of data. There are not a lot of such queries, but we have them. It's a good idea to run these queries on local Prom data, so we need to increase retention up to 7d. But increasing retention we're also increasing memory usage (query cache). Also, increasing retention we're increasing network traffic and finally Thanos Query will merge more data (and we're sure that it's the same data).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense to me.

Copy link
Contributor

@d-ulyanov d-ulyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys, what do you think about implementing the same in the sidecar?

@bwplotka
Copy link
Member

Sure, but not for this PR (:

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
@povilasv
Copy link
Member Author

Guys, what do you think about implementing the same in the sidecar?

For Sidecar I think we have a similar PR open #1267

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Minor questions only.

pkg/store/bucket.go Outdated Show resolved Hide resolved
defer func() { testutil.Ok(t, os.RemoveAll(dir)) }()

series := []labels.Labels{
labels.FromStrings("a", "1", "b", "1"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair

@@ -49,7 +50,18 @@ func registerStore(m map[string]setupFunc, app *kingpin.Application, name string
blockSyncConcurrency := cmd.Flag("block-sync-concurrency", "Number of goroutines to use when syncing blocks from object storage.").
Default("20").Int()

minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default("0000-01-01T00:00:00Z")) Is that lowest min time? What about -9999-12-31T23:59:59Z ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    t, err = time.Parse(time.RFC3339, "-9999-12-31T23:59:59Z")
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(t)

gives:

parsing time "-9999-12-31T23:59:59Z" as "2006-01-02T15:04:05Z07:00": cannot parse "-9999-12-31T23:59:59Z" as "2006"        

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would set some explicit default value here (for "" value), but we can do it in next PRs (:

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @povilasv

@@ -49,7 +50,18 @@ func registerStore(m map[string]setupFunc, app *kingpin.Application, name string
blockSyncConcurrency := cmd.Flag("block-sync-concurrency", "Number of goroutines to use when syncing blocks from object storage.").
Default("20").Int()

minTime := model.TimeOrDuration(cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store serves only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would set some explicit default value here (for "" value), but we can do it in next PRs (:

@bwplotka bwplotka merged commit 8a6a70f into master Aug 28, 2019
@bwplotka bwplotka deleted the time-partitioning2 branch August 28, 2019 09:22
wbh1 pushed a commit to wbh1/thanos that referenced this pull request Sep 17, 2019
* Store: Add Time & duration based partitioning

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants