Implement lazy retrieval of series from object store. #5837
Conversation
Some benchmarks with batched retrieval: retrieving a single series is slower, but past a certain threshold (~100 series) batched retrieval starts to do better.
This should now be ready for review. I've added the latest benchmark results to the PR description. I don't think the benchmark is realistic, though; I feel it overestimates the performance of the current implementation and underestimates the performance of lazy retrieval. First, the benchmark uses an in-memory dataset, so chunk retrieval is instantaneous. Second, it buffers all received series, so their memory cannot be released as they are sent over a gRPC channel. I would expect performance in a realistic scenario to be better than what the benchmark predicts.
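To illustrate the buffering point, here is a minimal, hypothetical sketch (not the PR's benchmark code) contrasting a consumer that retains every response with one that drains the stream as it arrives; `seriesClient` and both consumer functions are made-up stand-ins:

```go
package example

// response stands in for one Series response frame.
type response struct {
	data []byte
}

// seriesClient stands in for a streaming client, e.g. a gRPC client stream.
type seriesClient interface {
	// Recv returns the next response, or ok=false when the stream is done.
	Recv() (r *response, ok bool)
}

// consumeBuffered mirrors what the benchmark does: every response stays
// referenced until the stream ends, so none of them can be freed early.
func consumeBuffered(c seriesClient) []*response {
	var all []*response
	for {
		r, ok := c.Recv()
		if !ok {
			return all
		}
		all = append(all, r)
	}
}

// consumeStreaming mirrors a real querier reading over gRPC: each response
// is handled and then dropped, so its memory can be reclaimed immediately.
func consumeStreaming(c seriesClient, handle func(*response)) {
	for {
		r, ok := c.Recv()
		if !ok {
			return
		}
		handle(r)
		// r is no longer referenced after this iteration and becomes collectable.
	}
}
```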
Nice, some initial thoughts. Thanks!
cmd/thanos/store.go (outdated)

@@ -129,6 +130,9 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
	cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
		Default("32").IntVar(&sc.blockMetaFetchConcurrency)

	cmd.Flag("debug.series-batch-size", "The batch size when fetching series from object storage.").
Can we mention what happens if batch size is too small or too large?
Also, I am not sure this is usable as-is: with 2000 blocks you effectively get a batch size of around 10000 * 2000, while a store with a single block gets 10000... Should this be a global, per-request batch size instead? Otherwise the flag name and help text should change.
The batch size is per block, and each block is retrieved concurrently in a separate goroutine, so the number of blocks should not really matter. We could also try to infer the batch size from the total number of expanded postings.
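A rough, illustrative sketch of that concurrency model, assuming one goroutine per block, each emitting fixed-size batches onto a shared channel (the names and types here are stand-ins, not the actual bucket store code):

```go
package example

import (
	"fmt"
	"sync"
)

// fetchBlockSeries stands in for retrieving one block's series in batches
// of at most batchSize and pushing each batch onto the shared channel.
func fetchBlockSeries(blockID string, batchSize int, out chan<- []string) {
	// In the real store each batch would be fetched from object storage;
	// here we emit a single fake batch just to show the shape of the loop.
	batch := make([]string, 0, batchSize)
	batch = append(batch, fmt.Sprintf("%s/series-0", blockID))
	out <- batch
}

// fetchAllBlocks launches one goroutine per block, so the batch size bounds
// the number of in-flight series per block rather than per request.
func fetchAllBlocks(blocks []string, batchSize int) <-chan []string {
	out := make(chan []string)
	var wg sync.WaitGroup
	for _, b := range blocks {
		wg.Add(1)
		go func(blockID string) {
			defer wg.Done()
			fetchBlockSeries(blockID, batchSize, out)
		}(b)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```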
pkg/store/bucket.go (outdated)

@@ -276,6 +282,11 @@ func newBucketStoreMetrics(reg prometheus.Registerer) *bucketStoreMetrics {
		Help: "Total number of empty postings when fetching block series.",
	})

	m.emptyStreamResponses = promauto.With(reg).NewCounter(prometheus.CounterOpts{
		Name: "thanos_bucket_store_empty_stream_responses_total",
		Help: "Total number of empty responses received.",
Received from where?
	b.entries = b.entries[:0]
	b.batch = b.batch[:0]

	b.indexr.reset()
What does reset do?
It clears the loaded series map inside the reader. I don't know if it is strictly necessary; that part of the codebase is still cryptic to me.
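For context, a hypothetical sketch of what such a reset could amount to; the real bucketIndexReader internals may differ:

```go
package example

// indexReader stands in for the reader that caches raw series bytes,
// keyed by series reference, while a batch is being processed.
type indexReader struct {
	loadedSeries map[uint64][]byte
}

// reset drops the loaded-series map so buffers from the previous batch
// can be garbage collected before the next batch is loaded.
func (r *indexReader) reset() {
	r.loadedSeries = map[uint64][]byte{}
}
```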
This looks good! Some small nits only
Have you tried this with "real" data? Is there any noticeable improvement? 🤔
@GiedriusS In a staging environment a 30d range query took ~20-30% less time, going from 14s down to 10s. I would like to double-check memory usage this week in a prod-like environment to make sure there are no leaks or regressions.
pkg/store/bucket.go (outdated)

	chunkFetchDuration prometheus.Histogram

	// Internal state.
	i int
Maybe we can use uint64 here to not accidentally overflow the counter?
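For illustration only, the suggestion amounts to something like the following; the struct and field names are hypothetical, not the PR's actual code:

```go
package example

// seriesBatchState is an illustrative internal state of a batched series
// iterator. Using uint64 for the index rules out signed overflow even for
// very large responses.
type seriesBatchState struct {
	i       uint64   // position within the current batch
	entries []string // entries buffered for the current batch
}
```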
One small nit. I tested this and it gave me a few percent improvement in query durations, but that might be because our remote object storage has latency on the order of milliseconds and is very fast, i.e. for us the bottleneck is elsewhere.
I noticed that expanding postings and reading the index is also a non-trivial amount of work, so maybe that is also a significant factor.

Let's fix the conflict, and then I think this PR is good to go?
Conflicts should be fixed now.

Thanks!
	defer span.Finish()

	shardMatcher := req.ShardInfo.Matcher(&s.buffers)
	defer shardMatcher.Close()
Did we close the matcher properly in this PR? I didn't see where we close it after the refactoring.
The response set should close it here: https://github.com/thanos-io/thanos/blob/main/pkg/store/proxy_heap.go#L590
Are you seeing any issues with this change?
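For reference, a simplified sketch of that ownership pattern, where the response set closes the shard matcher once the stream has been fully drained; the types here are illustrative rather than the actual proxy_heap.go code:

```go
package example

// shardMatcher stands in for the pooled matcher handed out per request.
type shardMatcher struct{ closed bool }

// Close returns the matcher's buffers to the pool (simplified here).
func (m *shardMatcher) Close() { m.closed = true }

// respSet owns the matcher for the lifetime of one streamed response set.
type respSet struct {
	matcher *shardMatcher
}

// Close releases everything tied to the response set, including the shard
// matcher, once no more responses will be read from the stream.
func (r *respSet) Close() {
	if r.matcher != nil {
		r.matcher.Close()
	}
}
```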
Squashed commits:

* Implement lazy retrieval of series from object store
* Add batching
* Preload series in batches
* Emit proper stats
* Extract block series client
* Fix CI
* Address review comments
* Use emptyPostingsCount in lazyRespSet
* Reuse chunk metas
* Avoid overallocating for small responses
* Add metric for chunk fetch time
* Regroup imports
* Change counter to uint64
The bucket store fetches series in a single blocking operation from object storage. This is likely not an ideal strategy when it comes to latency and resource usage. In addition, it causes the store to buffer everything in memory before starting to send results to queriers.
This commit modifies the series retrieval to use the proxy response heap and take advantage of the k-way merge used in the proxy store.
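To make the k-way merge idea concrete, here is a small, self-contained sketch of merging lazily produced, sorted per-block streams through a heap, which is the general shape of what the proxy response heap does; it is an illustrative example, not the PR's implementation:

```go
package example

import "container/heap"

// stream lazily yields label-sorted series for one block.
type stream struct {
	next <-chan string
	cur  string
	ok   bool
}

// advance pulls the next element, setting ok to false once the stream is drained.
func (s *stream) advance() { s.cur, s.ok = <-s.next }

// mergeHeap orders streams by their current element so the smallest series
// across all blocks is always emitted next.
type mergeHeap []*stream

func (h mergeHeap) Len() int            { return len(h) }
func (h mergeHeap) Less(i, j int) bool  { return h[i].cur < h[j].cur }
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(*stream)) }
func (h *mergeHeap) Pop() interface{} {
	old := *h
	s := old[len(old)-1]
	*h = old[:len(old)-1]
	return s
}

// kWayMerge emits series from all streams in globally sorted order without
// buffering whole blocks: each stream only needs its current element
// (or, in the real store, its current batch) in memory.
func kWayMerge(streams []*stream, emit func(string)) {
	h := &mergeHeap{}
	for _, s := range streams {
		if s.advance(); s.ok {
			heap.Push(h, s)
		}
	}
	for h.Len() > 0 {
		s := heap.Pop(h).(*stream)
		emit(s.cur)
		if s.advance(); s.ok {
			heap.Push(h, s)
		}
	}
}
```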
cc @GiedriusS @bwplotka
Latest benchmarks
Changes
Verification