Implement lazy retrieval of series from object store. #5837
Conversation
Some benchmarks with batched retrieval: retrieving a single series is slower, but past a certain threshold (~100 series) batched retrieval starts to do better.
This should now be ready for review. I've added the latest benchmark results to the PR description. I don't think the benchmark is realistic, though; I feel it overestimates the performance of the current implementation and underestimates the performance of lazy retrieval. First, the benchmark uses an in-memory dataset, so chunk retrieval is instantaneous. Second, it buffers all received series, so their memory cannot be released as they are sent over a gRPC channel. I would expect performance in a realistic scenario to be better than what the benchmark predicts.
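To illustrate the buffering point, here is a minimal, hypothetical sketch (not the PR's benchmark code) contrasting a consumer that retains every response with one that drains the stream as it arrives; `seriesClient` and both consumer functions are made-up stand-ins:

```go
package example

// response stands in for one Series response frame.
type response struct {
	data []byte
}

// seriesClient stands in for a streaming client, e.g. a gRPC client stream.
type seriesClient interface {
	// Recv returns the next response, or ok=false when the stream is done.
	Recv() (r *response, ok bool)
}

// consumeBuffered mirrors what the benchmark does: every response stays
// referenced until the stream ends, so none of them can be freed early.
func consumeBuffered(c seriesClient) []*response {
	var all []*response
	for {
		r, ok := c.Recv()
		if !ok {
			return all
		}
		all = append(all, r)
	}
}

// consumeStreaming mirrors a real querier reading over gRPC: each response
// is handled and then dropped, so its memory can be reclaimed immediately.
func consumeStreaming(c seriesClient, handle func(*response)) {
	for {
		r, ok := c.Recv()
		if !ok {
			return
		}
		handle(r)
		// r is no longer referenced after this iteration and becomes collectable.
	}
}
```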
Nice, some initial thoughts. Thanks!
cmd/thanos/store.go (outdated)

@@ -129,6 +130,9 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
	cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
		Default("32").IntVar(&sc.blockMetaFetchConcurrency)

	cmd.Flag("debug.series-batch-size", "The batch size when fetching series from object storage.").
Can we mention what happens if batch size is too small or too large?
Also, I am not sure this is usable as-is: with 2000 blocks you effectively get a batch size of around 10000 * 2000, while a store with a single block gets 10000... Should this be a global, per-request batch size instead? Otherwise the flag name and help text should change.
The batch size is per block, and each block is retrieved concurrently in a separate goroutine, so the number of blocks should not really matter. We could also try to infer the batch size from the total number of expanded postings.
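A rough, illustrative sketch of that concurrency model, assuming one goroutine per block, each emitting fixed-size batches onto a shared channel (the names and types here are stand-ins, not the actual bucket store code):

```go
package example

import (
	"fmt"
	"sync"
)

// fetchBlockSeries stands in for retrieving one block's series in batches
// of at most batchSize and pushing each batch onto the shared channel.
func fetchBlockSeries(blockID string, batchSize int, out chan<- []string) {
	// In the real store each batch would be fetched from object storage;
	// here we emit a single fake batch just to show the shape of the loop.
	batch := make([]string, 0, batchSize)
	batch = append(batch, fmt.Sprintf("%s/series-0", blockID))
	out <- batch
}

// fetchAllBlocks launches one goroutine per block, so the batch size bounds
// the number of in-flight series per block rather than per request.
func fetchAllBlocks(blocks []string, batchSize int) <-chan []string {
	out := make(chan []string)
	var wg sync.WaitGroup
	for _, b := range blocks {
		wg.Add(1)
		go func(blockID string) {
			defer wg.Done()
			fetchBlockSeries(blockID, batchSize, out)
		}(b)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```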
pkg/store/bucket.go (outdated)

@@ -276,6 +282,11 @@ func newBucketStoreMetrics(reg prometheus.Registerer) *bucketStoreMetrics {
		Help: "Total number of empty postings when fetching block series.",
	})

	m.emptyStreamResponses = promauto.With(reg).NewCounter(prometheus.CounterOpts{
		Name: "thanos_bucket_store_empty_stream_responses_total",
		Help: "Total number of empty responses received.",
Received from where?
	b.entries = b.entries[:0]
	b.batch = b.batch[:0]

	b.indexr.reset()
What does reset do?
It clears the loaded series map inside the reader. I don't know if it is strictly necessary; that part of the codebase is still cryptic to me.
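For context, a hypothetical sketch of what such a reset could amount to; the real bucketIndexReader internals may differ:

```go
package example

// indexReader stands in for the reader that caches raw series bytes,
// keyed by series reference, while a batch is being processed.
type indexReader struct {
	loadedSeries map[uint64][]byte
}

// reset drops the loaded-series map so buffers from the previous batch
// can be garbage collected before the next batch is loaded.
func (r *indexReader) reset() {
	r.loadedSeries = map[uint64][]byte{}
}
```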
This looks good! Some small nits only
Have you tried this with "real" data? Is there any noticeable improvement? 🤔
@GiedriusS In a staging environment a 30d range query took ~20-30% less time, going from 14s down to 10s. I would like to double-check memory usage this week in a prod-like environment to make sure there are no leaks or regressions.
pkg/store/bucket.go (outdated)

	chunkFetchDuration prometheus.Histogram

	// Internal state.
	i int
Maybe we can use uint64 here to not accidentally overflow the counter?
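For illustration only, the suggestion amounts to something like the following; the struct and field names are hypothetical, not the PR's actual code:

```go
package example

// seriesBatchState is an illustrative internal state of a batched series
// iterator. Using uint64 for the index rules out signed overflow even for
// very large responses.
type seriesBatchState struct {
	i       uint64   // position within the current batch
	entries []string // entries buffered for the current batch
}
```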
One small nit. I tested this and it gave me a few percent improvement in query durations, but that might be because our remote object storage has latency on the order of milliseconds and is very fast, i.e. for us the bottleneck is elsewhere.
I noticed that expanding postings and reading the index is also a non-trivial amount of work, so maybe that is also a significant factor.

Let's fix the conflict, and then I think this PR is good to go?
Conflicts should be fixed now.

Thanks!
	defer span.Finish()

	shardMatcher := req.ShardInfo.Matcher(&s.buffers)
	defer shardMatcher.Close()
Did we close the matcher properly in this PR? I didn't see where we close it after the refactoring.
The response set should close it here: https://github.com/thanos-io/thanos/blob/main/pkg/store/proxy_heap.go#L590
Are you seeing any issues with this change?
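For reference, a simplified sketch of that ownership pattern, where the response set closes the shard matcher once the stream has been fully drained; the types here are illustrative rather than the actual proxy_heap.go code:

```go
package example

// shardMatcher stands in for the pooled matcher handed out per request.
type shardMatcher struct{ closed bool }

// Close returns the matcher's buffers to the pool (simplified here).
func (m *shardMatcher) Close() { m.closed = true }

// respSet owns the matcher for the lifetime of one streamed response set.
type respSet struct {
	matcher *shardMatcher
}

// Close releases everything tied to the response set, including the shard
// matcher, once no more responses will be read from the stream.
func (r *respSet) Close() {
	if r.matcher != nil {
		r.matcher.Close()
	}
}
```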
Squashed commits:

* Implement lazy retrieval of series from object store
* Add batching
* Preload series in batches
* Emit proper stats
* Extract block series client
* Fix CI
* Address review comments
* Use emptyPostingsCount in lazyRespSet
* Reuse chunk metas
* Avoid overallocating for small responses
* Add metric for chunk fetch time
* Regroup imports
* Change counter to uint64
The bucket store fetches series in a single blocking operation from object storage. This is likely not an ideal strategy when it comes to latency and resource usage. In addition, it causes the store to buffer everything in memory before starting to send results to queriers.
This commit modifies the series retrieval to use the proxy response heap and take advantage of the k-way merge used in the proxy store.
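To make the k-way merge idea concrete, here is a small, self-contained sketch of merging lazily produced, sorted per-block streams through a heap, which is the general shape of what the proxy response heap does; it is an illustrative example, not the PR's implementation:

```go
package example

import "container/heap"

// stream lazily yields label-sorted series for one block.
type stream struct {
	next <-chan string
	cur  string
	ok   bool
}

// advance pulls the next element, setting ok to false once the stream is drained.
func (s *stream) advance() { s.cur, s.ok = <-s.next }

// mergeHeap orders streams by their current element so the smallest series
// across all blocks is always emitted next.
type mergeHeap []*stream

func (h mergeHeap) Len() int            { return len(h) }
func (h mergeHeap) Less(i, j int) bool  { return h[i].cur < h[j].cur }
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(*stream)) }
func (h *mergeHeap) Pop() interface{} {
	old := *h
	s := old[len(old)-1]
	*h = old[:len(old)-1]
	return s
}

// kWayMerge emits series from all streams in globally sorted order without
// buffering whole blocks: each stream only needs its current element
// (or, in the real store, its current batch) in memory.
func kWayMerge(streams []*stream, emit func(string)) {
	h := &mergeHeap{}
	for _, s := range streams {
		if s.advance(); s.ok {
			heap.Push(h, s)
		}
	}
	for h.Len() > 0 {
		s := heap.Pop(h).(*stream)
		emit(s.cur)
		if s.advance(); s.ok {
			heap.Push(h, s)
		}
	}
}
```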
cc @GiedriusS @bwplotka
Latest benchmarks
Changes
Verification