feat: New bloom planning using chunk size TSDB stats #14547

salvacorts · 2024-10-21T09:07:15Z

What this PR does / why we need it:

This PR adds a new planning strategy to the bloom builder: split_by_chunk_size.
This strategy builds tasks based on the TSDB stats. We add a new configuration option, bloom_task_target_chunk_size, which sets the target chunk size of each task. We keep adding series to a task until the total size of their chunks exceeds the target size.
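The batching rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `seriesStat` type and the `planBySize` function are hypothetical names, and the sizes are made-up numbers standing in for the per-series chunk sizes read from the TSDB stats.

```go
package main

import "fmt"

// seriesStat is a hypothetical stand-in for per-series TSDB stats:
// a fingerprint plus the total size of that series' chunks in bytes.
type seriesStat struct {
	fingerprint uint64
	chunkBytes  uint64
}

// planBySize walks the series in order and groups them into batches.
// A batch is closed as soon as its accumulated chunk size exceeds target,
// so a batch can overshoot the target by the size of its last series.
func planBySize(series []seriesStat, target uint64) [][]seriesStat {
	var (
		batches [][]seriesStat
		current []seriesStat
		size    uint64
	)
	for _, s := range series {
		current = append(current, s)
		size += s.chunkBytes
		if size > target {
			batches = append(batches, current)
			current, size = nil, 0
		}
	}
	// Keep the trailing batch even if it never reached the target.
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	series := []seriesStat{{1, 40}, {2, 70}, {3, 30}, {4, 90}, {5, 10}}
	for i, b := range planBySize(series, 100) {
		fmt.Printf("task %d: %d series\n", i, len(b))
	}
}
```

With a target of 100, the five example series end up in three batches: two that each slightly exceed the target and a small trailing one.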

Special notes for your reviewer:

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

salvacorts · 2024-10-22T07:13:33Z

docs/sources/shared/configuration.md

@@ -3775,6 +3775,10 @@ shard_streams:
 # CLI flag: -bloom-build.split-keyspace-by
 [bloom_split_series_keyspace_by: <int> | default = 256]

+# Experimental. Target chunk size in bytes for bloom tasks. Default is 100GB.
+# CLI flag: -bloom-build.target-chunk-size
+[bloom_task_target_chunk_size: <int> | default = 100GB]


Not sure what a good default number would be here. Maybe 100GB is too much. What about 20GB? Wdyt?

What's the impact of setting this too low/too high?

too low --> too many tasks
too big --> few huge tasks

But it wouldn't change the size of bloom blocks? If that's the case, overpartitioning seems better than underpartitioning; 20GB seems fine and we can continue to adjust over time.
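To put rough numbers on the trade-off discussed above, here is a back-of-the-envelope sketch. The `estimateTasks` helper and the 2000GB total are hypothetical; the real task count depends on how series sizes are distributed, since each task runs slightly past the target.

```go
package main

import "fmt"

// estimateTasks returns a rough task count for a given amount of chunk data,
// computed as ceil(totalGB / targetGB). It is only an approximation: the
// planner closes a task after the target is exceeded, not exactly at it.
func estimateTasks(totalGB, targetGB uint64) uint64 {
	if targetGB == 0 {
		return 0 // avoid dividing by zero; a zero target is not meaningful here
	}
	return (totalGB + targetGB - 1) / targetGB
}

func main() {
	const totalGB = 2000 // hypothetical amount of chunk data to plan for
	for _, target := range []uint64{100, 20} {
		fmt.Printf("target=%dGB -> about %d tasks\n", target, estimateTasks(totalGB, target))
	}
}
```

Under these assumed numbers, dropping the target from 100GB to 20GB turns roughly 20 large tasks into roughly 100 smaller ones.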

salvacorts · 2024-10-22T07:15:26Z

docs/sources/shared/configuration.md

@@ -3765,7 +3765,7 @@ shard_streams:
 [bloom_creation_enabled: <boolean> | default = false]

 # Experimental. Bloom planning strategy to use in bloom creation. Can be one of:
-# 'split_keyspace_by_factor'
+# 'split_keyspace_by_factor', 'split_by_chunk_size'


I'm not entirely happy with this name. Some other options I thought of were:

split_by_tsdb_chunks_size_stats

This is probably my favorite

split_by_chunks_size_stats

split_by_series_chunks_size

split_by_series_size

Preferences? Ideas?

I like split_by_tsdb_chunks_size to make it clear which chunks we're talking about; split_by_tsdb_chunks_size_stats is fine too, though I'm not sure if the _stats bit adds any more information to make it more clear.

I would prefer split_by_series_size or split_by_stream_size

> I would prefer split_by_series_size or split_by_stream_size

I would like to specifically have "chunks" somewhere in the name, given that in the future we may look at other stats besides the chunk sizing. But I may be overthinking it 😅

I'm going with split_by_series_chunks_size

rfratto

LGTM, but I don't feel really confident enough in my knowledge of the builder to know if I missed anything here. A review from @chaudum would also be helpful here

rfratto · 2024-10-22T12:57:02Z

pkg/bloombuild/planner/strategies/chunksize.go

+		series := sizedIter.At()
+		if series.Len() == 0 {
+			// This should never happen, but just in case.
+			level.Error(s.logger).Log("msg", "got empty series batch")


s/error/warn? I'm not sure if this is actionable by the operator

(Also, should we include extra identifying information about the series here for debugging in case this ever does hit?)

Changed to a warn.

> should we include extra identifying information about the series

We have no series information at this level other than the TSDB name. Added it to the log line.

chaudum · 2024-10-22T10:41:10Z

pkg/bloombuild/planner/strategies/chunksize.go

+	TSDB   tsdb.SingleTenantTSDBIdentifier
+	FP     model.Fingerprint
+	Chunks []index.ChunkMeta


fields can be package private

chaudum · 2024-10-22T13:58:59Z

docs/sources/shared/configuration.md

@@ -3765,7 +3765,7 @@ shard_streams:
 [bloom_creation_enabled: <boolean> | default = false]

 # Experimental. Bloom planning strategy to use in bloom creation. Can be one of:
-# 'split_keyspace_by_factor'
+# 'split_keyspace_by_factor', 'split_by_chunk_size'


I would prefer split_by_series_size or split_by_stream_size

pull-request-size bot added the size/L label Oct 21, 2024

salvacorts changed the title ~~New bloom planning using chunk size TSDB stats~~ (WIP) New bloom planning using chunk size TSDB stats Oct 21, 2024

pull-request-size bot added size/XL and removed size/L labels Oct 21, 2024

salvacorts force-pushed the salvacorts/tsdb-sats-bloom-planning branch from f10d775 to 8c564fe October 21, 2024 13:16

pull-request-size bot added size/XXL and removed size/XL labels Oct 21, 2024

salvacorts added 3 commits October 21, 2024 15:27

New bloom planning chunk size strategy

4a95466

Extract common functionality to test utils pkg

5e08f42

Test for new strategy

fb7753a

salvacorts force-pushed the salvacorts/tsdb-sats-bloom-planning branch from 8c564fe to 20026e6 October 21, 2024 13:35

pull-request-size bot added size/L and removed size/XXL labels Oct 21, 2024

fix after rebase

8360fdc

salvacorts force-pushed the salvacorts/tsdb-sats-bloom-planning branch from 20026e6 to 8360fdc October 21, 2024 13:38

Working chunk size strategy

df4b5c9

pull-request-size bot added size/XL and removed size/L labels Oct 21, 2024

Use new strategy in planner and integration tests

451b572

salvacorts changed the title ~~(WIP) New bloom planning using chunk size TSDB stats~~ feat: New bloom planning using chunk size TSDB stats Oct 21, 2024

update docs

95aface

github-actions bot added the type/docs label Oct 21, 2024

fix ci lint

8e6d9a7

salvacorts commented Oct 22, 2024


salvacorts marked this pull request as ready for review October 22, 2024 07:18

salvacorts requested a review from a team as a code owner October 22, 2024 07:18

rfratto approved these changes Oct 22, 2024


chaudum reviewed Oct 22, 2024


CR feedback and change strategy name

8d9e75d

salvacorts added 5 commits October 22, 2024 17:00

docs

f646b41

fix integration test

a75f7ad

Change default target size to 20GB

4fb17dd

docs

6b61326

Logs strategy name

b4d673d

chaudum approved these changes Oct 23, 2024


salvacorts merged commit 673ede1 into main Oct 23, 2024
61 checks passed

salvacorts deleted the salvacorts/tsdb-sats-bloom-planning branch October 23, 2024 10:52

loki-gh-app bot mentioned this pull request Nov 14, 2024

chore(k227): release 3.3.0 #14750

Merged
