receive/multitsdb: add cuckoo filter on metric names #7787
Conversation
Force-pushed from 5eb98d6 to 543ca2e
Hi, thanks for the PR. I am trying to understand the change better. If Receiver shards series by the hash of the metric name, then I feel the filter could help, because a metric should only exist on some receivers, not all. Can you explain more about the use case?
In my case, it seems that the fan-out of thanos-query is affecting the CPU load of both thanos-query and thanos-receiver. (Interestingly, when I gave the same query pattern (e.g., many alert queries) to Cortex, it didn't cause a significant CPU load on cortex/ingester.)
Because we shard on the metric hash and not the name label, we would expect all receivers to have series of the metric if it has sufficiently high cardinality, right?
You're right. However, as in this issue (#1611), there shouldn't be many empty responses, but in my case there were quite a few. In my situation, it might be because I have many patterns that query multiple tenants. (This occurs especially in alert query patterns.)
Do you enable query tenancy? I am not sure I understand how multi-tenant queries play a role here. Do you mean some tenants have the metric while others don't? Then it makes sense to have the metric-name-based filter to skip some tenants.
Unfortunately, no.
The tenants are created in a way that makes it a bit difficult to filter with extLset :( I'm sorry this is a question somewhat different from the PR discussion, but what do you think about supporting regular expressions in extLset? Thanks :-)
The conversation here got a bit derailed :) Indeed, this PR solves the issue when queries don't have a tenant label matcher on them: MultiTSDB is then forced to spawn a goroutine for each TSDB and call Select() inside of it. All of this is needed because we don't know whether Select() will return anything before the first Next() call. The consensus on Slack was that we will try this idea in MultiTSDB first, and then we can apply it to other places or generalize/extend it to encompass more label names, for instance.
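To make that mechanism concrete, here is a minimal, self-contained sketch of the per-TSDB fan-out and how a metric-name filter lets it be skipped early. This is not the PR's actual code; names such as metricNameFilter, tenantTSDB and queryAll are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// metricNameFilter reports whether a TSDB may contain a given metric name.
// An allow-all variant always returns true; a cuckoo-backed variant answers
// from an approximate set of names and can only err with false positives.
type metricNameFilter interface {
	MayContain(metricName string) bool
}

type allowAll struct{}

func (allowAll) MayContain(string) bool { return true }

// tenantTSDB stands in for one tenant's TSDB inside MultiTSDB.
type tenantTSDB struct {
	tenant string
	filter metricNameFilter
}

// selectSeries stands in for Select(); without a filter, we only learn that a
// TSDB has nothing to return after the first Next() call inside its goroutine.
func (t tenantTSDB) selectSeries(metricName string) {}

func queryAll(tsdbs []tenantTSDB, metricName string) {
	var wg sync.WaitGroup
	for _, t := range tsdbs {
		if !t.filter.MayContain(metricName) {
			continue // skip the goroutine and the Select() call entirely
		}
		wg.Add(1)
		go func(t tenantTSDB) {
			defer wg.Done()
			t.selectSeries(metricName)
			fmt.Println("queried tenant", t.tenant)
		}(t)
	}
	wg.Wait()
}

func main() {
	queryAll([]tenantTSDB{{tenant: "team-a", filter: allowAll{}}}, "up")
}
```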
Force-pushed from 5608077 to 7398bc0
pkg/store/proxy.go
Outdated
@@ -594,6 +597,16 @@ func storeMatches(ctx context.Context, s Client, mint, maxt int64, matchers ...*
	if !LabelSetsMatch(matchers, extLset...) {
		return false, fmt.Sprintf("external labels %v does not match request label matchers: %v", extLset, matchers)
	}

	for _, m := range matchers {
Small suggestion: this will now add extra work for all users of store/proxy.go. I suggest moving this whole loop into s.MatchesMetricName() instead. So, it would be func MatchesMetricName(matches []labels.Matchers) bool. The allow-all filter would just do no work at all, and the cuckoo version would iterate through the given matchers.
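As a hedged sketch of that suggestion, moving the matcher loop behind the filter could look roughly like the code below. It reuses the AllowAllStoreFilter and CuckooMetricNameStoreFilter names that appear in the diff further down, assumes the prometheus/prometheus labels package and the seiflotfy/cuckoofilter library, and assumes the suggested []labels.Matchers signature means a []*labels.Matcher slice; the method bodies are illustrative, not the merged code.

```go
package filter

import (
	"github.com/prometheus/prometheus/model/labels"
	cuckoo "github.com/seiflotfy/cuckoofilter"
)

// StoreFilter decides, from the request matchers alone, whether a store can
// possibly have matching series.
type StoreFilter interface {
	MatchesMetricName(matchers []*labels.Matcher) bool
}

// AllowAllStoreFilter does no work at all: every request passes.
type AllowAllStoreFilter struct{}

func (AllowAllStoreFilter) MatchesMetricName([]*labels.Matcher) bool { return true }

// CuckooMetricNameStoreFilter answers from an approximate set of the metric
// names present in the store.
type CuckooMetricNameStoreFilter struct {
	filter *cuckoo.Filter
}

func NewCuckooMetricNameStoreFilter(capacity uint) *CuckooMetricNameStoreFilter {
	return &CuckooMetricNameStoreFilter{filter: cuckoo.NewFilter(capacity)}
}

// MatchesMetricName runs the matcher loop itself, so store/proxy.go callers
// only pay for it when this filter is actually configured.
func (f *CuckooMetricNameStoreFilter) MatchesMetricName(matchers []*labels.Matcher) bool {
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			// Cuckoo lookups can yield false positives but never false
			// negatives, so a miss is safe grounds to skip the store.
			return f.filter.Lookup([]byte(m.Value))
		}
	}
	// No equality matcher on __name__: we cannot rule this store out.
	return true
}
```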
Regarding the use case: we have a ton of metrics in our organisation, and some time ago we used one TSDB (a multi-Prometheus deployment with hashmod target distribution) to write all metrics to. We have now migrated to Thanos Receive and decided to run more, smaller TSDBs instead, because in the previous deployment Compactor would usually have problems with the very big metric blocks coming from a single TSDB. We use the Kubernetes namespace as the tenant identifier, except for one namespace with one app, which we divided even further because the TSDB was too big without it. We now run two Receiver hashrings: one for all namespaces with 6 Receive services, and one for that further-divided namespace with 32 Receive services. All Receivers from both hashrings are on the query path (global instance), so every query waits for an answer from all Receivers... With the cuckoo filter I would expect to return early and save some resources when a particular metric is not present. In our case, probably quite a high share of metrics are not located on the 32-service Receive hashring, since it contains metrics from one app only.
@niaurys @GiedriusS Thanks for sharing the use case. I think it makes sense for this use case.
This is not ideal. If all queries have an associated tenant label, then there is no such issue anymore. But understood that if we want a dynamic tenant label enforcer, it is basically the same as this cuckoo filter.
Compaction and block vertical sharding are more of a pain point to solve.
pkg/store/tsdb.go
Outdated
storeFilter = filter.AllowAllStoreFilter{}
if metricNameFilterEnabled {
	startFilterUpdate = true
	storeFilter = filter.NewCuckooMetricNameStoreFilter(1000000) // about 1MB on 64bit machines.
We can extract this number into a constant like const megabyte = 1024 * 1024. There is also a units package which might have this constant already.
Actually, we set just the capacity (which I have now moved to a constant); the comment about memory comes from the package itself: https://github.com/seiflotfy/cuckoofilter/blob/master/cuckoofilter.go#L18-L20. I have now deleted the comment altogether :)
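A small sketch of that change, under the assumption that it only names the item capacity (which is why a megabyte constant from a units package would not fit); the constant and function names here are illustrative, not necessarily what the PR uses.

```go
package filter

import cuckoo "github.com/seiflotfy/cuckoofilter"

// metricNameFilterCapacity is the number of distinct metric names the cuckoo
// filter is sized for; it is an item capacity, not a byte size. The upstream
// library notes that one million entries costs roughly 1 MB on 64-bit machines.
const metricNameFilterCapacity = 1_000_000

// newMetricNameFilter builds the filter from the named capacity instead of a
// magic number at the call site.
func newMetricNameFilter() *cuckoo.Filter {
	return cuckoo.NewFilter(metricNameFilterCapacity)
}
```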
pkg/store/tsdb.go
Outdated
buffers: sync.Pool{New: func() interface{} {
	b := make([]byte, 0, initialBufSize)
	return &b
}},
}

if startFilterUpdate {
	t := time.NewTicker(15 * time.Second)
Does this mean there could be up to 15 seconds of additional delay when a new metric is written to receivers?
...yes, for now this seems like the most sensible and simplest approach.
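A rough sketch of what such a 15-second refresh loop could look like, assuming the same seiflotfy/cuckoofilter library; startFilterUpdateLoop and listMetricNames are hypothetical names, and real code would also need to synchronize the rebuild with concurrent lookups (for example by swapping in a freshly built filter).

```go
package store

import (
	"context"
	"time"

	cuckoo "github.com/seiflotfy/cuckoofilter"
)

// listMetricNames stands in for however the store enumerates the metric names
// currently present in its TSDB heads (e.g. by walking the head index).
type listMetricNames func(ctx context.Context) []string

// startFilterUpdateLoop rebuilds the filter on a fixed 15s cadence, so a
// metric written right after a tick only becomes visible to the filter on the
// next tick -- which is where the delay asked about above comes from.
func startFilterUpdateLoop(ctx context.Context, f *cuckoo.Filter, list listMetricNames) {
	t := time.NewTicker(15 * time.Second)
	go func() {
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				f.Reset()
				for _, name := range list(ctx) {
					f.Insert([]byte(name))
				}
			}
		}
	}()
}
```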
Force-pushed from d4a2053 to 92a4d12
Force-pushed from 92a4d12 to 0aeeee6
Force-pushed from 0aeeee6 to 701d312
I really like the idea of starting with just in-memory filters.
Add a cuckoo filter on metric names to reduce fan-out cost. In our case it reduces CPU usage by a lot.
@GiedriusS' idea.
This feature can be enabled with the --enable-feature=metric-names-filter flag.