receive/multitsdb: add cuckoo filter on metric names #7787

Merged

Conversation

@niaurys (Contributor) commented Sep 27, 2024

Add a cuckoo filter on metric names to reduce fan-out cost. In our case it reduces CPU usage significantly:
[screenshot: CPU usage reduction after enabling the filter]

@GiedriusS's idea 🙇‍♂️

This feature can be enabled with the --enable-feature=metric-names-filter flag.
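For illustration, here is a minimal sketch of what such a filter can look like, assuming the github.com/seiflotfy/cuckoofilter package referenced later in this thread. The names AllowAllStoreFilter and NewCuckooMetricNameStoreFilter appear in the PR diff below, but the interface and method set shown here (StoreFilter, Matches, ResetAddItems) are assumptions for illustration, not the PR's exact code:

```go
package filter

import (
	"sync"

	cuckoo "github.com/seiflotfy/cuckoofilter"
)

// StoreFilter answers "might this store contain series for this metric name?".
type StoreFilter interface {
	Matches(item []byte) bool
	ResetAddItems(items [][]byte)
}

// AllowAllStoreFilter is the no-op default used when the feature flag is off.
type AllowAllStoreFilter struct{}

func (AllowAllStoreFilter) Matches([]byte) bool    { return true }
func (AllowAllStoreFilter) ResetAddItems([][]byte) {}

// CuckooMetricNameStoreFilter keeps an approximate set of the metric names
// currently stored, so a miss means "this store definitely has no such metric".
type CuckooMetricNameStoreFilter struct {
	mtx      sync.RWMutex
	filter   *cuckoo.Filter
	capacity uint
}

func NewCuckooMetricNameStoreFilter(capacity uint) *CuckooMetricNameStoreFilter {
	return &CuckooMetricNameStoreFilter{
		filter:   cuckoo.NewFilter(capacity),
		capacity: capacity,
	}
}

// Matches reports whether the metric name might exist in this store
// (cuckoo filters can return false positives, never false negatives).
func (f *CuckooMetricNameStoreFilter) Matches(item []byte) bool {
	f.mtx.RLock()
	defer f.mtx.RUnlock()
	return f.filter.Lookup(item)
}

// ResetAddItems rebuilds the filter from the current set of metric names.
func (f *CuckooMetricNameStoreFilter) ResetAddItems(items [][]byte) {
	f.mtx.Lock()
	defer f.mtx.Unlock()
	f.filter = cuckoo.NewFilter(f.capacity)
	for _, item := range items {
		f.filter.Insert(item)
	}
}
```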

@niaurys niaurys marked this pull request as draft September 27, 2024 13:24
@niaurys niaurys force-pushed the add_cuckoo_filter_on_metric_names branch from 5eb98d6 to 543ca2e on September 27, 2024 13:30
@yeya24 (Contributor) commented Sep 27, 2024

Hi, thanks for the PR. I am trying to understand the change better.
Since the cuckoo filter is added in the Receiver, and the Receiver shards series by the hash of all labels, a metric should exist in all Receivers, right? As long as the metric doesn't have too few series associated with it.

If the Receiver sharded series by the hash of the metric name, then I feel the filter could help, because a metric would only exist in some Receivers, not all. Can you explain the use case in more detail?

pkg/query/endpointset.go (outdated review thread, resolved)
@anarcher commented:
In my case, it seems that the fan-out from thanos-query is affecting the CPU load of both thanos-query and thanos-receiver. (Interestingly, when I gave the same query pattern (e.g., many alert queries) to Cortex, it didn't cause a significant CPU load on cortex/ingester.)
Even without shard information, if we query (with redundancy) only the receivers that have the relevant series, wouldn't it at least reduce the burden on thanos-query? (Personally, I feel that it might also reduce the CPU load on the receiver :-)

@MichaHoffmann (Contributor) commented Sep 28, 2024

> In my case, it seems that the fan-out from thanos-query is affecting the CPU load of both thanos-query and thanos-receiver. (Interestingly, when I gave the same query pattern (e.g., many alert queries) to Cortex, it didn't cause a significant CPU load on cortex/ingester.) Even without shard information, if we query (with redundancy) only the receivers that have the relevant series, wouldn't it at least reduce the burden on thanos-query? (Personally, I feel that it might also reduce the CPU load on the receiver :-)

Because we shard on the metric hash and not the name label, we would expect all receivers to have series of the metric if it has sufficiently high cardinality, right?

@anarcher commented:
> Because we shard on the metric hash and not the name label, we would expect all receivers to have series of the metric if it has sufficiently high cardinality, right?

You're right. However, according to issue #1611 there shouldn't be many empty responses, yet in my case there were quite a few. In my situation, it might be because I have many patterns that query multiple tenants (this happens especially with alert query patterns).

@yeya24 (Contributor) commented Sep 28, 2024

Do you enable query tenancy?

I'm not sure I understand how multi-tenant queries play a role here. Do you mean that some tenants have the metric while other tenants don't? Then it makes sense to have the metric-name-based filter to skip some tenants.

@anarcher commented:
> Do you enable query tenancy?

Unfortunately, no.

> I'm not sure I understand how multi-tenant queries play a role here. Do you mean that some tenants have the metric while other tenants don't? Then it makes sense to have the metric-name-based filter to skip some tenants.

The tenants are created in a way that makes it a bit difficult to filter with extLset :(

I'm sorry, this question is somewhat different from the PR discussion, but what do you think about supporting regular expressions in extLset?

Thanks :-)

@GiedriusS (Member) commented:
The conversation here got a bit derailed 😄. Indeed, this PR solves the issue where queries don't have a tenant label matcher on them, and MultiTSDB is then forced to spawn a goroutine for each TSDB and call Select() inside it. All of this is needed because we don't know whether Select() will return anything before the first Next() call.

The consensus on Slack was that we will try this idea in MultiTSDB first, and then we can apply it to other places or generalize/extend it to encompass more label names, for instance.
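A minimal sketch of that early-exit idea, with assumed helper and type names (storeWithFilter, metricNameFromMatchers, fanOut) rather than the PR's actual identifiers:

```go
package receive

import "github.com/prometheus/prometheus/model/labels"

// storeWithFilter pairs a TSDB with its metric-name filter (assumed shape).
type storeWithFilter struct {
	filter interface{ Matches([]byte) bool }
	query  func(matchers []*labels.Matcher) // stands in for spawning a goroutine and calling Select().
}

// metricNameFromMatchers extracts an equality matcher on __name__, if present.
func metricNameFromMatchers(matchers []*labels.Matcher) (string, bool) {
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			return m.Value, true
		}
	}
	return "", false
}

// fanOut skips TSDBs whose filter rules the queried metric out before any
// goroutine is spawned or Select() is called.
func fanOut(stores []storeWithFilter, matchers []*labels.Matcher) {
	name, ok := metricNameFromMatchers(matchers)
	for _, s := range stores {
		if ok && !s.filter.Matches([]byte(name)) {
			continue // this TSDB cannot contain the metric.
		}
		go s.query(matchers)
	}
}
```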

@niaurys niaurys force-pushed the add_cuckoo_filter_on_metric_names branch from 5608077 to 7398bc0 on September 30, 2024 09:15
pkg/filter/cuckoo.go (outdated review thread, resolved)
@@ -594,6 +597,16 @@ func storeMatches(ctx context.Context, s Client, mint, maxt int64, matchers ...*
	if !LabelSetsMatch(matchers, extLset...) {
		return false, fmt.Sprintf("external labels %v does not match request label matchers: %v", extLset, matchers)
	}

	for _, m := range matchers {

A project member commented:

Small suggestion: this will now add extra work for all users of store/proxy.go. I suggest moving this whole loop into s.MatchesMetricName() instead.

So, it would be:

func MatchesMetricName(matches []labels.Matchers) bool. The allow-all filter would just do no work at all, and the cuckoo version would iterate through the given matchers.
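A rough sketch of this suggestion, reusing the illustrative types from the sketch near the top of this thread and assuming the github.com/prometheus/prometheus/model/labels import; this is an assumed shape, not the final diff:

```go
// The allow-all filter does no work at all.
func (AllowAllStoreFilter) MatchesMetricName(matchers []*labels.Matcher) bool {
	return true
}

// The cuckoo-backed filter iterates the given matchers and can rule the store out
// when there is an equality matcher on __name__ whose value is not in the filter.
func (f *CuckooMetricNameStoreFilter) MatchesMetricName(matchers []*labels.Matcher) bool {
	f.mtx.RLock()
	defer f.mtx.RUnlock()
	for _, m := range matchers {
		if m.Name == labels.MetricName && m.Type == labels.MatchEqual {
			return f.filter.Lookup([]byte(m.Value))
		}
	}
	return true // no metric-name matcher, so we cannot exclude this store.
}
```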

@niaurys niaurys marked this pull request as ready for review September 30, 2024 11:59
@niaurys (Contributor, Author) commented Sep 30, 2024

> Hi, thanks for the PR. I am trying to understand the change better. Since the cuckoo filter is added in the Receiver, and the Receiver shards series by the hash of all labels, a metric should exist in all Receivers, right? As long as the metric doesn't have too few series associated with it.

> If the Receiver sharded series by the hash of the metric name, then I feel the filter could help, because a metric would only exist in some Receivers, not all. Can you explain the use case in more detail?

Regarding the use case: we have a ton of metrics in our organisation, and some time ago we wrote them all to a single TSDB (a multi-Prometheus deployment with hashmod target distribution). We have since migrated to Thanos Receive and decided to have more, smaller TSDBs instead, because with the previous single-TSDB deployment the Compactor would usually struggle with our very big metric blocks. We use the Kubernetes namespace as the tenant identifier, except for one namespace with one app, which we split even further because its TSDB was too big otherwise.

Now we run two Receiver hashrings: one with 6 Receiver services for all namespaces, and one with 32 Receiver services for the namespace that is split further. All Receivers from both hashrings are part of the query path (a global instance), so every query waits for an answer from every Receiver...

With the cuckoo filter I would expect to return early and save some resources when a particular metric is not present. In our case, a fairly large share of metrics are probably not located on the 32-service Receive hashring, since it contains metrics from one app only.

@yeya24 (Contributor) commented Sep 30, 2024

@niaurys @GiedriusS Thanks for sharing the use case. I think the filter makes sense for it.

> Indeed, this PR solves the issue where queries don't have a tenant label matcher on them, and MultiTSDB is then forced to spawn a goroutine for each TSDB

This is not ideal. If all queries carried an associated tenant label, there would be no such issue anymore. But understood: if we want a dynamic tenant-label enforcer, it is basically the same thing as this cuckoo filter.

> We have since migrated to Thanos Receive and decided to have more, smaller TSDBs instead, because with the previous single-TSDB deployment the Compactor would usually struggle with our very big metric blocks.

Compaction and vertical block sharding are more of a pain point to solve.

	storeFilter = filter.AllowAllStoreFilter{}
	if metricNameFilterEnabled {
		startFilterUpdate = true
		storeFilter = filter.NewCuckooMetricNameStoreFilter(1000000) // about 1MB on 64bit machines.
@fpetkovski (Contributor) commented Oct 1, 2024:

We can extract this number into a constant, e.g.

const megabyte = 1024 * 1024

There is also a units package which might already have this constant.

@niaurys (Author) replied:

Actually we only set the capacity here (which I have now moved to a constant); the comment about memory comes from the package itself: https://github.com/seiflotfy/cuckoofilter/blob/master/cuckoofilter.go#L18-L20. I have now removed the comment altogether :)
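For illustration, the applied suggestion could look roughly like this; the constant and function names here are assumptions, and the key point is that the argument is an element capacity (number of metric names), not a byte size:

```go
// cuckooFilterCapacity is the expected number of distinct metric names, not bytes.
const cuckooFilterCapacity = 1_000_000

func newStoreFilter(metricNameFilterEnabled bool) StoreFilter {
	if metricNameFilterEnabled {
		return NewCuckooMetricNameStoreFilter(cuckooFilterCapacity)
	}
	return AllowAllStoreFilter{}
}
```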

		buffers: sync.Pool{New: func() interface{} {
			b := make([]byte, 0, initialBufSize)
			return &b
		}},
	}

	if startFilterUpdate {
		t := time.NewTicker(15 * time.Second)

A reviewer (Contributor) commented:


Does this mean there could be up to 15 seconds of additional delay when a new metric is written to receivers?

@niaurys (Author) replied:

...yes. For now this seems like the simplest and most sensible approach.
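A sketch of what such an update loop might look like, building on the illustrative CuckooMetricNameStoreFilter type from earlier in this thread and assuming the standard context and time packages; the function and parameter names are assumptions, not the PR's exact code:

```go
// runFilterUpdateLoop rebuilds the filter every interval, so a brand-new metric
// can stay invisible to the filter for up to one tick (here: 15 seconds).
func runFilterUpdateLoop(ctx context.Context, interval time.Duration, listMetricNames func() [][]byte, f *CuckooMetricNameStoreFilter) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			// Rebuild from scratch so metric names that no longer exist are dropped.
			f.ResetAddItems(listMetricNames())
		}
	}
}
```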

pkg/store/tsdb.go (outdated review thread, resolved)
pkg/receive/multitsdb.go (outdated review thread, resolved)
pkg/store/tsdb.go (outdated review thread, resolved)
@niaurys niaurys force-pushed the add_cuckoo_filter_on_metric_names branch from d4a2053 to 92a4d12 on October 1, 2024 13:28
7 commits, each Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
@niaurys niaurys force-pushed the add_cuckoo_filter_on_metric_names branch from 92a4d12 to 0aeeee6 on October 1, 2024 13:43
Signed-off-by: Mindaugas Niaura <mindaugas.niaura@vinted.com>
@niaurys niaurys force-pushed the add_cuckoo_filter_on_metric_names branch from 0aeeee6 to 701d312 on October 1, 2024 13:45
@fpetkovski (Contributor) left a review:

🚀 I really like the idea of starting with just in-memory filters.

@fpetkovski fpetkovski merged commit f265c3b into thanos-io:main Oct 2, 2024
21 of 22 checks passed
@niaurys niaurys deleted the add_cuckoo_filter_on_metric_names branch October 2, 2024 06:21