Resort store response set on internal label dedup #6317

fpetkovski · 2023-04-25T07:36:49Z

When deduplicating on labels which are stored internally in TSDB, the store response set needs to be resorted after replica labels are removed.

In order to detect when deduplication by internal labels happens, this PR adds a cuckoo filter with all label names to all store implementations. When a replica label is present in this filter, the store will resort the Series response set before returning it to the querier.
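A minimal sketch of why the re-sort is needed, assuming simplified `Label` and `Series` types that are stand-ins for the real Thanos types (the helper names are hypothetical): a response set that is sorted while the internal replica label is present can fall out of order once that label is removed.

```go
package main

import (
	"fmt"
	"sort"
)

// Label and Series are simplified, hypothetical stand-ins for the Thanos types.
type Label struct{ Name, Value string }
type Series []Label // labels are kept sorted by name

// withoutLabels returns a copy of s with the given label names dropped.
func withoutLabels(s Series, drop map[string]struct{}) Series {
	out := make(Series, 0, len(s))
	for _, l := range s {
		if _, ok := drop[l.Name]; !ok {
			out = append(out, l)
		}
	}
	return out
}

// less orders two label sets lexicographically by (name, value) pairs.
func less(a, b Series) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i].Name != b[i].Name {
			return a[i].Name < b[i].Name
		}
		if a[i].Value != b[i].Value {
			return a[i].Value < b[i].Value
		}
	}
	return len(a) < len(b)
}

// resortWithout strips the replica labels from every series and re-sorts the set.
func resortWithout(set []Series, replicaLabels ...string) []Series {
	drop := make(map[string]struct{}, len(replicaLabels))
	for _, l := range replicaLabels {
		drop[l] = struct{}{}
	}
	out := make([]Series, len(set))
	for i, s := range set {
		out[i] = withoutLabels(s, drop)
	}
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	// This input is sorted while the "replica" label is present.
	set := []Series{
		{{"a", "1"}, {"replica", "1"}, {"z", "2"}},
		{{"a", "1"}, {"replica", "2"}, {"z", "1"}},
	}
	for _, s := range resortWithout(set, "replica") {
		fmt.Println(s)
	}
}
```

Without the final sort, the two series would be returned in their original order, which is no longer sorted once `replica` is gone.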

Fixes #6257.
Closes #6296.

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

stale · 2023-06-18T07:10:43Z

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

douglascamata

There's a lot of reasoning kind of hidden behind the "bloom filter" name.

We are adding it in many different places, including the gRPC definition of the Store API, and its purpose is still not clear. On top of this, it also couples the Store API definition to one specific implementation (bloom filter) of "a way to do this thing".

I think this happens because it is named after what it is and not what it does. In Go, we kind of don't need to name things after what they are -- their type will tell us this.

But what is this bloom filter used for? It seems to be doing a "quick check" for whether a series request includes internal replica labels. So potentially we can name this better? Maybe "internal label checker" would be a good proposal. One day we might switch from a bloom filter to something else, or add another mechanism (e.g. what if I wanted to store this information within a Redis or Memcache instance?), and things would be more extensible.

Now a more technical question: why do we need to transfer the bloom filter over gRPC? Storing it in Redis with the client-side cache features we get from the rueidis lib seems like a nice approach too.
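As a rough illustration of the naming suggestion, here is a sketch that names the abstraction after what it does. `InternalLabelChecker`, `HasInternalLabel`, `mapChecker`, and `needsResort` are all hypothetical names and not part of the PR; an exact map stands in for whatever probabilistic filter or remote cache might sit behind the interface.

```go
package main

import "fmt"

// InternalLabelChecker is a hypothetical interface named after its purpose:
// answering whether a label name is stored internally in a store's data.
type InternalLabelChecker interface {
	HasInternalLabel(name string) bool
}

// mapChecker is an exact, in-memory implementation used only for illustration.
type mapChecker map[string]struct{}

func (m mapChecker) HasInternalLabel(name string) bool {
	_, ok := m[name]
	return ok
}

// needsResort reports whether any requested replica label is stored internally,
// which is the condition under which the response set must be re-sorted.
func needsResort(c InternalLabelChecker, replicaLabels []string) bool {
	for _, l := range replicaLabels {
		if c.HasInternalLabel(l) {
			return true
		}
	}
	return false
}

func main() {
	checker := mapChecker{"replica": {}, "job": {}}
	fmt.Println(needsResort(checker, []string{"replica"}))
	fmt.Println(needsResort(checker, []string{"receive_replica"}))
}
```

Callers depend only on the interface, so the backing implementation could change without touching them.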

yeya24 · 2023-07-14T17:43:52Z

pkg/store/proxy_heap.go

-	var labelsToRemove map[string]struct{}
-	if !st.SupportsWithoutReplicaLabels() && len(req.WithoutReplicaLabels) > 0 {
+	labelsToRemove := make(map[string]struct{})
+	dedupByInternalLabel := hasInternalReplicaLabels(st, req)


Can we put this behind a feature flag? For Cortex, this code path seems like unnecessary overhead.

Does Cortex use the proxy_heap? But yes, we should add a FF in the bucket store.

Yeah proxy heap will be used by store gateway for lazy series? So it is always used.

Ah yes, that's correct.

Added a feature flag in fpetkovski#5

douglascamata · 2023-07-20T12:51:07Z

Was talking with @saswatamcode earlier today and I was wondering whether we could add some metrics to the global sorting and bloom filter to try to quantify how many times a global sort is being executed or skipped. Potentially could also measure how long the bloom filter update is taking.

What do you think?
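One way the suggested measurements could look, as a sketch only: plain atomic counters stand in for real metrics here (an actual implementation would presumably register Prometheus counters and a histogram), and the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// resortMetrics uses atomic counters as a stand-in for real metric types.
type resortMetrics struct {
	resortsExecuted atomic.Int64
	resortsSkipped  atomic.Int64
	lastUpdateNanos atomic.Int64
}

// observeSeriesCall records whether a Series call had to re-sort its response.
func (m *resortMetrics) observeSeriesCall(resorted bool) {
	if resorted {
		m.resortsExecuted.Add(1)
		return
	}
	m.resortsSkipped.Add(1)
}

// timeUpdate runs a filter update and records how long it took.
func (m *resortMetrics) timeUpdate(update func() error) error {
	start := time.Now()
	err := update()
	m.lastUpdateNanos.Store(int64(time.Since(start)))
	return err
}

func main() {
	m := &resortMetrics{}
	m.observeSeriesCall(true)
	m.observeSeriesCall(false)
	_ = m.timeUpdate(func() error { return nil })
	fmt.Println("executed:", m.resortsExecuted.Load(), "skipped:", m.resortsSkipped.Load())
}
```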

fpetkovski · 2023-07-20T14:12:59Z

Sounds like a good idea 👍

moadz

Some small nits for cleanliness, but otherwise thanks for doing this :) I'm excited about what we can do with this bloom filter now that it's there.

moadz · 2023-07-20T08:43:19Z

test/e2e/query_test.go

-			// This test is expected to fail until the bug outlined in https://github.com/thanos-io/thanos/issues/6257
-			// is fixed. This means that it will return double the expected series until then.
+			// This is a regression test for the bug outlined in https://github.com/thanos-io/thanos/issues/6257.
+			// Until the bug was fixed, this testcase would return double the expected series.
 			expectedDedupBug: true,


Should we be cleaning up this bool?

Great find. We should get rid of this bool.

moadz · 2023-07-20T08:44:36Z

test/e2e/receive_test.go

-		// This should've returned only 2 series, but is returning 4 until the problem reported in
-		// https://github.com/thanos-io/thanos/issues/6257 is fixed
+		// This is a regression test for the bug outlined in https://github.com/thanos-io/thanos/issues/6257.
+		// Until the bug was fixed, this testcase would return 4 series instead of 2.


// Until the bug was fixed, this testcase would return 4 series instead of 2.

nit: Unnecessary clarification.

moadz · 2023-07-20T08:46:45Z

cmd/thanos/query.go

 					return &infopb.StoreInfo{
 						MinTime:                      mint,
 						MaxTime:                      maxt,
 						SupportsSharding:             true,
 						SupportsWithoutReplicaLabels: true,
 						TsdbInfos:                    proxy.TSDBInfos(),
+						LabelNamesBloom:              infopb.NewBloomFilter(labelNamesBloom),


nit: LabelNamesBloom does not allude to what it's actually used for. The type can be inferred, so the name should instead allude to what it's used for. In this case 'indexedLabels', for example.

So in this case, the store info is a data transfer struct, so I think it's fine to keep the Bloom suffix. I think it's important to know what those bytes actually are. The same way we have MinTime and MaxTime

moadz · 2023-07-20T08:53:39Z

cmd/thanos/query.go

+	// Start bloom name filter updater.
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		level.Debug(logger).Log("msg", "setting up periodic label names bloom filter update")
+		g.Add(func() error {
+			return runutil.Repeat(10*time.Second, ctx.Done(), func() error {
+				level.Debug(logger).Log("msg", "Starting label names bloom filter update")
+
+				if err := proxy.UpdateLabelNamesBloom(ctx); err != nil {
+					return err
+				}
+
+				level.Debug(logger).Log("msg", "Finished label names bloom filter update")
+				return nil
+			})
+		}, func(err error) {
+			cancel()
+		})
+	}


nit: Might be cleaner to have goroutine errors/invocation handled in a different object (e.g. StoreLabelIndexer) that takes a store Client and invokes UpdateLabelNamesBloom at some refresh interval. We need to do this for every store, and doing it inline is a bit of a refactoring nightmare.

moadz · 2023-07-20T08:54:37Z

pkg/bloom/bloom.go

+	"github.com/bits-and-blooms/bloom"
+)
+
+const FilterErrorRate = 0.01


nit: might be good to add some clarification on what this error rate represents (the probability that the bloom filter reports a value as present when it was never added; a bloom filter never returns false for a value it does contain)
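For context, the error rate of a bloom filter is its target false-positive probability, and together with the expected number of items it determines the filter's size. The sketch below applies the standard sizing formulas, m = ceil(-n ln p / (ln 2)^2) bits and k = ceil((m / n) ln 2) hash functions. The function name is hypothetical, and the item count of 1000 is only an example.

```go
package main

import (
	"fmt"
	"math"
)

// FilterErrorRate is the target false-positive probability, as in the diff above.
const FilterErrorRate = 0.01

// estimateParameters returns the number of bits m and hash functions k needed
// to hold n items at false-positive probability p.
func estimateParameters(n uint, p float64) (m, k uint) {
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	hashes := math.Ceil(math.Ln2 * bits / float64(n))
	return uint(bits), uint(hashes)
}

func main() {
	m, k := estimateParameters(1000, FilterErrorRate)
	fmt.Printf("bits=%d hashes=%d\n", m, k)
}
```

At a 1% error rate this works out to just under 10 bits per label name, so a false positive costs only an unnecessary re-sort rather than an incorrect result.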

moadz · 2023-07-20T08:56:32Z

pkg/receive/multitsdb.go

-	labelSetFunc  func() []labelpb.ZLabelSet
-	timeRangeFunc func() (int64, int64)
-	tsdbOpts      *tsdb.Options
+	store *store.TSDBStore


Nice cleanup <3

moadz · 2023-07-20T08:59:48Z

pkg/store/bucket.go

+	bmtx            sync.Mutex
+	labelNamesBloom bloom.Filter


superNit: Doesn't seem like the right place for the mutex to be managed, perhaps this should be moved into LabelNamesBloom for cleaner concurrency safeness.
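A sketch of what moving the mutex into the label-names type could look like. The `LabelNames` type and its methods are hypothetical, and an exact map stands in for the filter.

```go
package main

import (
	"fmt"
	"sync"
)

// LabelNames guards its own state, so owners such as a bucket store do not
// need a separate mutex field for it.
type LabelNames struct {
	mtx   sync.RWMutex
	names map[string]struct{}
}

// Replace builds the new set outside the lock and swaps it in atomically.
func (l *LabelNames) Replace(names []string) {
	next := make(map[string]struct{}, len(names))
	for _, n := range names {
		next[n] = struct{}{}
	}
	l.mtx.Lock()
	l.names = next
	l.mtx.Unlock()
}

// Has reports whether name is in the current set.
func (l *LabelNames) Has(name string) bool {
	l.mtx.RLock()
	defer l.mtx.RUnlock()
	_, ok := l.names[name]
	return ok
}

func main() {
	var names LabelNames // the zero value is usable and contains nothing
	fmt.Println(names.Has("replica"))
	names.Replace([]string{"job", "replica"})
	fmt.Println(names.Has("replica"))
}
```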

moadz · 2023-07-20T09:56:29Z

pkg/store/bucket.go

+				mtx.Lock()
+				for _, n := range result {


superNit: Another argument for moving bmtx is that we're juggling more than one mutex in this func, would reduce mental load reading if we didn't have to reason about concurrency in multiple dimensions :) (supernit for a reason)

moadz · 2023-07-20T09:57:56Z

pkg/store/bucket.go

+	g, _ := errgroup.WithContext(ctx)
+
+	var mtx sync.Mutex
+	names := make(map[string]struct{})


Is there a reason why we're using a map here when the struct value is never populated and is always empty?
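For readers unfamiliar with the idiom, `map[string]struct{}` is the usual way to express a set in Go: the empty struct occupies no memory, and the map keys provide deduplication. The sketch below collects label names from several blocks concurrently. It uses `sync.WaitGroup` from the standard library instead of the `errgroup` seen in the diff, and the function name and input shape are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collectLabelNames merges the label names of all blocks into a sorted,
// deduplicated slice.
func collectLabelNames(blocks [][]string) []string {
	var (
		wg    sync.WaitGroup
		mtx   sync.Mutex
		names = make(map[string]struct{}) // set of names; values carry no data
	)
	for _, block := range blocks {
		wg.Add(1)
		go func(block []string) {
			defer wg.Done()
			mtx.Lock()
			defer mtx.Unlock()
			for _, n := range block {
				names[n] = struct{}{}
			}
		}(block)
	}
	wg.Wait()

	out := make([]string, 0, len(names))
	for n := range names {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(collectLabelNames([][]string{{"job", "replica"}, {"instance", "job"}}))
}
```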

When deduplicating on labels which are stored internally in TSDB, the store response set needs to be resorted after replica labels are removed. In order to detect when deduplication by internal labels happens, this PR adds a bloom filter with all label names to the Info response. When a replica label is present in this bloom filter for an individual store, the proxy heap would resort a response set from that store before merging in the result with the rest of the set.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2023-07-31T12:58:51Z

@GiedriusS @saswatamcode @douglascamata @moadz I have modified this PR to use a cuckoo filter and resort the series response in the store itself. Please take another look; the implementation is now simpler since we don't have to send additional data to the querier.

saswatamcode

Thanks! This looks good to me! Let's get this merged and release v0.32! 🙂

GiedriusS · 2023-07-31T14:01:30Z

pkg/stringset/set.go

+}
+
+func NewFromStrings(items ...string) Set {
+	f := cuckoo.NewFilter(uint(len(items)))


Maybe we could estimate the size of the underlying slices as per the comments here https://github.com/seiflotfy/cuckoofilter/blob/master/cuckoofilter.go#L21-L26 and add some way of adding an upper limit for this, something like maybe 5MB by default?

Hm, what action do we take if the limit is exceeded?

I think a good choice would be to print a warning message and then always force sorting? 🤔

I wonder if that is a good tradeoff though, because 5MB is a fairly low price to pay compared to the increase in memory required for buffering and resorting series. If 1000000 entries take ~1MB, I think it will be very hard to have so many label names that the memory of the filter becomes a problem. Maybe we should check behavior in production before we add limits?

👍 makes sense
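Although the limit was deferred, the fallback discussed here can be sketched briefly: when the limit is exceeded (or set to zero), use a set that claims to contain every name, which forces a re-sort on every request. All names below are hypothetical, and an exact map stands in for the cuckoo filter.

```go
package main

import "fmt"

// Set answers membership queries for label names.
type Set interface {
	Has(name string) bool
}

// allStrings reports every name as present, which always forces re-sorting.
type allStrings struct{}

func (allStrings) Has(string) bool { return true }

// exactSet is an exact stand-in for a size-bounded probabilistic filter.
type exactSet map[string]struct{}

func (s exactSet) Has(name string) bool {
	_, ok := s[name]
	return ok
}

// newLimitedSet falls back to allStrings when filtering is disabled
// (maxItems <= 0) or the number of names exceeds the limit.
func newLimitedSet(names []string, maxItems int) Set {
	if maxItems <= 0 || len(names) > maxItems {
		return allStrings{}
	}
	s := make(exactSet, len(names))
	for _, n := range names {
		s[n] = struct{}{}
	}
	return s
}

func main() {
	names := []string{"job", "instance", "replica"}
	fmt.Println(newLimitedSet(names, 10).Has("zone")) // within the limit: exact answer
	fmt.Println(newLimitedSet(names, 2).Has("zone"))  // over the limit: always true
}
```

The fallback trades memory for CPU: results stay correct, but every response from that store is buffered and re-sorted.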

bwplotka

Very elegant implementation, thanks for this!

I think it's worth a try. I'm a little bit concerned about the overhead for the whole system (generally it should be insignificant, but we have to try to know). I also wonder how the system handles eventual consistency (I assume we give inaccurate query results?). Perhaps there is a benefit in being able to turn this filtering system on/off on demand?

bwplotka · 2023-07-31T18:29:32Z

cmd/thanos/store.go

+	// Start bloom name filter updater.
+	{
+		ctx, cancel := context.WithCancel(context.Background())
+		level.Debug(logger).Log("msg", "setting up periodic update for label names")


Suggested change

-		level.Debug(logger).Log("msg", "setting up periodic update for label names")
+		level.Info(logger).Log("msg", "setting up periodic update for label names")

I guess info would make sense here

GiedriusS · 2023-07-31T20:07:05Z

As for consistency, it's the same as with a sharded Thanos Store - blocks are not loaded at the same time on all nodes.

#6317 (comment) perhaps the limit flag could serve as a way to disable this i.e. setting to 0 would disable this functionality on a node and show that all label names are available.

Not in all setups reading data from remote object storage costs. And also I would be against removing the optimizations since they cut down query duration by 30%-40%.

douglascamata · 2023-07-31T20:26:40Z

I think in case of lack of consistency, or even a false positive, what happens is a global resort. Am I right?

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

bwplotka · 2023-08-01T16:47:43Z

Not in all setups reading data from remote object storage costs. And also I would be against removing the optimizations since they cut down query duration by 30%-40%.

@GiedriusS happy to hear I didn't break Thanos for nothing =DDDDD

Coincidentally, with this implementation we go into Monarch design even more (public info: https://www.vldb.org/pvldb/vol13/p3181-adams.pdf, and yes, I'm biased 🙃). Essentially Monarch has Field Hints (so kind of like our labels) that it updates and consults on every query 🙈

So... let's double check the consistency issue, otherwise LGTM (:

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

GiedriusS

👍 let's see how much RAM the filters will need

GiedriusS · 2023-08-09T09:18:45Z

pkg/store/bucket.go

@@ -1240,7 +1253,9 @@ func debugFoundBlockSetOverview(logger log.Logger, mint, maxt, maxResolutionMill
 }

 // Series implements the storepb.StoreServer interface.
-func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_SeriesServer) (err error) {
+func (s *BucketStore) Series(req *storepb.SeriesRequest, seriesSrv storepb.Store_SeriesServer) (err error) {
+	srv := newFlushableServer(seriesSrv, s.LabelNamesSet(), req.WithoutReplicaLabels)


This will be yet another way of performing a race inside of the storegateway but let's work on fixing this now before the release 👍 😄

I actually might be wrong because the Close() functions will still run at the same time 🤔 let's merge this and test it
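To make the wrapper in the diff above easier to follow, here is a heavily simplified sketch of the idea: pass series straight through when no requested replica label is stored internally, otherwise buffer everything and re-sort on flush. Series are plain strings here, and every type and function name is a hypothetical stand-in for the real `newFlushableServer` and its collaborators.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesSender is a simplified stand-in for the gRPC series stream.
type seriesSender interface {
	Send(series string) error
}

// flushableSender adds a Flush that must be called at the end of a Series call.
type flushableSender interface {
	seriesSender
	Flush() error
}

// passthrough forwards series immediately; Flush is a no-op.
type passthrough struct{ seriesSender }

func (passthrough) Flush() error { return nil }

// resorting buffers all series and sends them in sorted order on Flush.
type resorting struct {
	upstream seriesSender
	buf      []string
}

func (r *resorting) Send(series string) error {
	r.buf = append(r.buf, series)
	return nil
}

func (r *resorting) Flush() error {
	sort.Strings(r.buf)
	for _, s := range r.buf {
		if err := r.upstream.Send(s); err != nil {
			return err
		}
	}
	r.buf = nil
	return nil
}

// newFlushable picks the buffering sender only when a requested replica label
// may be stored internally, as reported by hasLabel.
func newFlushable(upstream seriesSender, hasLabel func(string) bool, replicaLabels []string) flushableSender {
	for _, l := range replicaLabels {
		if hasLabel(l) {
			return &resorting{upstream: upstream}
		}
	}
	return passthrough{upstream}
}

// recorder collects sent series for demonstration.
type recorder struct{ sent []string }

func (r *recorder) Send(s string) error { r.sent = append(r.sent, s); return nil }

func main() {
	rec := &recorder{}
	srv := newFlushable(rec, func(n string) bool { return n == "replica" }, []string{"replica"})
	_ = srv.Send(`{a="1",z="2"}`)
	_ = srv.Send(`{a="1",z="1"}`)
	_ = srv.Flush()
	fmt.Println(rec.sent)
}
```

The buffering path holds the whole response in memory until the flush, which is the overhead discussed earlier in the thread.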

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski · 2023-08-10T06:01:25Z

Ok, time to try it out :)

douglascamata · 2023-08-10T09:31:18Z

Awesome! So happy to see this merged. Thanks a lot, folks! 🙇

douglascamata · 2023-08-10T15:22:10Z

Btw, small nit: next time we need to remember to squash and merge. 😅

pull-request-size bot added the size/L label Apr 25, 2023

fpetkovski mentioned this pull request Apr 25, 2023

Deduplication returning deduped and non-deduped results in 0.31.0+ #6257

Closed

stale bot added the stale label Jun 18, 2023

saswatamcode mentioned this pull request Jul 13, 2023

Resort store response set on internal label dedup #6529

Closed

2 tasks

fpetkovski force-pushed the resort-dataset-on-internal-dedup branch from e0a0529 to 4f3c8ad Compare July 13, 2023 07:46

stale bot removed the stale label Jul 13, 2023

pull-request-size bot added size/XL and removed size/L labels Jul 14, 2023

saswatamcode reviewed Jul 14, 2023

douglascamata reviewed Jul 14, 2023

yeya24 reviewed Jul 14, 2023

saswatamcode mentioned this pull request Jul 20, 2023

Feature flag & add basic unit tests for Proxy/BlockStore UpdateLabelNamesBloom() fpetkovski/thanos#5

Merged

2 tasks

moadz approved these changes Jul 20, 2023

fpetkovski force-pushed the resort-dataset-on-internal-dedup branch from 032e64b to a29cc58 Compare July 21, 2023 05:52

GiedriusS reviewed Jul 27, 2023

fpetkovski force-pushed the resort-dataset-on-internal-dedup branch from 61a00d7 to 341e874 Compare July 27, 2023 13:21

sonatype-lift bot reviewed Jul 27, 2023

pull-request-size bot added size/L and removed size/XL labels Jul 27, 2023

fpetkovski added 4 commits July 28, 2023 14:45

Resort data in TSDB

00ce595

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Fix bucket_test.go

9d4628b

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Fix TSDB store

0ea795e

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski force-pushed the resort-dataset-on-internal-dedup branch from 4fb3558 to 0ea795e Compare July 29, 2023 07:12

pull-request-size bot added size/XL and removed size/L labels Jul 29, 2023

Remove print statements

dd9b052

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski added 5 commits July 29, 2023 09:28

Flush at end of Series call

6d37f8a

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Fix bucket test

c779697

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Cleanup code

f230301

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Clean up e2e/query_test.go

26cbd33

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Fix bucket test

2f659ad

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski marked this pull request as ready for review July 31, 2023 12:57

saswatamcode previously approved these changes Jul 31, 2023

GiedriusS reviewed Jul 31, 2023

bwplotka reviewed Jul 31, 2023

Use index header to read labels

db076c8

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski dismissed saswatamcode’s stale review via db076c8 August 1, 2023 06:53

Remove redundant comments

e185d04

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski mentioned this pull request Aug 9, 2023

[Store gateway] Highly increased latency on "get_range" bucket operations on 0.30.2 #6540

Closed

Merge branch 'main' into resort-dataset-on-internal-dedup

87bcbce

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

GiedriusS previously approved these changes Aug 9, 2023

Fix lint

8c511ac

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

fpetkovski dismissed GiedriusS’s stale review via 8c511ac August 9, 2023 09:28

saswatamcode approved these changes Aug 10, 2023

fpetkovski merged commit 84567ec into thanos-io:main Aug 10, 2023

jtb-sre mentioned this pull request Apr 6, 2024

0.32.0 caused spike in network traffic #7213

Open
