
Added per-tenant in-process sharding support to compactor #2599

Merged
pracucci merged 9 commits into cortexproject:master from pracucci:parallelise-compactor on May 18, 2020

Conversation

@pracucci (Contributor) commented May 15, 2020

NOTE: Rolled back by #2628

What this PR does:
We're hitting vertical scalability limits on the compactor. We have a large user with 30M+ active series, and compacting their 2h blocks takes more than 2h. The TSDB compactor uses a single CPU core (there's no parallelisation), so we can't really scale up vertically unless we shard blocks.

In this PR I'm introducing per-tenant in-process sharding support to the compactor, leveraging the fact that the Thanos compactor can parallelise the compaction of different block groups.

The way it works is quite simple:

  1. Add the ingester ID as an external label to blocks uploaded by the ingester
  2. Add a new metadata fetcher filter in the compactor which replaces the ingester ID label with a shard ID computed from a hash of the ingester ID (see the sketch below)
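
For illustration only, here is a minimal sketch of step 2 under stated assumptions (in Go, like Cortex itself): the filter hashes the ingester ID and replaces the label with a deterministic shard ID, which is what guarantees that blocks from the same ingester always land in the same compaction group. The `__shard_id__` label name, the FNV hash, the shard format, and the function names are assumptions of this sketch, not the PR's actual code; only `__ingester_id__` appears later in this discussion.

```go
// A minimal sketch of the shard-assignment idea, not the actual Cortex filter.
// Hashing is deterministic, so all blocks from one ingester map to one shard.
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	ingesterIDLabel = "__ingester_id__" // external label added by the ingester
	shardIDLabel    = "__shard_id__"    // hypothetical name for the computed shard label
)

// shardForIngester maps an ingester ID to one of numShards shard IDs.
func shardForIngester(ingesterID string, numShards uint32) string {
	h := fnv.New32a()
	h.Write([]byte(ingesterID))
	return fmt.Sprintf("%d_of_%d", h.Sum32()%numShards, numShards)
}

// replaceIngesterWithShard rewrites a block's external labels in place,
// swapping the ingester ID label for the computed shard ID label.
func replaceIngesterWithShard(extLabels map[string]string, numShards uint32) {
	if ingesterID, ok := extLabels[ingesterIDLabel]; ok {
		delete(extLabels, ingesterIDLabel)
		extLabels[shardIDLabel] = shardForIngester(ingesterID, numShards)
	}
}

func main() {
	labels := map[string]string{ingesterIDLabel: "ingester-1"}
	replaceIngesterWithShard(labels, 4)
	fmt.Println(labels) // e.g. map[__shard_id__:2_of_4]
}
```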

Guaranteed properties:

  • Blocks from the same ingester are always compacted together, even across multiple compaction levels (2nd, 3rd, ... level)

Out of the scope of this PR:

  • Per-tenant config overrides

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci added 3 commits May 15, 2020 15:31
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested a review from pstibrany May 15, 2020 13:46
pracucci added 2 commits May 15, 2020 15:50
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pstibrany (Contributor) left a comment

Amazing PR!

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci (Contributor, Author)

Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci (Contributor, Author)

> Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

The solution I've adopted (commit) is pretty simple and follows what Thanos does: remove any external label used to identify replicas/shards directly when iterating the bucket.

@pracucci (Contributor, Author)

> Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

A quick explanation of the problem and solution.

The Thanos BucketStore keeps blocks grouped by external labels. When such blocks are queried, the external labels are added to the series returned by Series().

Before this PR we had only the user ID as an external label, which is constant across all blocks of a user, so it doesn't matter at which point in time you remove that external label; we just have to remove it.

However, the new external labels (ingester ID and shard ID) are variable: for the same user we have blocks with different external labels. Think about two series, metric_a{} and metric_b{}, which are replicated 3 ways and go to 3 ingesters. When we query them from the BucketStore (before compaction), we'll fetch them from 3 blocks (1 per ingester), so we'll get:

metric_a{__ingester_id__="1"}
metric_b{__ingester_id__="1"}
metric_a{__ingester_id__="2"}
metric_b{__ingester_id__="2"}
metric_a{__ingester_id__="3"}
metric_b{__ingester_id__="3"}

In the querier, we assume that a Series() response contains series sorted by labels (correct) and thus that all chunks for one series are consecutive in the response (required for the deduplication to work correctly). However, given that we now have variable external labels, this is no longer true, so the deduplication doesn't work as expected.

The solution (which is also what Thanos does) is to remove these unwanted external labels at query time, directly in the metadata fetcher.
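
For illustration, a minimal sketch of that idea under stated assumptions: a metadata-fetcher-style filter simply deletes the replica/shard external labels from each block's metadata before the BucketStore groups blocks, so the returned series never carry them and the consecutive-chunks assumption holds again. The map-based metadata representation, the `__shard_id__` label name, and the function name are assumptions of this sketch, not the actual Cortex/Thanos code.

```go
// A rough sketch (not the actual Cortex/Thanos filter) of stripping the
// sharding-related external labels from block metadata at query time.
package main

import "fmt"

// labelsToStrip lists the external labels used only for replica/shard
// identification; __shard_id__ is a hypothetical name in this sketch.
var labelsToStrip = []string{"__ingester_id__", "__shard_id__"}

// stripShardingLabels removes the sharding-related external labels from
// every block's metadata (here modeled as a plain map keyed by block ID).
func stripShardingLabels(blocks map[string]map[string]string) {
	for _, extLabels := range blocks {
		for _, name := range labelsToStrip {
			delete(extLabels, name)
		}
	}
}

func main() {
	blocks := map[string]map[string]string{
		"01BLOCK": {"__ingester_id__": "ingester-1", "user": "tenant-1"},
	}
	stripShardingLabels(blocks)
	fmt.Println(blocks) // only the constant per-user labels remain
}
```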

pracucci added 2 commits May 18, 2020 09:27
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci merged commit e43cb34 into cortexproject:master May 18, 2020
@pracucci pracucci deleted the parallelise-compactor branch May 18, 2020 09:00
pracucci added a commit to grafana/cortex that referenced this pull request May 18, 2020
Added per-tenant in-process sharding support to compactor (cortexproject#2599)

* Added per-tenant in-process sharding support to compactor

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added concurrency config option

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed filter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved distribution test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed external labels removal at query time

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed removal of external labels when querying back blocks from storage

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added unit test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>
alanprot added a commit to alanprot/cortex that referenced this pull request Dec 9, 2022
pull bot pushed a commit to boost-entropy-k8s/cortex that referenced this pull request Dec 9, 2022
alanprot added a commit that referenced this pull request Dec 9, 2022
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request May 2, 2023
Signed-off-by: Alex Le <leqiyue@amazon.com>
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request May 2, 2023
Signed-off-by: Alex Le <leqiyue@amazon.com>