
Added per-tenant in-process sharding support to compactor #2599

Merged
pracucci merged 9 commits into cortexproject:master from pracucci:parallelise-compactor on May 18, 2020

Conversation

@pracucci (Contributor) commented May 15, 2020

NOTE: Rolled back by #2628

What this PR does:
We're hitting vertical scalability limits on the compactor. We have a large user with 30M+ active series, and compacting their 2h blocks takes more than 2h. The TSDB compactor uses a single CPU core (there's no parallelisation), so we can't really scale up vertically unless we shard blocks.

In this PR I'm introducing per-tenant in-process sharding support to the compactor, leveraging the fact that the Thanos compactor can parallelise the compaction of different block groups.

The way it works is quite simple:

  1. Add the ingester ID as an external label to blocks uploaded by the ingester
  2. Add a new metadata fetcher filter in the compactor which replaces the ingester ID label with a shard ID computed from a hash of the ingester ID (see the sketch below)
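
For illustration only, here is a minimal sketch of step 2 under stated assumptions (in Go, like Cortex itself): the filter hashes the ingester ID and replaces the label with a deterministic shard ID, which is what guarantees that blocks from the same ingester always land in the same compaction group. The `__shard_id__` label name, the FNV hash, the shard format, and the function names are assumptions of this sketch, not the PR's actual code; only `__ingester_id__` appears later in this discussion.

```go
// A minimal sketch of the shard-assignment idea, not the actual Cortex filter.
// Hashing is deterministic, so all blocks from one ingester map to one shard.
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	ingesterIDLabel = "__ingester_id__" // external label added by the ingester
	shardIDLabel    = "__shard_id__"    // hypothetical name for the computed shard label
)

// shardForIngester maps an ingester ID to one of numShards shard IDs.
func shardForIngester(ingesterID string, numShards uint32) string {
	h := fnv.New32a()
	h.Write([]byte(ingesterID))
	return fmt.Sprintf("%d_of_%d", h.Sum32()%numShards, numShards)
}

// replaceIngesterWithShard rewrites a block's external labels in place,
// swapping the ingester ID label for the computed shard ID label.
func replaceIngesterWithShard(extLabels map[string]string, numShards uint32) {
	if ingesterID, ok := extLabels[ingesterIDLabel]; ok {
		delete(extLabels, ingesterIDLabel)
		extLabels[shardIDLabel] = shardForIngester(ingesterID, numShards)
	}
}

func main() {
	labels := map[string]string{ingesterIDLabel: "ingester-1"}
	replaceIngesterWithShard(labels, 4)
	fmt.Println(labels) // e.g. map[__shard_id__:2_of_4]
}
```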

Guaranteed properties:

  • Blocks from the same ingester are always compacted together, even across multiple compaction levels (2nd, 3rd, ... level)

Out of the scope of this PR:

  • Per-tenant config overrides

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci added 3 commits May 15, 2020 15:31
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested a review from pstibrany May 15, 2020 13:46
pracucci added 2 commits May 15, 2020 15:50
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pstibrany (Contributor) left a comment

Amazing PR!

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci (Contributor, Author)

Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci (Contributor, Author)

> Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

The solution I've adopted (commit) is pretty simple and follows what Thanos does: remove any external label used to identify replicas/shards directly when iterating the bucket.

@pracucci (Contributor, Author)

> Unfortunately this causes problems with the deduplication on the read path, because the new external labels end up being treated as additional series labels. I'm working on a solution.

A quick explanation of the problem and solution.

The Thanos BucketStore keeps blocks grouped by external labels. When such blocks are queried, the external labels are added to the series returned by Series().

Before this PR we had only the user ID as an external label, which is constant across all blocks of a user, so it doesn't matter at which point in time you remove that external label; we just have to remove it.

However, the new external labels (ingester ID and shard ID) are variable: for the same user we have blocks with different external labels. Think about two series, metric_a{} and metric_b{}, which are replicated 3 ways and go to 3 ingesters. When we query them from the BucketStore (before compaction), we'll fetch them from 3 blocks (1 per ingester), so we'll get:

metric_a{__ingester_id__="1"}
metric_b{__ingester_id__="1"}
metric_a{__ingester_id__="2"}
metric_b{__ingester_id__="2"}
metric_a{__ingester_id__="3"}
metric_b{__ingester_id__="3"}

In the querier, we assume that a Series() response contains series sorted by labels (correct) and thus that all chunks for one series are consecutive in the response (required for the deduplication to work correctly). However, given that we now have variable external labels, this is no longer true, so the deduplication doesn't work as expected.

The solution (which is also what Thanos does) is to remove these unwanted external labels at query time, directly in the metadata fetcher.
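
For illustration, a minimal sketch of that idea under stated assumptions: a metadata-fetcher-style filter simply deletes the replica/shard external labels from each block's metadata before the BucketStore groups blocks, so the returned series never carry them and the consecutive-chunks assumption holds again. The map-based metadata representation, the `__shard_id__` label name, and the function name are assumptions of this sketch, not the actual Cortex/Thanos code.

```go
// A rough sketch (not the actual Cortex/Thanos filter) of stripping the
// sharding-related external labels from block metadata at query time.
package main

import "fmt"

// labelsToStrip lists the external labels used only for replica/shard
// identification; __shard_id__ is a hypothetical name in this sketch.
var labelsToStrip = []string{"__ingester_id__", "__shard_id__"}

// stripShardingLabels removes the sharding-related external labels from
// every block's metadata (here modeled as a plain map keyed by block ID).
func stripShardingLabels(blocks map[string]map[string]string) {
	for _, extLabels := range blocks {
		for _, name := range labelsToStrip {
			delete(extLabels, name)
		}
	}
}

func main() {
	blocks := map[string]map[string]string{
		"01BLOCK": {"__ingester_id__": "ingester-1", "user": "tenant-1"},
	}
	stripShardingLabels(blocks)
	fmt.Println(blocks) // only the constant per-user labels remain
}
```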

pracucci added 2 commits May 18, 2020 09:27
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci merged commit e43cb34 into cortexproject:master May 18, 2020
@pracucci pracucci deleted the parallelise-compactor branch May 18, 2020 09:00
pracucci added a commit to grafana/cortex that referenced this pull request May 18, 2020
Added per-tenant in-process sharding support to compactor (cortexproject#2599)

* Added per-tenant in-process sharding support to compactor

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added concurrency config option

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed filter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved distribution test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed external labels removal at query time

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed removal of external labels when querying back blocks from storage

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added unit test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>
alanprot added a commit to alanprot/cortex that referenced this pull request Dec 9, 2022
pull bot pushed a commit to boost-entropy-k8s/cortex that referenced this pull request Dec 9, 2022
alanprot added a commit that referenced this pull request Dec 9, 2022
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request May 2, 2023
Signed-off-by: Alex Le <leqiyue@amazon.com>
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request May 2, 2023
Signed-off-by: Alex Le <leqiyue@amazon.com>