Add shuffle sharding grouper/planner #4357

ac1214 · 2021-07-10T02:29:26Z

Signed-off-by: Albert <ac1214@users.noreply.github.com>

What this PR does:

Implements generation of parallelizable plans for the proposal outlined in #4272 using a shuffle sharding grouper and planner. Currently the parallelizable plans are generated, but every compactor runs every planned compaction; the actual sharding will happen in a subsequent PR.

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

ac1214 · 2021-07-10T02:37:28Z

I wanted to discuss a change in the block compaction behavior that this PR would introduce.

The current implementation of the Thanos compactor will always compact the first set of overlapping blocks if there exists such a set. This means that if the most recently ingested set of blocks from multiple ingesters are overlapping, the blocks will be compacted. If this happens, there is potentially a “missing” block in the compaction with the Thanos planner since there may be an ingester that has not fully uploaded the block when the compaction begins. So if there are 3 overlapping blocks when the compaction begins, and they are the latest blocks passed to the Thanos planner, the planner will plan a compaction of those 3 blocks even if there is a potential fourth ingester that has yet to upload a block.

With this PR, overlapping blocks will not be compacted if they are the latest set of blocks. In the example above, the 3 blocks won’t be compacted if they are the latest ones and they don’t cover a full range.

In a real-world situation, this would only affect customers who stop ingesting blocks. The last group of n blocks, where n is the number of ingesters, will remain uncompacted for as long as they are the latest blocks. Leaving the last n blocks uncompacted increases storage size as well as query time (if they continue to query even after they stop ingesting blocks). One thing to note about the Thanos approach: there can be duplicate work if the blocks are compacted and another overlapping block is uploaded after the compaction begins.

I considered a couple of different approaches. One is to group overlapping blocks before grouping by compactable ranges, which makes the compaction behavior with these changes the same as with Thanos. Another is to allow the overlapping blocks to be compacted, even if they are the latest blocks, when no new blocks have arrived once the time defined by the smallest block range has passed from the max time of all the blocks.
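The second alternative, a grace period after which the latest overlapping blocks become eligible, could be sketched roughly as follows. This is only an illustration under assumptions: `latestGroupEligible` and its parameters are hypothetical names that do not appear in this PR, and the time values are abstract units rather than real timestamps.

```go
package main

import "fmt"

// latestGroupEligible is a hypothetical helper, not part of this PR.
// It reports whether the latest overlapping blocks may be compacted because
// at least smallestRange time units have passed since maxTimeOfAllBlocks
// without a newer block arriving.
func latestGroupEligible(now, maxTimeOfAllBlocks, smallestRange int64) bool {
	return now-maxTimeOfAllBlocks >= smallestRange
}

func main() {
	// Assumed values: the newest block ends at 40, the smallest block range is 20.
	fmt.Println(latestGroupEligible(50, 40, 20)) // false: only 10 units have passed
	fmt.Println(latestGroupEligible(60, 40, 20)) // true: a full smallest range has passed
}
```

A check like this would bound how long the last group stays uncompacted, at the cost of occasionally compacting before a slow ingester finishes uploading.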

Something else that I considered is making this a toggle to allow the user to define their own preference, but I think that this isn't ideal as it would either lead to having to support the toggle indefinitely or eventually having to have users switch to a single behavior.

Small example illustrating what’s mentioned above

4 total blocks with 1 block incoming (not yet uploaded)

block 1: {
  MinTime: 0, MaxTime: 20
}
block 2: {
  MinTime: 21, MaxTime: 40
}
block 3: {
  MinTime: 21, MaxTime: 40
}
block 4: {
  MinTime: 21, MaxTime: 40
}
block 5 (incoming): {
  MinTime: 21, MaxTime: 40
}

Thanos compaction

The above blocks with the current (Thanos) compaction with time ranges [20, 120, 240] would result in blocks:

block 1: {
  MinTime: 0, MaxTime: 20
}
block 2/3/4: {
  MinTime: 21, MaxTime: 40
}

Afterwards, once block 5 is fully uploaded, the final resulting blocks from a single run of the compaction will be:

block 1: {
  MinTime: 0, MaxTime: 20
}
block 2/3/4: {
  MinTime: 21, MaxTime: 40
}
block 5: {
  MinTime: 21, MaxTime: 40
}

With these blocks, another compaction will need to be done to fully compact the overlapping blocks 2-5.
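The behavior described in this section can be approximated with a small sketch. It is a simplification written for this example, not the Thanos planner itself: the `Block` type and `firstOverlapping` are hypothetical, and two blocks are assumed to overlap when one starts before the other ends.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a hypothetical stand-in for block metadata; only the time range matters here.
type Block struct {
	Name             string
	MinTime, MaxTime int64
}

// firstOverlapping returns the first run of two or more blocks whose time
// ranges overlap, or nil if no blocks overlap. It plans with whatever blocks
// are visible, so a block that is still uploading is simply left out.
func firstOverlapping(blocks []Block) []Block {
	sorted := append([]Block(nil), blocks...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].MinTime < sorted[j].MinTime })

	var group []Block
	var groupMax int64
	for _, b := range sorted {
		if len(group) > 0 && b.MinTime < groupMax {
			// b starts before the current group ends, so it overlaps.
			group = append(group, b)
			if b.MaxTime > groupMax {
				groupMax = b.MaxTime
			}
			continue
		}
		if len(group) > 1 {
			return group
		}
		group = []Block{b}
		groupMax = b.MaxTime
	}
	if len(group) > 1 {
		return group
	}
	return nil
}

func main() {
	// Blocks 1-4 are uploaded; block 5 is still incoming and therefore not visible.
	blocks := []Block{
		{"block 1", 0, 20}, {"block 2", 21, 40}, {"block 3", 21, 40}, {"block 4", 21, 40},
	}
	for _, b := range firstOverlapping(blocks) {
		fmt.Println(b.Name)
	}
}
```

Run on the four uploaded blocks, the sketch selects blocks 2, 3 and 4, which is why block 5 needs a second compaction once it arrives.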

New compaction behavior

With this PR and the shuffle-sharding strategy, the blocks would remain uncompacted and would wait until a more recent block than 2-5 is uploaded. Once that block is uploaded, blocks 2-5 would be compacted in 1 compaction.
If there is a block 6 uploaded with MinTime: 41, MaxTime: 60 after block 5 is fully uploaded, then the resulting blocks after a single compaction would be:

block 1: {
  MinTime: 0, MaxTime: 20
}
block 2/3/4/5: {
  MinTime: 21, MaxTime: 40
}
block 6: {
  MinTime: 41, MaxTime: 60
}

The downside of this approach is that the uncompacted blocks 2-5 are stored for longer than with the current (Thanos) approach, because the compactor waits for a more recent block to be uploaded before compacting them. In the example above, with this PR, if block 6 didn't exist then blocks 2-5 would never be compacted, as they would remain the most recent blocks.
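The deferral described in this section can be illustrated with a simplified sketch. It is not the grouper or planner code from this PR: `Block`, `overlappingGroups` and `planDeferringLatest` are hypothetical names, and the rule is reduced to "skip the most recent group of overlapping blocks", leaving out the check for whether a group covers a full range.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a hypothetical stand-in for block metadata.
type Block struct {
	Name             string
	MinTime, MaxTime int64
}

// overlappingGroups sorts blocks by MinTime and splits them into runs of
// blocks whose time ranges overlap. Groups come out in time order.
func overlappingGroups(blocks []Block) [][]Block {
	sorted := append([]Block(nil), blocks...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].MinTime < sorted[j].MinTime })

	var groups [][]Block
	var groupMax int64
	for _, b := range sorted {
		if len(groups) > 0 && b.MinTime < groupMax {
			last := len(groups) - 1
			groups[last] = append(groups[last], b)
			if b.MaxTime > groupMax {
				groupMax = b.MaxTime
			}
			continue
		}
		groups = append(groups, []Block{b})
		groupMax = b.MaxTime
	}
	return groups
}

// planDeferringLatest returns the first group of two or more overlapping
// blocks that is not the most recent group, or nil if there is none.
func planDeferringLatest(blocks []Block) []Block {
	groups := overlappingGroups(blocks)
	for i, g := range groups {
		if len(g) < 2 || i == len(groups)-1 {
			continue // nothing to merge, or still the latest group: wait
		}
		return g
	}
	return nil
}

func main() {
	blocks := []Block{
		{"block 1", 0, 20}, {"block 2", 21, 40}, {"block 3", 21, 40},
		{"block 4", 21, 40}, {"block 5", 21, 40},
	}
	fmt.Println(len(planDeferringLatest(blocks))) // 0: blocks 2-5 are the latest, so nothing is planned

	blocks = append(blocks, Block{"block 6", 41, 60})
	fmt.Println(len(planDeferringLatest(blocks))) // 4: blocks 2-5 are planned together
}
```

The trade-off is visible in the first call: without a newer block, the sketch never plans blocks 2-5, which matches the downside described above.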

I was wondering what your thoughts were about which approach would be preferable?

ac1214 · 2021-07-14T02:15:44Z

This PR replaces and implements the changes recommended in #4318

ac1214 · 2021-07-15T14:40:10Z

Discussed in the community call and leaving the blocks uncompacted is okay.

jeromeinsf · 2021-09-03T19:55:52Z

@pracucci and/or @bboreham how can we help with the review of this?

bboreham

The code is long and I didn't read through every line. Broadly it looks ok.
I did wonder why the word "thanos" shows up so often - if the code is copied from Thanos it should say so, and if not can you just explain your thinking to me?

CHANGELOG.md

bboreham · 2021-09-15T14:28:29Z

pkg/compactor/shuffle_sharding_grouper.go

+		garbageCollectedBlocks:   garbageCollectedBlocks,
+		hashFunc:                 hashFunc,
+		compactions: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
+			Name: "thanos_compact_group_compactions_total",


Do we want to add new metrics in Cortex starting "thanos_"?

ac1214 · 2021-09-23

Added a note on where the metrics were copied from in Thanos. With these changes, wouldn't the metrics in Cortex remain the same? They are only used when creating a new group using compact.NewGroup, which is what is being done now (https://github.com/cortexproject/cortex/blob/master/vendor/github.com/thanos-io/thanos/pkg/compact/compact.go#L262-L312)

yeya24 · 2021-11-06

Then it might be better to expose those metrics as another function in Thanos?

thejosephstevens · 2021-11-04T02:59:20Z

Any way I can help with this PR? We're running into limits in our compaction (we have about 25M active time series in a single-tenant cortex). I'd be happy to run pre-release compactor builds if this needs some kind of validation.

alvinlin123 · 2022-01-14T01:03:27Z

The error message from the build is:

go: updates to go.mod needed, disabled by -mod=vendor
	(Go version in go.mod is at least 1.14 and vendor directory exists.)
	to update it:
	go mod tidy
make: *** [Makefile:164: cmd/blocksconvert/blocksconvert] Error 1

Since this looks like a useful PR that we want to merge, but I don't own the original branch, I will create a new branch to work on resolving the error.

…ortexproject#4262) * add MaxRetries to WaitInstanceState Signed-off-by: Albert <ac1214@users.noreply.github.com> * update CHANGELOG.md Signed-off-by: Albert <ac1214@users.noreply.github.com> * Add timeout for waiting on compactor to become ACTIVE in the ring. Signed-off-by: Albert <ac1214@users.noreply.github.com> * add MaxRetries variable back to WaitInstanceState Signed-off-by: Albert <ac1214@users.noreply.github.com> * Fix linting issues Signed-off-by: Albert <ac1214@users.noreply.github.com> * Remove duplicate entry from changelog Signed-off-by: Albert <ac1214@users.noreply.github.com> * Address PR comments and set timeout to be configurable Signed-off-by: Albert <ac1214@users.noreply.github.com> * Address PR comments and fix tests Signed-off-by: Albert <ac1214@users.noreply.github.com> * Update unit tests Signed-off-by: Albert <ac1214@users.noreply.github.com> * Update changelog and fix linting Signed-off-by: Albert <ac1214@users.noreply.github.com> * Fixed CHANGELOG entry order Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Albert <ac1214@users.noreply.github.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* MergeIterator: allocate less memory at first We were allocating 24x the number of streams of batches, where each batch holds up to 12 samples. By allowing `c.batches` to reallocate when needed, we avoid the need to pre-allocate enough memory for all possible scenarios. * chunk_test: fix innacurate end time on chunks The `through` time is supposed to be the last time in the chunk, and having it one step higher was throwing off other tests and benchmarks. * MergeIterator benchmark: add more realistic sizes At 15-second scrape intervals a chunk covers 30 minutes, so 1,000 chunks is about three weeks, a highly un-representative test. Instant queries, such as those done by the ruler, will only fetch one chunk from each ingester. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* Expose default configuration values for memberlist. Set the defaults for various memberlist configuration values based on the "Default LAN" configuration. The only result of this change is that the defaults are now visible and are in the documentation. This also means that if the default values change, then the changes are visible in the documentation, where as before they would have gone unnoticed. To prevent this being a breaking change, the existing behaviour is retained, in case anyone is explicitly setting the values to zero and expecting the default to be used. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Remove use of zero value as default value indicator. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

cortexproject#4342) * Allow setting ring heartbeat timeout to zero to disable timeout check. This change allows the various ring heartbeat timeouts to be configured with zero, as a means of disabling the timeout. This is expected to be used with a separate enhancement to allow disabling heartbeats. When the heartbeat timeout is disabled, instances will always appear as healthy in the ring. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…time. (cortexproject#4317) * Add a new config and metric for reporting ruler query execution wall time. Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Spacing and PR number fixup Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Wrap the defer in a function to make it defer after the return rather than after the if block. Add a unit test to validate we're tracking time correctly. Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Use seconds for our duration rather than nanoseconds Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Review comment fixes Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Update config flag in the config docs Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Pass counter rather than counter vector for metrics query function Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Fix comment in MetricsQueryFunction Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Move query metric and log to separate function. Add log message for ruler query time. Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Update config file and change log to show this a per user metric Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * code review fixes Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * update log message for ruler query metrics Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Remove append and just use the array for key values in the log messag Signed-off-by: Tyler Reid <tyler.reid@grafana.com> * Add query-frontend component to front end log message Signed-off-by: Tyler Reid <tyler.reid@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…xproject#4345) * Optimise memberlist kv store access by storing data unencoded. The following profile data was taken from running 50 idle ingesters with memberlist, with almost everything at default values (5s heartbeats): ``` 52.16% mergeBytesValueForKey +- 52.16% mergeValueForKey +- 47.84% computeNewValue +- 27.24% codec Proto Decode +- 26.25% mergeWithTime ``` It is apparent from the this that a lot of time is spent on the memberlist receive path, as might be expected, specifically, the merging of the update into the current state. The cost however is not in decoding the incoming states (occurs in `mergeBytesValueForKey` before `mergeValueForKey`), but in fact decoding _current state_ of the value in the store (as it is stored encoded). The ring state was measured at 123K (50 ingesters), so it makes sense that decoding could be costly. This can be avoided by storing the value in it's decoded `Mergeable` form. When doing this, care has to be taken to deep copy the value when accessed, as it is modified in place before being updated in the store, and accessed outside the store mutex. Note a side effect of this change is that is no longer straightforward to expose the `memberlist_kv_store_value_bytes` metric, as this reported the size of the encoded data, therefore it has been removed. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Typo. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…o. (cortexproject#4344) * Allow disabling of ring heartbeats by setting relevant options to zero. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…#4346) * Expose configuration of memberlist packet compression. Allows manually specifying whether memberlist should compress packets via a new configuration flag: `-memberlist.enable-compression`. This typically has little benefit for Cortex, as the ring state messages are already compressed with Snappy, the second layer of compression does not achieve any additional saving. It's not clear cut whether there might still be some benefit for internal memberlist messages; this needs to be evaluated in a environment of some reasonable scale. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> * Review comments. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…exproject#4348) It was only waiting one second for the second sync to complete, which is probably too harsh a deadline than necessary for overloaded systems. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…xproject#4349) The test is writing a single silence and checking a metric which indicates whether replicating the silence has been attempted yet. This is so we can check later on that no replication activity occurs. The assertions later on in the test are passing, but the first one is not, indicating that the replication doesn't trigger early enough. This makes sense because the replication is not synchronous with the writing of the silence. Signed-off-by: Steve Simpson <steve.simpson@grafana.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

) * Add proposal document Signed-off-by: Gofman <ilang@147dda11e800.ant.amazon.com> Signed-off-by: ilangofman <igofman99@gmail.com> * Minor text modifications Signed-off-by: ilangofman <igofman99@gmail.com> * Implement requested changes to the proposal Signed-off-by: ilangofman <igofman99@gmail.com> * Fix mention of Compactor instead of purger in proposal Signed-off-by: ilangofman <igofman99@gmail.com> * Fixed wording and spelling in proposal Signed-off-by: ilangofman <igofman99@gmail.com> * Update the cache invalidation method Signed-off-by: ilangofman <igofman99@gmail.com> * Fix wording on cache invalidation section Signed-off-by: ilangofman <igofman99@gmail.com> * Minor wording additions Signed-off-by: ilangofman <igofman99@gmail.com> * Remove white-noise from text Signed-off-by: ilangofman <igofman99@gmail.com> * Remove the deleting state and change cache invalidation Signed-off-by: ilangofman <igofman99@gmail.com> * Add deleted state and update cache invalidation Signed-off-by: ilangofman <igofman99@gmail.com> * Add one word to clear things up Signed-off-by: ilangofman <igofman99@gmail.com> * update api limits section Signed-off-by: ilangofman <igofman99@gmail.com> * ran clean white noise Signed-off-by: ilangofman <igofman99@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

We need to add the merged value back to the map. Extract merging as a separate function so it can be tested. Adapt the existing test to cover multiple series. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Rearrange `CHANGELOG.md` to conform to instructions in `pull_request_template.md`. Also add a `-` to a CLI flag to conform to instructions in `design-patterns-and-conventions.md`. Signed-off-by: Andrew Seigner <andrew@sig.gy> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* Introduce `http` config settings in Azure storage Cortex v1.11.0 included thanos-io/thanos#3970, which added configuration options to Azure's http client and transport, replacing usage of `http.DefaultClient`. Unfortunately since Cortex was not setting this config, Cortex implicitly switched from `http.DefaultClient` to all empty values (e.g. `MaxIdleConns: 0` rather than 100). Introduce `http` config settings to Azure storage. This motivated moving `s3.HTTPConfig` into a new `pkg/storage/bucket/config` package, to allow `azure` and `s3` to share it. Also update the instructions for running the website to include installing `embedmd`. Signed-off-by: Andrew Seigner <andrew@sig.gy> * feedback: `config.HTTP` -> `http.Config` also back out changelog cleanup Signed-off-by: Andrew Seigner <andrew@sig.gy> * Back out accidental changelog addition Signed-off-by: Andrew Seigner <andrew@sig.gy> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* Update Thanos to latest main Update Thanos dependency to include thanos-io/thanos#4928, to conserve memory. Signed-off-by: Andrew Seigner <andrew@sig.gy> * Update changelog to summarize user-facing changes Signed-off-by: Andrew Seigner <andrew@sig.gy> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* Adding test case for dropping metrics by name to understand better flow of distributor Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Adding test case and new metric for dropped samples Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Updating CHANGELOG with new changes Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Fixing linting problem on distributor file Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Reusing discarded samples metric from validate package Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Compare labelset with len() instead of comparing to nil Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Undoing unnecessary changes on tests and distributor Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Small rename on comment Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Fixing linting offenses Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Reseting validation dropped samples metric to avoid getting metrics from other test runs Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Resolving problems after rebase conflicts Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Registering counter for dropped metrics in test Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Checking if user label drop configuration did not drop __name__ label Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> * Do not check for name label, adding new test Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

* Disable block deletion marks migration by default Flag is named `-compactor.block-deletion-marks-migration-enabled`. This feature was added in v1.7, so we expect most users to have upgraded by now. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

…ct#4602) * Upgrade Go to 1.17.5 for integration tests Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Upgrade to Go 1.17 in Dockerfiles Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

alvinlin123 · 2022-01-14T22:21:56Z

Please see #4624 instead.

pull-request-size bot added the size/XL label Jul 10, 2021

pull-request-size bot added size/XXL and removed size/XL labels Jul 13, 2021

ac1214 mentioned this pull request Jul 23, 2021

Add shuffle sharding for compactor ac1214/cortex#4

Open

3 tasks

ac1214 mentioned this pull request Aug 18, 2021

Add metrics for shuffle sharding #4432

Closed

3 tasks

bboreham approved these changes Sep 15, 2021

alvinlin123 approved these changes Jan 14, 2022

alvinlin123 mentioned this pull request Jan 14, 2022

Add shuffle sharding grouper/planner (Clone of PR 4357) #4621

Closed

3 tasks

ac1214 and others added 16 commits January 14, 2022 13:56

add shuffle sharding grouper/planner

28cf3b6

Signed-off-by: Albert <ac1214@users.noreply.github.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

update CHANGELOG.md

366536a

Signed-off-by: Albert <ac1214@users.noreply.github.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Add a security doc (cortexproject#4337)

9ceb7d6

I thought it would be good to put a security page into the docs, so that it shows up in a search. Content is just pointing at other resources. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

update changelog

b0365b7

Signed-off-by: Albert <ac1214@users.noreply.github.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Add unit tests for shuffle sharding planner

205ed16

Signed-off-by: Albert <ac1214@users.noreply.github.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

bboreham and others added 26 commits January 14, 2022 14:01

distributor-queryable tests: make time go forward (cortexproject#4561)

f12fe5f

Conventionally the minimum time would be before the maximum. Apparently none of the tests were depending on this. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Add a note about remote read in HA Pair handling (cortexproject#4500)

92a1358

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Rebuild and update build image from master. (cortexproject#4604)

03b423f

* Update build image. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <pstibrany@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Upgrade to dskit@01ce9286d7d5 (cortexproject#4601)

77e5a2b

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Revert "Migrate to dskit/ring (cortexproject#4539)" (cortexproject#4606)

2199af9

This reverts commit f2656f8. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Revert "Update cortex to use runtime config from dskit (cortexproject…

dac80e5

…#4440)" (cortexproject#4613) This reverts commit a635a1e. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Federated ruler proposal (cortexproject#4477)

3cfbb16

* Federated ruler proposal Signed-off-by: Rees Dooley <rees.dooley@shopify.com> Co-authored-by: Rees Dooley <rdooley@Reess-MacBook-Pro.local> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Revert "Chore: Use dskit/grpc* (cortexproject#4523)" (cortexproject#4612

aad9e9f

) This reverts commit 19f3802. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Reintroduce pkg/util/concurrency, in place of dskit/concurrency (cort…

6c86e9e

…exproject#4614) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Reintroduce pkg/util/limiter/rate_limiter.go, in place of dskit/limit…

d1edf7a

…er (cortexproject#4615) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Reintroduce pkg/util/modules, in place of dskit/modules (cortexprojec…

c95da7b

…t#4617) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Revert "Use kv package from github.com/grafana/dskit (cortexproject#4436

1d7281f

)" (cortexproject#4611) This reverts commit 32b1b40. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Reintroduce pkg/util/middleware, in place of dskit/middleware (cortex…

b887d1a

…project#4619) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Reintroduce pkg/util/test, in place of dskit/test (cortexproject#4618)

624863f

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Update CHANGELOG.md

3760f2b

Move the change log line to unreleased section Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Add missing parameter to compact.NewGroup

e36c332

Signed-off-by: Alvin Lin <alvinlin@amazon.com>

Add missing parameter when creating shuffle sharding grouper

82d1c12

Signed-off-by: Alvin Lin <alvinlin@amazon.com>

add missing argument

e458cbd

Signed-off-by: Alvin Lin <alvinlin@amazon.com>

fix up changelog

8e78a51

Signed-off-by: Alvin Lin <alvinlin@amazon.com>

alvinlin123 force-pushed the shuffle-sharding-compactor branch from db0595b to 8e78a51 Compare January 14, 2022 22:03

alvinlin123 closed this Jan 14, 2022

alvinlin123 mentioned this pull request Jan 14, 2022

Add shuffle sharding grouper/planner (Clone of PR 4357) #4624

Merged

3 tasks
