Enhance MimirRequestLatency runbook with more advice #1967
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@@ -213,6 +217,11 @@ How to **investigate**:
- Check the `Memcached Overview` dashboard
- If the memcached eviction rate is high, scale up memcached replicas. Check the recommendations in the `Mimir / Scaling` dashboard and make reasonable adjustments as necessary.
- If the memcached eviction rate is zero or very low, it may be caused by "first time" queries
- Cache query timeouts
  - Check store gateway logs and look for warnings about timed out Memcached queries
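The eviction-rate triage in the hunk above can be sketched as a tiny decision function. This is a hypothetical Go illustration, not Mimir code: the eviction rate would be read off the `Memcached Overview` dashboard, and the `lowEvictionRate` cut-off is an assumption, since the runbook only distinguishes "high" from "zero or very low".

```go
package main

import "fmt"

// lowEvictionRate is a hypothetical cut-off (evictions per second) between
// "zero or very low" and "high"; the runbook does not define a number.
const lowEvictionRate = 1.0

// memcachedAdvice maps an observed Memcached eviction rate to the runbook's advice.
func memcachedAdvice(evictionsPerSec float64) string {
	if evictionsPerSec > lowEvictionRate {
		return "scale up memcached replicas (see Mimir / Scaling dashboard)"
	}
	return "likely caused by \"first time\" queries"
}

func main() {
	for _, rate := range []float64{0, 0.5, 250} {
		fmt.Printf("eviction rate %.1f/s: %s\n", rate, memcachedAdvice(rate))
	}
}
```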
Please spell it "store-gateway"
Can you show an example of log message to grep?
@pracucci I added an example similar to what I used myself. It's pretty ad-hoc, but does the job.
…equest-latency-runbook
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
  - Consider scaling up the number of queriers if they're not auto-scaled; if they are auto-scaled, check the auto-scaling parameters
- If queries are not waiting in queue due to busy queriers
  - Consider enabling query sharding, if not already enabled, to increase query parallelism
  - If query sharding is already enabled, consider increasing the total number of query shards (`query_sharding_total_shards`) for tenants submitting slow queries, so their queries can be further parallelized
I seem to recall that tuning the number of shards isn't exactly as straightforward as it seems. Is there an existing doc we could link people to that describes how to pick a number of shards?
All the documentation we have is at `docs/sources/operators-guide/architecture/query-sharding/index.md`. I think the main advice here is to just increase it and see if it improves things. We could be more specific and say: consider doubling the number of query shards and check whether it reduces high-cardinality query latency; if it doesn't, roll back.
- Cache query timeouts
  - Check store-gateway logs and look for warnings about timed-out Memcached queries
  - If there are indeed a lot of timed-out Memcached queries, consider whether the store-gateway Memcached timeout setting (`-blocks-storage.bucket-store.chunks-cache.memcached.timeout`) is sufficient
- If queries are waiting in queue due to busy queriers
We're not saying how to check it. The `Mimir / Queries` dashboard has panels named "Queue length". The goal is to keep the queue length at 0 (except for a few sporadic spikes). If the queue length stays above 0 for some time, we need to scale up the queriers.
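The rule of thumb in this comment (queue length should stay at 0 apart from a few sporadic spikes) can be expressed as a check over queue-length samples, such as those plotted in the "Queue length" panels. This Go sketch is hypothetical: the sampling interval and the `maxNonZeroRun` threshold separating a sporadic spike from a sustained queue are assumptions, and the returned advice simply mirrors the runbook's two branches.

```go
package main

import "fmt"

// maxNonZeroRun is a hypothetical threshold: more consecutive non-zero
// samples than this counts as a sustained queue rather than a sporadic spike.
const maxNonZeroRun = 3

// sustainedQueue reports whether the queue length stayed above zero for more
// than maxNonZeroRun consecutive samples.
func sustainedQueue(samples []float64) bool {
	run := 0
	for _, v := range samples {
		if v > 0 {
			run++
			if run > maxNonZeroRun {
				return true
			}
		} else {
			run = 0
		}
	}
	return false
}

// advice mirrors the runbook branches for queued vs. non-queued queries.
func advice(samples []float64) string {
	if sustainedQueue(samples) {
		return "scale up queriers (or check auto-scaling parameters)"
	}
	return "consider query sharding, or more shards for tenants with slow queries"
}

func main() {
	spiky := []float64{0, 0, 4, 0, 0, 2, 0}
	busy := []float64{0, 3, 5, 8, 6, 7, 0}
	fmt.Println("spiky:", advice(spiky))
	fmt.Println("busy: ", advice(busy))
}
```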
Done.
Co-authored-by: Marco Pracucci <marco@pracucci.com>
We want to pull in the indexheader package from Thanos so that we can add some experimental alternative implementations of BinaryReader. In order to also pull in the unit tests for this package, we need the replacements for e2eutil.Copy and e2eutil.CreateBlock. This change does two things:

1. Copy in e2eutil/copy.go and fix it up accordingly.
2. Move CreateBlock into a package to avoid circular imports.
* make propagation of forwarding errors optional
  Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* add test for disabled error propagation
  Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* leave error propagation enabled by default
  Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update help
  Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update docs
* better wording
  Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Use the common workflow from the helm-chart repo. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Copy thanos/pkg/block/indexheader.
* Update provenance.
* Fix linter error due to error variable name.
* Use require instead of e2eutil.
* Replace usage of e2eutil.Copy.
* Replace usage of e2eutil.CreateBlock with local version.
* Replace use of Thanos indexheader with local copy.
* Add faillint check for upstream indexheader.
* Fix goleak ignore for NewReaderPool.
* Update vendor directory.
* Rename chart back to mimir-distributed
  Apparently the helm option `--devel` is needed to trigger using beta versions. This should be enough protection against accidental use, and it avoids renaming issues.
* Version bump helm chart
  Bump the version to a beta version, but nothing else until we double-check that such a beta chart cannot be accidentally selected with helm tooling.
* Enable helm chart release from main branch
  Release process tested OK on a test branch.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Test if helm release triggers correctly. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…equest-latency-runbook
LGTM! A couple of nits before merging. Thanks!
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Mistakenly left two lines when updating the provenance for the file.
…equest-latency-runbook
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Enhance MimirRequestLatency runbook with more advice

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Upgrade alpine to 3.16.0
* Enhance MimirRequestLatency runbook with more advice (#1967)
  * Enhance MimirRequestLatency runbook with more advice
    Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Include helm-docs in build and CI (#2026)
  * Update the mimir build image and its build doc
    Dockerfile: Add helm-docs package to the image.
    how-to: Write down the requirements for the build in more detail. Add information about building on Linux.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Expand `make doc` with helm-docs command
    This enables generating the helm chart README with the same `make doc` command as all other documentation.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Update docs/internal/how-to-update-the-build-image.md
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update contributing guides for the helm chart (#2008)
  * Update contributing guides for the helm chart
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Turn off helm version increment check in CI
    This enables periodic releases, as opposed to requiring a version bump for a release at every PR.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add extraEnvFrom to all services and enable injection into mimir config (#2017)
  Add `extraEnvFrom` capability to all Mimir services to enable injecting secrets via environment variables. Enable the `-config.expand-env=true` option in all Mimir services to be able to take secrets/settings from the environment and inject them into the Mimir configuration file.
  Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Docs: fix mimir-mixin installation instructions (#2015)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: make documentation a first class citizen in CHANGELOG (#2025)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* upgrade to alpine 3.16.0
* upgrade alpine to 3.16.0

Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Extend Makefile and Dockerfiles to support multiarch builds for all Go binaries. (#1759) * Extend Dockerfiles to support multiarch builds for all Go binaries. By calling any of make push-multiarch-./cmd/metaconvert/.uptodate make push-multiarch-./cmd/mimir/.uptodate make push-multiarch-./cmd/query-tee/.uptodate make push-multiarch-./cmd/mimir-continuous-test/.uptodate make push-multiarch-./cmd/mimirtool/.uptodate make push-multiarch-./operations/mimir-rules-action/.uptodate Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Update to latest dskit and memberlist fork (#1758) * Update to latest dskit and memberlist fork Fixes #1743 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Update changelog Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * update cli parameter description (#1760) Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * mimirtool config: Add more retained old defaults (#1762) * mimirtool config: Add more retained old defaults The following parameters have their old defaults retained even when `--update-defaults` is used with `mimirtool config covert`: * `activity_tracker.filepath` * `alertmanager.data_dir` * `blocks_storage.filesystem.dir` * `compactor.data_dir` * `ruler.rule_path` * `ruler_storage.filesystem.dir` * `graphite.querier.schemas.backend` (only in GEM) These are filepaths for which the new defaults don't make more sense than the old ones. In fact updating these can lead to subpar migration experience because components start using directories that don't exist. Because activity_tracker.filepath changed its name since cortex the tests needed to allow for differentiating old common options and new ones. This is something that was already there for GEM and was added for cortex/mimir too. 
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update CHANGELOG.md Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * dashboards: add flag to skip gateway (#1761) * dashboards: add flag to skip gateway The gateway component seems to be an enterprise component, so groups that aren't running enterprise shouldn't need the empty panels and rows in their dashboards. This patch adds a flag to drop gateway-related widgets from the mixin dashboards. Signed-off-by: Josh Carp <jm.carp@gmail.com> * Update CHANGELOG.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Gracefully shutdown querier when using query-scheduler (#1756) * Gracefully shutdown querier when using query-scheduler Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added TestQueuesOnTerminatingQuerier Signed-off-by: Marco Pracucci <marco@pracucci.com> * Commented executionContext Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/querier/worker/util.go Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Fixed typo in suggestion Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed superfluous time sensitive assertion Signed-off-by: Marco Pracucci <marco@pracucci.com> * Commented newExecutionContext() Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Graceful shutdown querier without query-scheduler (#1767) * Graceful shutdown querier with not using query-scheduler Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Refactoring Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increase continuous test query timeout (#1777) * Increase mimir-continuous-test query timeout from 30s to 60 Signed-off-by: Marco 
Pracucci <marco@pracucci.com> * Added PR number to CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increased default -tests.run-interval from 1m to 5m (#1778) * Increased default -tests.run-interval from 1m to 5m Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added PR number to CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix flaky tests on querier graceful shutdown (#1779) * Fix flaky tests on querier graceful shutdown Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove spurious newline Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update build image and GitHub workflow (#1781) * Update build-image to use golang:1.17.8-bullseye, and add skopeo to build image. Skopeo will be used in subsequent PR to push multiarch images. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Update build image. Use ubuntu-latest for workflow steps. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * api: remote duplicated remote read querier handler (#1776) * Publish multiarch images (#1772) * Publish multiarch images. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Tag with extra tag, if pushing tagged commit or release. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Split building of docker images and archiving them into tar. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * When tagging with test, use --all. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Only run deploy step on tags or weekly release branches. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Don't tag with test anymore. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Address review feedback. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Fix license check. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * K6: Take into account HTTP status code 202 (#1787) When using `K6_HA_REPLICAS > 1`, Mimir will accept all HTTP calls but a part of those call will receive a status code `202`. 
The following commit makes this status code as expected otherwise user receive the following error: ``` reads_inat write (file:///.../mimir-k6/load-testing-with-k6.js:254:8(137)) reads_inat native executor=ramping-arrival-rate scenario=writing_metrics source=stacktrace ERRO[0015] GoError: ERR: write failed. Status: 202. Body: replicas did not mach, rejecting sample: replica=replica_1, elected=replica_0 ``` At the end of the benchmark summary display errors: ``` ✗ write worked ↳ 20% — ✓ 23 / ✗ 92 ``` Example of load testing: ```shell ./k6 run load-testing-with-k6.js \ -e K6_SCHEME="https" \ -e K6_WRITE_HOSTNAME="${mimir}" \ -e K6_READ_HOSTNAME="${mimir}" \ -e K6_USERNAME="${user}" \ -e K6_WRITE_TOKEN="${password}" \ -e K6_READ_TOKEN="${password}" \ -e K6_HA_CLUSTERS="1" \ -e K6_HA_REPLICAS="3" \ -e K6_DURATION_MIN="5" ``` Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * replace model.Metric with labels.Labels in distributor.MetricsForLabelMatchers() (#1788) * Streaming remote read (#1735) * implement read v2 * updated CHANGELOG.md * extend maxBytesInFram comment. * addressed PR feedback * addressed PR feedback * addressed PR feedback * use indexed xor chunk function to assert stream remote read tests * updated CHANGELOG.md Co-authored-by: Miguel Ángel Ortuño <miguel.ortuno@grafana.com> * Upgrade dskit (#1791) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix mimir-continuous-test when changing configured num-series (#1775) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not export per user and integration Alertmanager metrics when value is 0 (#1783) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Print version+arch of Mimir loaded to Docker. (#1793) * Print version+arch of Mimir loaded to Docker. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Use debug log for distributor. 
Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total (#1797) * Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove unused fields Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added options support to SendSumOfCountersPerUser() (#1794) * Added options support to SendSumOfCountersPerUser() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed SkipZeroValueMetrics() to WithSkipZeroValueMetrics() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed all Grafana dashboards UIDs to not conflict with Cortex ones, to let people install both while migrating from Cortex to Mimir (#1801) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Adopt mixin convention to set dashboard UIDs based on md5(filename) (#1808) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add support for store_gateway_zone args (#1807) Allow customizing mimir cli flags per zone for the store gateway. Copied the same solution as we have for ingesters. 
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Add protection to store-gateway to not drop all blocks if unhealthy in the ring (#1806) * Add protection to store-gateway to not drop all blocks if unhealthy in the ring Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update CHANGELOG.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Removed cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total unused metrics (#1799) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove unused clientConfig from ingester (#1814) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add tracing to `mimir-continuous-test` (#1795) * Extract and test TracerTransport functionality We need to use a TracerTransport in mimir-continous-test. We have that in the frontend package, but I don't want to import frontend from the mimir-continous-test, so we extract it to util/instrumentation. 
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Set up global tracer in mimir-continuous-test Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add tracing to the client and spans to the tests Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add jaeger-mixin to mimir-continuous test container Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * make license Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add traces to the write path Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Chore: remove unused code from BucketStore (#1816) * Removed unused Info() and advLabelSets from BucketStore Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused FilterConfig from BucketStore Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused relabelConfig from store-gateway tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused function expectedTouchedBlockOps() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused recorder from BucketStore tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * go mod vendor Signed-off-by: Marco Pracucci <marco@pracucci.com> * Refactoring: force removal of all blocks when BucketStore is closed (#1817) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Simplify FilterUsers() logic in store-gateway (#1819) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Migrate admin CSS to bootstrap 5 (#1821) * Migrate admin CSS to bootstrap 5 When I added bootstrap, for some reason I imported bootstrap 3 which was originally launched in 2013. Before adding more CSS styles, let's migrate to modern Bootstrap 5 launched in 2021. This doesn't require an explicit jquery dependency anymore. Also re-styled admin header to adapt properly to mobile devices screens. 
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * ruler: make use of dskit `grpcclient.Config` on remote evaluation client (#1818) * ruler: use dskit grpc client for remote evaluation * addressed PR feedback * Memberlist status page CSS (#1824) * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update dskit to 4d7238067788a04f3dd921400dcf7a7657116907 This includes changes from https://github.com/grafana/dskit/pull/163 Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Custom memberlist status template Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Include `import` in jsonnet snippets (#1826) * Do not drop blocks in the store-gateway if missing in the ring (#1823) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgraded dskit to fix temporary partial query results when shuffle sharding is enabled and hash ring backend storage is flushed / reset (#1829) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Docs: ruler remote evaluation (#1714) * include documentation for remote rule evaluation * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * address PR feedback * Update 
docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * addressed PR feedback * addressed PR feedback * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/running-production-environment/planning-capacity.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/running-production-environment/planning-capacity.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * addressed PR feedback Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Alertmanager: Do not validate alertmanager configuration if it's not running. (#1835) Allows other targets to start up even if an invalid alertmanager configuration is passed in. Fixes #1784 * Alertmanager: Allow usage with `local` storage type, with appropriate warnings. (#1836) An oversight when we removed non-sharding modes of operation is that the `local` storage type stopped working. Unfortunately it is not conceptually simple to support this type fully, as alertmanager requires remote storage shared between all replicas, to support recovering tenant state to an arbitrary replica following an all-replica outage. 
To support provisioning of alerts with `local` storage, but persisting of state to remote storage, we would need to allow different storage configurations. This change fixes the issue in a more naive way, so that the alertmanager can at least be started up for testing or development purposes, but persisting state will always fail. A second PR will propose allowing the `Persister` to be disabled. Although this configuration is not recommended for production used, as long as the number of replicas is equal to the replication factor, then tenants will never move between replicas, and so the local snapshot behaviour of the upstream alertmanager will be sufficient. Fixes #1638 * Mixin: Additions to Top tenants dashboard regarding sample rate and discard rate. (#1842) Adds the following rows to the "Top tenants" dashboard: - By samples rate growth - By discarded samples rate - By discarded samples rate growth These queries are useful for determining what tenants are potentially putting excess load on distributors and ingesters (and if it increased recently). * Use concurrent open/close operations in compactor unit tests (#1844) Open and close files concurrently in compactor unit tests to expose bugs that implicitly rely on ordering. Exposes bugs such as https://github.com/prometheus/prometheus/pull/10108 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Mixin: Show ingestion rate limit and rule group limit on Tenants dashboard. (#1845) Whilst diagnosing a recent issue, we thought it would be useful to show the current ingestion rate limit for the tenant. As the limit is applied to `cortex_distributor_received_samples_total`, the limit is shown on the panel which displays this metric. ("Distributor samples received (accepted) rate"). Also added `ruler_max_rule_groups_per_tenant` while in the area. We don't currently display the number of exemplars in storage on the dashboard anywhere, so cannot add `max_global_exemplars_per_user` right now. 
* Jsonnet: Preparatory refactoring to simplify deploying parallel query paths. (#1846) This change extracts some of the jsonnet used to build query deployments (querier, query-scheduler, query-frontend) such that it is easier to deploy secondary query paths. The use case for this is primarily to develop a query path deployment for ruler remote-evaluation, but there may be other use cases too. * Removed double space in Log (#1849) * Reference 'monolithic mode' instead of 'single binary' in logs (#1847) Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Extend safeTemplateFilepath to cover more cases. (#1833) * Extend safeTemplateFilepath to cover more cases. - template name ../tmpfile, stored into /tmp dir - empty template name - template name being just "." Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Relax mimir-continuous-test pressure when deployed with Jsonnet (#1853) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add 2.1.0-rc.0 header (#1857) * Prepare release 2.1 (#1859) * Update VERSION to 2.1-rc.0 * Add relevant changelog entries for user facing PRs since mimir-2.0.0 * Add patch in semver VERSION * Adding updated ruler diagrams. (#1861) * Create v2-1.md (#1848) * Create v2-1.md * Update and rename v2-1.md to v2.1.md updated the header and renamed the file. * Update v2.1.md Missing the upgrade configurations. * Update v2.1.md added bug description * Update v2.1.md bug fix writeup. * Update v2.1.md Added the series count description * Apply suggestions from code review Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update v2.1.md * Update v2.1.md updated tsdb isolation wording. * Ran make doc. * Fixed a broken relref. 
* Update docs/sources/release-notes/v2.1.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Allow custom data source regex in mixin dashboards (#1802) * dashboards: update grafana-builder The following commit update grafana-builder version and brings in: * enable toolip by default (#665) * Add 'Data Source' label for the default datasource template variable. (#672) * add dashboard link func (#683) * make allValue configurable (#703) * Allow datasource's regex to be configured Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * Allow custom data source regex in mixin dashboards The current dashboards offer the possibility to select a data source among all prometheus data sources in the organization. Depending on the number of data sources the list could be rather big (>10). Not all data sources host Mimir metrics as such listing them is not helpful for the users. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * Revert back change that was enabling shared tooltips Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Dashboards: Fix `container_memory_usage_bytes:sum` recording rule (#1865) * Dashboards: Fix `container_memory_usage_bytes:sum` recording rule This change causes recording rules that reference `container_memory_usage_bytes` to omit series that do not contain the required labels for rules to run successfully, by requiring a non-empty `image` label. 
Signed-off-by: Peter Fern <github@0xc0dedbad.com> * Update CHANGELOG Signed-off-by: Peter Fern <github@0xc0dedbad.com> * Add compiled rules Signed-off-by: Peter Fern <github@0xc0dedbad.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Deprecate -distributor.extend-writes and set it always to false (#1856) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove DCO from contributors guidelines (#1867) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Create v2-1.md (#1848) * Create v2-1.md * Update and rename v2-1.md to v2.1.md updated the header and renamed the file. * Update v2.1.md Missing the upgrade configurations. * Update v2.1.md added bug description * Update v2.1.md bug fix writeup. * Update v2.1.md Added the series count description * Apply suggestions from code review Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update v2.1.md * Update v2.1.md updated tsdb isolation wording. * Ran make doc. * Fixed a broken relref. * Update docs/sources/release-notes/v2.1.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Adding updated ruler diagrams. (#1861) * Deprecate -distributor.extend-writes and set it always to false (#1856) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Bump version to 2.1.0-rc.1 to include cherry-picked * List Johanna as 2.1.0 release shepherd (#1871) * fix(mixin): add missing alertmanager hashring members (#1870) * fix(mixin): add missing alertmanager hashring members * docs(CHANGELOG): add changelog entry * Docs: clarify 'Set rule group' API specification (#1869) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Simplify documentation publishing logic (#1820) * Simplify documentation publishing logic Split into two pipelines, one that runs on main and one that runs on release branches and tags. 
Use the `has-matching-release-tag` workflow to determine whether to release documentation on release branches and tags.
`has-matching-release-tag` is documented in https://github.com/grafana/grafana-github-actions/blob/main/has-matching-release-tag/action.yaml
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Remove script no longer used for documentation releases
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Add missing clone step for the website-sync action
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Update RELEASE instructions to reflect automated docs publishing
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Remove conditional from website clone for next publishing
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Fix capitalization of Jsonnet and Tanka (#1875)
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Checkout the repository as part of the documentation sync (#1876)
* Checkout the repository as part of the documentation sync
I assumed this was already done, but the GitHub docs confirm that it is required.
https://docs.github.com/en/github-ae@latest/actions/using-workflows/about-workflows#about-workflows
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Allow manual triggering of workflow
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Fix manual workflow dispatch (#1877)
TIL that if you edit the workflow in the GitHub UI, it will lint your workflow file and make sure that all the keys conform to the schema.
* Chore: cleanup unused alertmanager config in Mimir jsonnet (#1873)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update mimir-prometheus to ceaa77f1 (#1883)
* Update mimir-prometheus to ceaa77f1
This includes the fix https://github.com/grafana/mimir-prometheus/pull/234 for https://github.com/grafana/mimir/issues/1866
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Update CHANGELOG.md
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix changelog
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Bump version to 2.1.0-rc.1 to include cherry-picked (#1872)
* Increased default configuration for -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB (#1884)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Split mimir_queries rule group so that it doesn't have more than 20 rules (#1885)
* Split mimir_queries rule group so that it doesn't have more than 20 rules.
* Add check for number of rules in the group.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Add alert for store-gateways without blocks (#1882)
* Add alert for store-gateways without blocks
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Clarify messages
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Replace "Store Gateway" with "store-gateway"
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Rename alert to StoreGatewayNoSyncedTenants
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Rebuild mixin
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Fix flaky integration tests caused by 'metric not found' (#1891)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: Explain the runtime override of active series matchers (#1868)
* Updated docs/sources/operators-guide/configuring/configuring-custom-trackers.md; made some tweaks to the examples; renamed interesting-service and also-interesting-service to service1 and service2, respectively
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
* Update to latest Thanos for Memcached fixes (#1837)
Update our vendor of Thanos to pull in the most recent changes to the Memcached client. In particular, these changes prevent the client from starting many goroutines as part of batching before they are able to make progress.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation" (#1893)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix TestRulerEvaluationDelay flakiness (#1892)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix `MimirRulerMissedEvaluations` text and add playbook (#1895)
* Correct magnitude on MimirRulerMissedEvaluations
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add playbook for MimirRulerMissedEvaluations
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Remove trailing spaces
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Conform to tech doc style. (#1904)
* Use a dedicated threadpool for store-gateway requests (#1812)
Remove the use of a dedicated threadpool for index-header operations because the call overhead is prohibitively expensive. Instead, use a dedicated threadpool for entire store-gateway requests so that the cost of switching between threads is only paid a single time. This allows for isolation in the case of page faults during mmap accesses without too much overhead.
Fixes #1804
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Upgrade consideration for active_series_custom_trackers_config (#1897)
* Upgrade consideration for active_series_custom_trackers_config
* Update docs/sources/release-notes/v2.1.md
Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
* Update docs/sources/release-notes/v2.1.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
* fix(mixin): do not trigger TooMuchMemory alerts if no container limits are supplied (#1905)
* fix(mixin): do not trigger `MimirAllocatingTooMuchMemory` or `EtcdAllocatingTooMuchMemory` alerts if no container limits are supplied
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Fix MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode (#1902)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Set defaults to query ingesters, not store, for recent data (#1909)
Set queriers to _not_ query storage (store-gateways) for recent data and set the store-gateways to ignore recent uncompacted blocks. Default values are set to match what we use in the Mimir jsonnet.
Fixes #1639
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Revert distributor log level to warn in integration tests (#1910)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Improved error returned by -querier.query-store-after validation (#1914)
* Improved error returned by -querier.query-store-after validation
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update pkg/querier/querier.go
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove jsonnet configuration settings that match default values (#1915)
* Remove jsonnet configuration settings that match default values
Follow up to #1909
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Docs: recommend fast disks for ingesters and store-gateways (#1903)
* Docs: recommend fast disks for ingesters and store-gateways
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Improve series, sample, metadata and exemplars validation errors (#1907)
* Improved error messages returned by ValidateSample(), ValidateExemplar(), ValidateMetadata() and ValidateLabels()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Apply suggestions from code review
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Fixed unit tests after error messages edit
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Manually applied a suggestion to error message
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Renamed globalerrors pkg to singular form
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Cleanup globalerror package based on Oleg's feedback
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Removed formatting support from globalerror.ID's message generation function
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Changed another error message based on feedback
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added CHANGELOG entry
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update operations/mimir-mixin/docs/playbooks.md
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Rephrased label name/value length error message based on feedback received in the test file
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Final fixes to error messages
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* mixin-tool: adapt screenshots dockerimage to support arm64 (#1916)
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Ingester ring endpoint fix (#1918)
* /ingester/ring is also available via distributor.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Revert unintended change.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Configuration files for GrafanaCon 2022 presentation. (#1881)
* Configuration files for GrafanaCon 2022 presentation.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Update dskit to bring "Parallelize memberlist notified message processing" PR (#1912)
* Update dskit to bring "Parallelize memberlist notified message processing" PR.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* CHANGELOG.md
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Account for StatefulSets and Deployments named by the helm chart (#1913)
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Change shuffle sharding ingester lookback default config (#1921)
* Change shuffle sharding ingester lookback default config
Use the same default value for ingester lookback as the "query ingesters within" setting to reduce the number of things that need to be changed from their defaults. This change also removes use of the `-blocks-storage.tsdb.close-idle-tsdb-timeout` flag in jsonnet since the value being used matches the default.
Follow up to #1915
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Changelog
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Improved ValidateMetadata() errors (#1919)
* Improved ValidateMetadata() errors
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added PR number to CHANGELOG
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update pkg/util/validation/errors.go
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Converted all ValidationError to be non-pointers
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Removed unused variable
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed unit test
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed markdown linter
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* mixin/dashboards: ruler query path dashboards (#1911)
* mixin: added ruler query path dashboards
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* docs: added ruler reads & ruler reads resources dashboard screenshots
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* updated CHANGELOG.md
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Mark query_ingesters_within and query_store_after as advanced (#1929)
* Mark query_ingesters_within and query_store_after as advanced
Now that they have good defaults that match what we run in production, they shouldn't need to be tuned by users in most cases.
Fixes #1924
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Remove empty chunks panel from Queries dashboard (#1928)
* Remove empty chunks panel from Queries dashboard
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts. (#1926)
* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* CHANGELOG.md
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Update config value for -querier.query-ingesters-within to work with … (#1930)
* Update config value for -querier.query-ingesters-within to work with new default value for -querier.query-store-after
* Remove config for -querier.query-ingesters-within as it is set to the default
* Update Thanos vendor for memcache improvements (#1920)
Update our vendor of Thanos so that memcache keys are grouped by the server they are owned by before being split into batches.
Fixes #423
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Move usage generation to separate package (#1934)
* Move usage function into a separate package and export it
Signed-off-by: Patryk Prus <patryk.prus@grafana.com>
* Add function to add to flag category overrides at runtime
Signed-off-by: Patryk Prus <patryk.prus@grafana.com>
* Document CHANGELOG scopes
* Add documentation about changelog scopes
* update CHANGELOG for #1934
* Improve instance limits, ingester limits, query limiter, some querier errors (#1888)
* Add error IDs to pkg/ingester/instance_limits.go
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add error IDs to pkg/ingester/limiter.go
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add error IDs to pkg/querier/blocks_store_queryable.go
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Differentiate max-ingester-ingestion-rate from distributor
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update playbooks.md
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Correct misspelled flags
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Correct strings in tests as well
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Re-iterated on ingesters limit errors
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Re-iterated on ingesters per-tenant limit errors
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Re-iterated on query per-tenant limit errors
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added PR number to CHANGELOG entry
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Mention the cardinality API endpoint in the err-mimir-max-series-per-metric runbook
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update operations/mimir-mixin/docs/playbooks.md
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Fixed InstanceLimits receiver name to be consistent
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Clarify metadata is stored in memory
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed linter and tests
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed more tests
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update pkg/querier/blocks_store_queryable.go
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix English grammar about 'how to fix it'
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* make ingesters use heartbeat timeout instead of period to fix the bug… (#1933)
* make ingesters use heartbeat timeout instead of period to fix the bug where they sometimes appear as unhealthy
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Update VERSION to 2.1.0
* Update dashboard screenshots (#1940)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix version in changelog
* Update mimir tests to use new 2.1.0 image
* Add minimum Grafana version to mixin dashboards (#1943)
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Bump grafana/mimir image to 2.1.0 for backward compatibility testing (#1942)
* Chore: renamed source files for remote ruler dashboards (#1937)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Move the mimir-distributed helm chart into the mimir repository (#1925)
* Initial copy of mimir-distributed helm chart
This commit is not expected to work in CI.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Update GitHub action for helm lint and test
Set the working directory for GitHub actions for helm actions. Set a more consistent name for GitHub actions. Set chart name for testing. Ignore generated helm doc from prettier.
Do not release the helm chart for now.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add bucket prefix configuration (#1686)
* Add bucket prefix configuration
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add allowed chars validation for storage prefix
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add unit tests for PrefixedBucketClient
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add CHANGELOG entry
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Use grafana/regexp instead of regexp
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Improve validation of storage_prefix
Update docs and add validation for `..` and `.`
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add some tests for AM and ruler bucket validation
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add tests for bucket prefix with filesystem client
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update helm text too
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update everything
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Simplify validation for storage_prefix
Only accept alphanumeric characters for the storage_prefix to prevent typos and misunderstandings when the prefix ends with a slash or contains slashes and dots.
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Make stronger assertions in bucket validation test
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Make stronger assertions in bucket prefix test
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Assert on errors, not on strings
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Exclude YAML field names from error message
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Include full image tag on rollout dashboard (#1932)
* Make version matcher in rollout dashboard work for non-weekly images
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add CHANGELOG.md entry
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update CHANGELOG.md
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* docs: move federated rule groups documentation to its own section (#1906)
* docs: move federated rule groups documentation to its own section
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Make networking panels pod matchers work with helm chart (#1927)
* Make networking panels pod matchers work with helm chart
The pods created by the helm chart follow a format of `<helm_release_name>-mimir-<ingester|distributor|...>`. This is a problem for all places that use the per_instance_label for matching. The per_instance_label is mostly used in aggregations (`sum by (pod)`, `count by (pod)`, ...). The networking panels are the only ones that use it for matching.
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Replace `.*` with a stronger regex in pod matchers
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add CHANGELOG.md entry
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add max query length error to errors catalog (#1939)
* Add max query length error to errors catalogue
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added PR number to CHANGELOG entry
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove image spec from demo file. (#1946)
* Remove image spec from demo file.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Fix rejected identity accept encoding (#1864)
* Fix rejected identity accept-encoding
When a request comes in with the header `Accept-Encoding: gzip;q=1, identity;q=0`, we should gzip the response even if it's smaller than the defined minimum size.
We achieve this by fixing the github.com/nytimes/gziphandler code and bringing the fixed code into this repository, since:
  - they don't seem to be maintaining it anymore
  - we don't want to use a replace directive, as it's very likely to be lost in codebases depending on this
  - it's a small amount of code (500 lines)
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add API test for gzip
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* make lint pkg/util/gziphandler
Mostly handling errors; also removed the deprecated http.CloseNotifier functionality and related code.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Update CHANGELOG.md
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix comment
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Add faillint for github.com/nytimes/gziphandler
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* make lint
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix faillint paths
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* If there's content-encoding, start plain write
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* If less than min-size, don't encode
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Refactor `handleContentType` to handle by default
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Rename acceptsIdentity to rejectsIdentity
Hopefully this will minimise the number of double negations, making the code clearer.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Distributor: added per-tenant request limit (#1843)
* distributor: added request limiter logic
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* updated CHANGELOG.md
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* distributor: added type plans rate limits
Assuming a minimum sane value of 100 samples per request, we've set default request limits for each user tier.
* docs: added request limit distributor documentation
* rebuilt jsonnet test output
* make linter happy
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* updated reference help
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* addressed PR feedback
Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Add bucket prefix to experimental features (#1951)
* Add bucket prefix to experimental features
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update flag status of storage_prefix to experimental
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Copy thanos shipper (#1957)
* Copy shipper from Thanos.
* Remove support for uploading compacted blocks.
* Always allow out-of-order uploads. Removed unused overlap checker.
* Rename Shipper interface to BlocksUploader, and ThanosShipper to Shipper.
* Extract readShippedBlocks method from user_tsdb.go
* Added shipper unit tests (copied and adapted from original tests)
* Add faillint rule to avoid using Thanos shipper.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Adjust the name of the tag expected by documentation publishing (#1974)
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Use github.com/colega/grafana-tools-sdk fork (#1973)
* Use github.com/colega/grafana-tools-sdk fork
See https://github.com/grafana/cortex-tools/pull/248 for more context (this is the same change).
The grafana-tools/sdk dependency will eventually be removed entirely from analyse commands.
Signed-off-by: hjet <hjet@users.noreply.github.com>
* Update CHANGELOG.md
Signed-off-by: hjet <hjet@users.noreply.github.com>
* mod tidy
* Deprecate -ingester.ring.join-after (#1965)
* Deprecate -ingester.ring.join-after
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Addressed review feedback
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Dashboards: disable gateway panels by default (#1955)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: rename 'playbooks' to 'runbooks' and move them to doc (#1970)
* Docs: rename 'playbooks' to 'runbooks' and move them to doc
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Named runbooks folder as 'mimir-runbooks/' to make it easy to import in Grafana Labs internal infrastructure as code
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix anchors check because they're case insensitive
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Preparation of e2eutils for Thanos indexheader unit tests. (#1982)
We want to pull in the indexheader package from Thanos so that we can add some experimental alternative implementations of BinaryReader. In order to also pull in the unit tests for this package, we need the replacements for e2eutil.Copy and e2eutil.CreateBlock.
This change does two things:
1. Copy in e2eutil/copy.go and fix it up accordingly.
2. Move CreateBlock into a package to avoid circular imports.
* Make propagation of forwarding errors configurable (#1978)
* make propagation of forwarding errors optional
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* add test for disabled error propagation
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* leave error propagation enabled by default
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update help
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* update docs
* better wording
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* Release the mimir-distributed-beta helm chart (#1948)
Use the common workflow from the helm-chart repo.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Copy Thanos block/indexheader package (#1983)
* Copy thanos/pkg/block/indexheader.
* Update provenance.
* Fix linter error due to error variable name.
* Use require instead of e2eutil.
* Replace usage of e2eutil.Copy
* Replace usage of e2eutil.CreateBlock with local version.
* Replace use of Thanos indexheader with local copy.
* Add faillint check for upstream indexheader.
* Fix goleak ignore for NewReaderPool.
* Update vendor directory.
* Prepare mimir beta chart release (#1995)
* Rename chart back to mimir-distributed
Apparently the helm option --devel is needed to trigger using beta versions. This should be enough protection against accidental use. Avoids renaming issues.
* Version bump helm chart
Bump the chart to a beta version, but do nothing else until we double-check that such a beta chart cannot be accidentally selected with helm tooling.
* Enable helm chart release from main branch
Release process tested OK on a test branch.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Bump version of helm chart (#1996)
Test if helm release triggers correctly.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Update gopkg.in/yaml.v3 (#1989)
This updates to a version that contains the fix for CVE-2022-28948.
* Remove hardlinking in Shipper code. (#1969)
* Remove hardlinking in Shipper code.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* [helm] use grpc round robin for distributor clients (#1991)
* Use gRPC round-robin for gateway -> distributor requests
Fixes https://github.com/grafana/mimir/issues/1987
Update chart version and changelog.
Use the headless distributor service for the nginx gateway.
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Fix binary_reader.go header text. (#1999)
Mistakenly left two lines when updating the provenance for the file.
* Workaround to keep using old memcached bitnami chart for now (#1998)
* Workaround to keep using old memcached bitnami chart for now
See also: https://github.com/grafana/helm-charts/pull/1438
Also clean up unused chart repositories from ct.yaml.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* [helm] add results cache (#1993)
* [helm] Add query-frontend results cache
Fixes https://github.com/grafana/helm-charts/issues/1403
* Add PR to CHANGELOG
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Fix README
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Disable distributor.extend-writes & ingester.ring.unregister-on-shutdown (#1994)
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Update CHANGELOG.md (#1992)
* [helm] Prepare image bump for 2.1 release (#2001)
* Prepare image bump for 2.1 release
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Fix README template to reference 2.1
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Add nice link text to CHANGELOG
Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Update CHANGELOG.md
* Publish helm charts from release branches (#2002)
* Update Thanos with https://github.com/thanos-io/thanos/pull/5400. (#2006)
* Replace hardcoded intervals with `$__rate_interval` in dashboards (#2011)
* Replace hardcoded intervals with `$__rate_interval` in dashboards
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add CHANGELOG.md entry
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Standardise error messages for distributor instance limits (#1984)
* standardise error messages for distributor instance limits
* Apply suggestions from code review
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Apply suggestions from code review
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* apply code review suggestions to rest of doc for consistency
* manually apply suggestion from code review
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove tutorials/ symlink (#2007)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add querier autoscaler support to jsonnet (#2013)
* Add querier autoscaler support to jsonnet
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed autoscaling.libsonnet import
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling (#2023)
* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Shouldn't be an exported object
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Don't include external labels in blocks uploaded by Ingester (#1972)
* Remove support for external labels.
* Fixed comments.
* Don't use TenantID label. Filter out the label during compaction.
* CHANGELOG.md
* Use public function from Thanos.
* Use new UploadBlock function, move GrpcContextMetadataTenantID constant.
* Rename tsdb2 import to mimir_tsdb.
* Fix tests.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Enhance MimirRequestLatency runbook with more advice (#1967) * Enhance MimirRequestLatency runbook with more advice Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Include helm-docs in build and CI (#2026) * Update the mimir build image and its build doc Dockerfile: Add helm-docs package to the image. how-to: Write down the requirements for build in more detail. Add information about build on linux. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Expand make doc with helm-docs command This enables generating the helm chart README with the same make doc command as all other documentation. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Update docs/internal/how-to-update-the-build-image.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update contributing guides for the helm chart (#2008) * Update contributing guides for the helm chart Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Turn off helm version increment check in CI This enables periodic releases, as opposed to requiring version bump for release at every PR. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Add extraEnvFrom to all services and enable injection into mimir config (#2017) Add `extraEnvFrom` capability to all Mimir services to enable injecting secrets via environment variables. Enable the `-config.expand-env=true` option in all Mimir services to be able to take secrets/settings from the environment and inject them into the Mimir configuration file.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Docs: fix mimir-mixin installation instructions (#2015) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Docs: make documentation a first class citizen in CHANGELOG (#2025) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Helm: add global.extraEnv and global.extraEnvFrom (#2031) * Helm: add global.extraEnv and global.extraEnvFrom Enables setting environment and env injection in one place for mimir + nginx. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Upgrade alpine to 3.16.0 (#2028) * Upgrade alpine to 3.16.0 * Enhance MimirRequestLatency runbook with more advice (#1967) * Enhance MimirRequestLatency runbook with more advice Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Include helm-docs in build and CI (#2026) * Update the mimir build image and its build doc Dockerfile: Add helm-docs package to the image. how-to: Write down the requirements for build in more detail. Add information about build on linux. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Expand make doc with helm-docs command This enables generating the helm chart README with the same make doc command as all other documentation. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Update docs/internal/how-to-update-the-build-image.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update contributing guides for the helm chart (#2008) * Update contributing guides for the helm chart Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Turn off helm version increment check in CI This enables periodic releases, as opposed to requiring version bump for release at every PR. 
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Add extraEnvFrom to all services and enable injection into mimir config (#2017) Add `extraEnvFrom` capability to all Mimir services to enable injecting secrets via environment variables. Enable the `-config.expand-env=true` option in all Mimir services to be able to take secrets/settings from the environment and inject them into the Mimir configuration file. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Docs: fix mimir-mixin installation instructions (#2015) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Docs: make documentation a first class citizen in CHANGELOG (#2025) Signed-off-by: Marco Pracucci <marco@pracucci.com> * upgrade to alpine 3.16.0 * upgrade alpine to 3.16.0 Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Helm: release our first weekly (#2033) This should be automated, bu…
* Extend Makefile and Dockerfiles to support multiarch builds for all Go binaries. (#1759) * Extend Dockerfiles to support multiarch builds for all Go binaries. Multiarch images can be built and pushed by calling any of `make push-multiarch-./cmd/metaconvert/.uptodate`, `make push-multiarch-./cmd/mimir/.uptodate`, `make push-multiarch-./cmd/query-tee/.uptodate`, `make push-multiarch-./cmd/mimir-continuous-test/.uptodate`, `make push-multiarch-./cmd/mimirtool/.uptodate`, or `make push-multiarch-./operations/mimir-rules-action/.uptodate`. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Update to latest dskit and memberlist fork (#1758) * Update to latest dskit and memberlist fork Fixes #1743 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Update changelog Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * update cli parameter description (#1760) Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * mimirtool config: Add more retained old defaults (#1762) * mimirtool config: Add more retained old defaults The following parameters have their old defaults retained even when `--update-defaults` is used with `mimirtool config convert`: * `activity_tracker.filepath` * `alertmanager.data_dir` * `blocks_storage.filesystem.dir` * `compactor.data_dir` * `ruler.rule_path` * `ruler_storage.filesystem.dir` * `graphite.querier.schemas.backend` (only in GEM) These are filepaths for which the new defaults don't make more sense than the old ones. In fact, updating these can lead to a subpar migration experience because components start using directories that don't exist. Because `activity_tracker.filepath` changed its name since Cortex, the tests needed to differentiate between old common options and new ones. This was already in place for GEM and was added for Cortex/Mimir too.
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update CHANGELOG.md Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * dashboards: add flag to skip gateway (#1761) * dashboards: add flag to skip gateway The gateway component seems to be an enterprise component, so groups that aren't running enterprise shouldn't need the empty panels and rows in their dashboards. This patch adds a flag to drop gateway-related widgets from the mixin dashboards. Signed-off-by: Josh Carp <jm.carp@gmail.com> * Update CHANGELOG.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Gracefully shutdown querier when using query-scheduler (#1756) * Gracefully shutdown querier when using query-scheduler Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added TestQueuesOnTerminatingQuerier Signed-off-by: Marco Pracucci <marco@pracucci.com> * Commented executionContext Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/querier/worker/util.go Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Fixed typo in suggestion Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed superfluous time-sensitive assertion Signed-off-by: Marco Pracucci <marco@pracucci.com> * Commented newExecutionContext() Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Graceful shutdown querier without query-scheduler (#1767) * Graceful shutdown querier when not using query-scheduler Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Refactoring Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increase continuous test query timeout (#1777) * Increase mimir-continuous-test query timeout from 30s to 60s Signed-off-by: Marco
Pracucci <marco@pracucci.com> * Added PR number to CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increased default -tests.run-interval from 1m to 5m (#1778) * Increased default -tests.run-interval from 1m to 5m Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added PR number to CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix flaky tests on querier graceful shutdown (#1779) * Fix flaky tests on querier graceful shutdown Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove spurious newline Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update build image and GitHub workflow (#1781) * Update build-image to use golang:1.17.8-bullseye, and add skopeo to build image. Skopeo will be used in a subsequent PR to push multiarch images. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Update build image. Use ubuntu-latest for workflow steps. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * api: remove duplicated remote read querier handler (#1776) * Publish multiarch images (#1772) * Publish multiarch images. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Tag with an extra tag if pushing a tagged commit or release. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Split building of docker images and archiving them into tar. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * When tagging with test, use --all. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Only run deploy step on tags or weekly release branches. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Don't tag with test anymore. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Address review feedback. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Fix license check. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * K6: Take into account HTTP status code 202 (#1787) When using `K6_HA_REPLICAS > 1`, Mimir will accept all HTTP calls, but some of those calls will receive a status code `202`.
This commit treats this status code as expected; otherwise, users receive the following error:
```
reads_inat write (file:///.../mimir-k6/load-testing-with-k6.js:254:8(137))
reads_inat native executor=ramping-arrival-rate scenario=writing_metrics source=stacktrace
ERRO[0015] GoError: ERR: write failed. Status: 202. Body: replicas did not mach, rejecting sample: replica=replica_1, elected=replica_0
```
At the end of the benchmark, the summary displays the errors:
```
✗ write worked
↳ 20% — ✓ 23 / ✗ 92
```
Example of load testing:
```shell
./k6 run load-testing-with-k6.js \
  -e K6_SCHEME="https" \
  -e K6_WRITE_HOSTNAME="${mimir}" \
  -e K6_READ_HOSTNAME="${mimir}" \
  -e K6_USERNAME="${user}" \
  -e K6_WRITE_TOKEN="${password}" \
  -e K6_READ_TOKEN="${password}" \
  -e K6_HA_CLUSTERS="1" \
  -e K6_HA_REPLICAS="3" \
  -e K6_DURATION_MIN="5"
```
Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * replace model.Metric with labels.Labels in distributor.MetricsForLabelMatchers() (#1788) * Streaming remote read (#1735) * implement read v2 * updated CHANGELOG.md * extend maxBytesInFrame comment. * addressed PR feedback * addressed PR feedback * addressed PR feedback * use indexed xor chunk function to assert stream remote read tests * updated CHANGELOG.md Co-authored-by: Miguel Ángel Ortuño <miguel.ortuno@grafana.com> * Upgrade dskit (#1791) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix mimir-continuous-test when changing configured num-series (#1775) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not export per user and integration Alertmanager metrics when value is 0 (#1783) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Print version+arch of Mimir loaded to Docker. (#1793) * Print version+arch of Mimir loaded to Docker. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Use debug log for distributor.
Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total (#1797) * Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove unused fields Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added options support to SendSumOfCountersPerUser() (#1794) * Added options support to SendSumOfCountersPerUser() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed SkipZeroValueMetrics() to WithSkipZeroValueMetrics() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed all Grafana dashboards UIDs to not conflict with Cortex ones, to let people install both while migrating from Cortex to Mimir (#1801) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Adopt mixin convention to set dashboard UIDs based on md5(filename) (#1808) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add support for store_gateway_zone args (#1807) Allow customizing mimir cli flags per zone for the store gateway. Copied the same solution as we have for ingesters. 
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * Add protection to store-gateway to not drop all blocks if unhealthy in the ring (#1806) * Add protection to store-gateway to not drop all blocks if unhealthy in the ring Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update CHANGELOG.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Removed cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total unused metrics (#1799) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove unused clientConfig from ingester (#1814) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add tracing to `mimir-continuous-test` (#1795) * Extract and test TracerTransport functionality We need to use a TracerTransport in mimir-continuous-test. We have that in the frontend package, but I don't want to import frontend from the mimir-continuous-test, so we extract it to util/instrumentation.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Set up global tracer in mimir-continuous-test Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add tracing to the client and spans to the tests Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add jaeger-mixin to mimir-continuous test container Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * make license Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add traces to the write path Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Chore: remove unused code from BucketStore (#1816) * Removed unused Info() and advLabelSets from BucketStore Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused FilterConfig from BucketStore Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused relabelConfig from store-gateway tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused function expectedTouchedBlockOps() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unused recorder from BucketStore tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * go mod vendor Signed-off-by: Marco Pracucci <marco@pracucci.com> * Refactoring: force removal of all blocks when BucketStore is closed (#1817) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Simplify FilterUsers() logic in store-gateway (#1819) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Migrate admin CSS to bootstrap 5 (#1821) * Migrate admin CSS to bootstrap 5 When I added bootstrap, for some reason I imported bootstrap 3 which was originally launched in 2013. Before adding more CSS styles, let's migrate to modern Bootstrap 5 launched in 2021. This doesn't require an explicit jquery dependency anymore. Also re-styled admin header to adapt properly to mobile devices screens. 
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * ruler: make use of dskit `grpcclient.Config` on remote evaluation client (#1818) * ruler: use dskit grpc client for remote evaluation * addressed PR feedback * Memberlist status page CSS (#1824) * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update dskit to 4d7238067788a04f3dd921400dcf7a7657116907 This includes changes from https://github.com/grafana/dskit/pull/163 Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Custom memberlist status template Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Include `import` in jsonnet snippets (#1826) * Do not drop blocks in the store-gateway if missing in the ring (#1823) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgraded dskit to fix temporary partial query results when shuffle sharding is enabled and hash ring backend storage is flushed / reset (#1829) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Docs: ruler remote evaluation (#1714) * include documentation for remote rule evaluation * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * address PR feedback * Update 
docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * addressed PR feedback * addressed PR feedback * Update docs/sources/operators-guide/architecture/components/ruler/index.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/running-production-environment/planning-capacity.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update docs/sources/operators-guide/running-production-environment/planning-capacity.md Co-authored-by: Marco Pracucci <marco@pracucci.com> * addressed PR feedback Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Alertmanager: Do not validate alertmanager configuration if it's not running. (#1835) Allows other targets to start up even if an invalid alertmanager configuration is passed in. Fixes #1784 * Alertmanager: Allow usage with `local` storage type, with appropriate warnings. (#1836) An oversight when we removed non-sharding modes of operation is that the `local` storage type stopped working. Unfortunately it is not conceptually simple to support this type fully, as alertmanager requires remote storage shared between all replicas, to support recovering tenant state to an arbitrary replica following an all-replica outage. 
To support provisioning of alerts with `local` storage, but persisting of state to remote storage, we would need to allow different storage configurations. This change fixes the issue in a more naive way, so that the alertmanager can at least be started up for testing or development purposes, but persisting state will always fail. A second PR will propose allowing the `Persister` to be disabled. Although this configuration is not recommended for production use, as long as the number of replicas is equal to the replication factor, tenants will never move between replicas, and so the local snapshot behaviour of the upstream alertmanager will be sufficient. Fixes #1638 * Mixin: Additions to Top tenants dashboard regarding sample rate and discard rate. (#1842) Adds the following rows to the "Top tenants" dashboard: - By samples rate growth - By discarded samples rate - By discarded samples rate growth These queries are useful for determining which tenants are potentially putting excess load on distributors and ingesters (and whether it increased recently). * Use concurrent open/close operations in compactor unit tests (#1844) Open and close files concurrently in compactor unit tests to expose bugs that implicitly rely on ordering. Exposes bugs such as https://github.com/prometheus/prometheus/pull/10108 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Mixin: Show ingestion rate limit and rule group limit on Tenants dashboard. (#1845) Whilst diagnosing a recent issue, we thought it would be useful to show the current ingestion rate limit for the tenant. As the limit is applied to `cortex_distributor_received_samples_total`, the limit is shown on the panel which displays this metric. ("Distributor samples received (accepted) rate"). Also added `ruler_max_rule_groups_per_tenant` while in the area. We don't currently display the number of exemplars in storage on the dashboard anywhere, so we cannot add `max_global_exemplars_per_user` right now.
* Jsonnet: Preparatory refactoring to simplify deploying parallel query paths. (#1846) This change extracts some of the jsonnet used to build query deployments (querier, query-scheduler, query-frontend) such that it is easier to deploy secondary query paths. The use case for this is primarily to develop a query path deployment for ruler remote-evaluation, but there may be other use cases too. * Removed double space in Log (#1849) * Reference 'monolithic mode' instead of 'single binary' in logs (#1847) Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com> * Extend safeTemplateFilepath to cover more cases. (#1833) * Extend safeTemplateFilepath to cover more cases. - template name ../tmpfile, stored into /tmp dir - empty template name - template name being just "." Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Relax mimir-continuous-test pressure when deployed with Jsonnet (#1853) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add 2.1.0-rc.0 header (#1857) * Prepare release 2.1 (#1859) * Update VERSION to 2.1-rc.0 * Add relevant changelog entries for user facing PRs since mimir-2.0.0 * Add patch in semver VERSION * Adding updated ruler diagrams. (#1861) * Create v2-1.md (#1848) * Create v2-1.md * Update and rename v2-1.md to v2.1.md updated the header and renamed the file. * Update v2.1.md Missing the upgrade configurations. * Update v2.1.md added bug description * Update v2.1.md bug fix writeup. * Update v2.1.md Added the series count description * Apply suggestions from code review Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update v2.1.md * Update v2.1.md updated tsdb isolation wording. * Ran make doc. * Fixed a broken relref. 
* Update docs/sources/release-notes/v2.1.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Allow custom data source regex in mixin dashboards (#1802) * dashboards: update grafana-builder This commit updates the grafana-builder version and brings in: * enable tooltip by default (#665) * Add 'Data Source' label for the default datasource template variable. (#672) * add dashboard link func (#683) * make allValue configurable (#703) * Allow datasource's regex to be configured Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * Allow custom data source regex in mixin dashboards The current dashboards offer the possibility to select a data source among all Prometheus data sources in the organization. Depending on the number of data sources, the list could be rather big (>10). Not all data sources host Mimir metrics, so listing them all is not helpful to users. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com> * Revert the change that was enabling shared tooltips Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Dashboards: Fix `container_memory_usage_bytes:sum` recording rule (#1865) * Dashboards: Fix `container_memory_usage_bytes:sum` recording rule This change causes recording rules that reference `container_memory_usage_bytes` to omit series that do not contain the required labels for rules to run successfully, by requiring a non-empty `image` label.
Signed-off-by: Peter Fern <github@0xc0dedbad.com> * Update CHANGELOG Signed-off-by: Peter Fern <github@0xc0dedbad.com> * Add compiled rules Signed-off-by: Peter Fern <github@0xc0dedbad.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Deprecate -distributor.extend-writes and set it always to false (#1856) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove DCO from contributors guidelines (#1867) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Create v2-1.md (#1848) * Create v2-1.md * Update and rename v2-1.md to v2.1.md updated the header and renamed the file. * Update v2.1.md Missing the upgrade configurations. * Update v2.1.md added bug description * Update v2.1.md bug fix writeup. * Update v2.1.md Added the series count description * Apply suggestions from code review Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Update v2.1.md * Update v2.1.md updated tsdb isolation wording. * Ran make doc. * Fixed a broken relref. * Update docs/sources/release-notes/v2.1.md Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> * Adding updated ruler diagrams. (#1861) * Deprecate -distributor.extend-writes and set it always to false (#1856) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Bump version to 2.1.0-rc.1 to include cherry-picked * List Johanna as 2.1.0 release shepherd (#1871) * fix(mixin): add missing alertmanager hashring members (#1870) * fix(mixin): add missing alertmanager hashring members * docs(CHANGELOG): add changelog entry * Docs: clarify 'Set rule group' API specification (#1869) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Simplify documentation publishing logic (#1820) * Simplify documentation publishing logic Split into two pipelines, one that runs on main and one that runs on release branches and tags. 
Use `has-matching-release-tag` workflow to determine whether to release documentation on release branch and tags. `has-matching-release-tag` is documented in https://github.com/grafana/grafana-github-actions/blob/main/has-matching-release-tag/action.yaml Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Remove script no longer used for documentation releases Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Add missing clone step for the website-sync action Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Update RELEASE instructions to reflect automated docs publishing Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Remove conditional from website clone for next publishing Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Fix capitalization of Jsonnet and Tanka (#1875) Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Checkout the repository as part of the documentation sync (#1876) * Checkout the repository as part of the documentation sync I assumed this was already done but the GitHub docs confirm that it is required. https://docs.github.com/en/github-ae@latest/actions/using-workflows/about-workflows#about-workflows Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Allow manual triggering of workflow Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Fix manual workflow dispatch (#1877) TIL that if you edit the workflow in the GitHub UI, it will lint your workflow file and make sure that all the keys conform to the schema. * Simplify documentation publishing logic (#1820) * Simplify documentation publishing logic Split into two pipelines, one that runs on main and one that runs on release branches and tags. Use `has-matching-release-tag` workflow to determine whether to release documentation on release branch and tags. 
`has-matching-release-tag` is documented in https://github.com/grafana/grafana-github-actions/blob/main/has-matching-release-tag/action.yaml Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Remove script no longer used for documentation releases Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Add missing clone step for the website-sync action Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Update RELEASE instructions to reflect automated docs publishing Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Remove conditional from website clone for next publishing Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Checkout the repository as part of the documentation sync (#1876) * Checkout the repository as part of the documentation sync I assumed this was already done but the GitHub docs confirm that it is required. https://docs.github.com/en/github-ae@latest/actions/using-workflows/about-workflows#about-workflows Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Allow manual triggering of workflow Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Fix manual workflow dispatch (#1877) TIL that if you edit the workflow in the GitHub UI, it will lint your workflow file and make sure that all the keys conform to the schema. 
* Chore: cleanup unused alertmanager config in Mimir jsonnet (#1873)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Update mimir-prometheus to ceaa77f1 (#1883)
  * Update mimir-prometheus to ceaa77f1
    This includes the fix https://github.com/grafana/mimir-prometheus/pull/234 for https://github.com/grafana/mimir/issues/1866
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Update CHANGELOG.md
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Fix changelog
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Bump version to 2.1.0-rc.1 to include cherry-picked (#1872)
* Increased default configuration for -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB (#1884)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Split mimir_queries rule group so that it doesn't have more than 20 rules (#1885)
  * Split mimir_queries rule group so that it doesn't have more than 20 rules.
  * Add check for number of rules in the group.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Add alert for store-gateways without blocks (#1882)
  * Add alert for store-gateways without blocks
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Clarify messages
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  * Replace "Store Gateway" with "store-gateway"
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Rename alert to StoreGatewayNoSyncedTenants
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Rebuild mixin
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Fix flaky integration tests caused by 'metric not found' (#1891)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: Explain the runtime override of active series matchers (#1868)
  * Updated docs/sources/operators-guide/configuring/configuring-custom-trackers.md; made some tweaks to the examples; changed the names interesting-service and also-interesting-service to service1 and service2, respectively
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
* Update to latest Thanos for Memcached fixes (#1837)
  Update our vendor of Thanos to pull in the most recent changes to the Memcached client. In particular, these changes prevent the client from starting many goroutines as part of batching before they are able to make progress.
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation" (#1893)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix TestRulerEvaluationDelay flakiness (#1892)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix `MimirRulerMissedEvaluations` text and add playbook (#1895)
  * Correct magnitude on MimirRulerMissedEvaluations
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add playbook for MimirRulerMissedEvaluations
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Remove trailing spaces
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Conform to tech doc style. (#1904)
* Use a dedicated threadpool for store-gateway requests (#1812)
  Remove the use of a dedicated threadpool for index-header operations because the call overhead is prohibitively expensive. Instead, use a dedicated threadpool for entire store-gateway requests so that the cost of switching between threads is only paid a single time. This allows for isolation in the case of page faults during mmap accesses without too much overhead.
  Fixes #1804
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Upgrade consideration for active_series_custom_trackers_config (#1897)
  * Upgrade consideration for active_series_custom_trackers_config
  * Update docs/sources/release-notes/v2.1.md
    Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
  * Update docs/sources/release-notes/v2.1.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Jennifer Villa <jen.villa@grafana.com>
* fix(mixin): do not trigger TooMuchMemory alerts if no container limits are supplied (#1905)
  * fix(mixin): do not trigger `MimirAllocatingTooMuchMemory` or `EtcdAllocatingTooMuchMemory` alerts if no container limits are supplied
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Fix MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode (#1902)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Set defaults to query ingesters, not store, for recent data (#1909)
  Set queriers to _not_ query storage (store-gateways) for recent data and set the store-gateways to ignore recent uncompacted blocks. Default values are set to match what we use in the Mimir jsonnet.
  Fixes #1639
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Revert distributor log level to warn in integration tests (#1910)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Improved error returned by -querier.query-store-after validation (#1914)
  * Improved error returned by -querier.query-store-after validation
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Update pkg/querier/querier.go
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove jsonnet configuration settings that match default values (#1915)
  * Remove jsonnet configuration settings that match default values
    Follow up to #1909
    Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Docs: recommend fast disks for ingesters and store-gateways (#1903)
  * Docs: recommend fast disks for ingesters and store-gateways
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * Update docs/sources/operators-guide/running-production-environment/production-tips/index.md
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * Update docs/sources/operators-guide/running-production-environment/production-tips/index.md
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Improve series, sample, metadata and exemplars validation errors (#1907)
  * Improved error messages returned by ValidateSample(), ValidateExemplar(), ValidateMetadata() and ValidateLabels()
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * Apply suggestions from code review
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * Fixed unit tests after error messages edit
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Manually applied a suggestion to error message
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Renamed globalerrors pkg to singular form
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Cleanup globalerror package based on Oleg's feedback
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Removed formatting support from globalerror.ID's message generation function
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Changed another error message based on feedback
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Added CHANGELOG entry
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Update operations/mimir-mixin/docs/playbooks.md
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * Rephrased label name/value length error message based on feedback received in the test file
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Final fixes to error messages
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* mixin-tool: adapt screenshots dockerimage to support arm64 (#1916)
  Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Ingester ring endpoint fix (#1918)
  * /ingester/ring is also available via distributor.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
  * Revert unintended change.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Configuration files for GrafanaCon 2022 presentation. (#1881)
  * Configuration files for GrafanaCon 2022 presentation.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Update dskit to bring "Parallelize memberlist notified message processing" PR (#1912)
  * Update dskit to bring "Parallelize memberlist notified message processing" PR.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
  * CHANGELOG.md
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Account for StatefulSets and Deployments named by the helm chart (#1913)
  Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Change shuffle sharding ingester lookback default config (#1921)
  * Change shuffle sharding ingester lookback default config
    Use the same default value for ingester lookback as the "query ingesters within" setting to reduce the number of things that need to be changed from their defaults.
    This change also removes use of the `-blocks-storage.tsdb.close-idle-tsdb-timeout` flag in jsonnet since the value being used matches the default.
    Follow up to #1915
    Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
  * Changelog
    Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Improved ValidateMetadata() errors (#1919)
  * Improved ValidateMetadata() errors
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Added PR number to CHANGELOG
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Update pkg/util/validation/errors.go
    Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Converted all ValidationError to be non-pointers
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Removed unused variable
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fixed unit test
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fixed markdown linter
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* mixin/dashboards: ruler query path dashboards (#1911)
  * mixin: added ruler query path dashboards
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * docs: added ruler reads & ruler reads resources dashboard screenshots
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * updated CHANGELOG.md
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Mark query_ingesters_within and query_store_after as advanced (#1929)
  * Mark query_ingesters_within and query_store_after as advanced
    Now that they have good defaults that match what we run in production, they shouldn't need to be tuned by users in most cases.
    Fixes #1924
    Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Remove empty chunks panel from Queries dashboard (#1928)
  * Remove empty chunks panel from Queries dashboard
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts. (#1926)
  * Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
  * CHANGELOG.md
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Update config value for -querier.query-ingesters-within to work with … (#1930)
  * Update config value for -querier.query-ingesters-within to work with new default value for -querier.query-store-after
  * Remove config for -querier.query-ingesters-within as they are set to default
* Update Thanos vendor for memcache improvements (#1920)
  Update our vendor of Thanos so that memcache keys are grouped by the server they are owned by before being split into batches.
  Fixes #423
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Move usage generation to separate package (#1934)
  * Move usage function into a separate package and export it
    Signed-off-by: Patryk Prus <patryk.prus@grafana.com>
  * Add function to add to flag category overrides at runtime
    Signed-off-by: Patryk Prus <patryk.prus@grafana.com>
  * Document CHANGELOG scopes
  * Add documentation about changelog scopes
  * update CHANGELOG for #1934
* Improve instance limits, ingester limits, query limiter, some querier errors (#1888)
  * Add error IDs to pkg/ingester/instance_limits.go
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add error IDs to pkg/ingester/limiter.go
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add error IDs to pkg/querier/blocks_store_queryable.go
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Differentiate max-ingester-ingestion-rate from distributor
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update playbooks.md
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Correct misspelled flags
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Correct strings in tests as well
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Re-iterated on ingesters limit errors
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Re-iterated on ingesters per-tenant limit errors
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Re-iterated on query per-tenant limit errors
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Added PR number to CHANGELOG entry
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Mention the cardinality API endpoint in the err-mimir-max-series-per-metric runbook
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Update operations/mimir-mixin/docs/playbooks.md
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Fixed InstanceLimits receiver name to be consistent
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Clarify metadata is stored in memory
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fixed linter and tests
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fixed more tests
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Update pkg/querier/blocks_store_queryable.go
    Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Fix English grammar about 'how to fix it'
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
* make ingesters use heartbeat timeout instead of period to fix the bug… (#1933)
  * make ingesters use heartbeat timeout instead of period to fix the bug where they sometimes appear as unhealthy
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Update VERSION to 2.1.0
* Update dashboard screenshots (#1940)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fix version in changelog
* Update mimir tests to use new 2.1.0 image
* Add minimum Grafana version to mixin dashboards (#1943)
  Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Bump grafana/mimir image to 2.1.0 for backward compatibility testing (#1942)
* Chore: renamed source files for remote ruler dashboards (#1937)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Move the mimir-distributed helm chart into the mimir repository (#1925)
  * Initial copy of mimir-distributed helm chart
    This commit is not expected to work in CI.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Update github action for helm lint and test
    Set the working directory for github actions for helm actions.
    Set more consistent name for github actions.
    Set chart name for testing.
    Ignore generated helm doc from prettier.
    Do not release the helm chart for now.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add bucket prefix configuration (#1686)
  * Add bucket prefix configuration
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add allowed chars validation for storage prefix
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add unit tests for PrefixedBucketClient
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add CHANGELOG entry
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Use grafana/regexp instead of regexp
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Improve validation of storage_prefix
    Update docs and add validation for .. and .
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add some tests for AM and ruler bucket validation
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add tests for bucket prefix with filesystem client
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update helm text too
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update everything
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Simplify validation for storage_prefix
    Only accept alphanumeric characters for the storage_prefix to prevent typos and misunderstandings when the prefix ends with a slash or contains slashes and dots
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Make stronger assertions in bucket validation test
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Make stronger assertions in bucket prefix test
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Assert on errors, not on strings
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Exclude YAML field names from error message
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Include full image tag on rollout dashboard (#1932)
  * Make version matcher in rollout dashboard work for non-weekly images
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add CHANGELOG.md entry
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update CHANGELOG.md
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
* docs: move federated rule groups documentation to its own section (#1906)
  * docs: move federated rule groups documentation to its own section
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Make networking panels pod matchers work with helm chart (#1927)
  * Make networking panels pod matchers work with helm chart
    The pods created by the helm chart follow a format of `<helm_release_name>-mimir-<ingester|distributor|...>`.
    This is a problem for all places that use the per_instance_label for matching. The per_instance_label is mostly used in aggregations (sum by (pod), count by (pod), ...). The networking panels are the only ones that use it for matching.
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Replace `.*` with a stronger regex in pod matchers
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add CHANGELOG.md entry
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add max query length error to errors catalog (#1939)
  * Add max query length error to errors catalogue
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Added PR number to CHANGELOG entry
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove image spec from demo file. (#1946)
  * Remove image spec from demo file.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Fix rejected identity accept encoding (#1864)
  * Fix rejected identity accept-encoding
    When a request comes in with the header `Accept-Encoding: gzip;q=1, identity;q=0`, we should gzip the response even if it's smaller than the defined minimum size.
    We achieve this by fixing the github.com/nytimes/gziphandler code and bringing the fixed code into this repository, since:
    - they don't seem to be maintaining it anymore
    - we don't want to use a replace directive, as it's very likely to be lost in codebases depending on this
    - it's a small amount of code (500 lines)
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Add API test for gzip
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * make lint pkg/util/gziphandler
    Mostly handling errors; also removed the deprecated http.CloseNotifier functionality and related code.
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Update CHANGELOG.md
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Fix comment
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  * Add faillint for github.com/nytimes/gziphandler
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * make lint
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Fix faillint paths
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * If there's content-encoding, start plain write
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * If less than min-size, don't encode
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Refactor `handleContentType` to handle by default
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Rename acceptsIdentity to rejectsIdentity
    Hopefully this will minimise the number of double negations, making the code clearer.
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  * Fix comment
    Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Distributor: added per-tenant request limit (#1843)
  * distributor: added request limiter logic
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * updated CHANGELOG.md
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * distributor: added type plans rate limits
    Assuming a minimum sane value of 100 samples per request, we've set default request limits for each user tier.
  * docs: added request limit distributor documentation
  * rebuilt jsonnet test output
  * make linter happy
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * updated reference help
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
  * addressed PR feedback
    Signed-off-by: Miguel Ángel Ortuño <ortuman@gmail.com>
* Add bucket prefix to experimental features (#1951)
  * Add bucket prefix to experimental features
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Update flag status of storage_prefix to experimental
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Copy thanos shipper (#1957)
  * Copy shipper from Thanos.
  * Remove support for uploading compacted blocks.
  * Always allow out-of-order uploads. Removed unused overlap checker.
  * Rename Shipper interface to BlocksUploader, and ThanosShipper to Shipper.
  * Extract readShippedBlocks method from user_tsdb.go
  * Added shipper unit tests (copied and adapted from original tests)
  * Add faillint rule to avoid using Thanos shipper.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Adjust the name of the tag expected by documentation publishing (#1974)
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Use github.com/colega/grafana-tools-sdk fork (#1973)
  * Use github.com/colega/grafana-tools-sdk fork
    See https://github.com/grafana/cortex-tools/pull/248 for more context (this is the same change). The grafana-tools/sdk dependency will eventually be removed entirely from analyse commands.
    Signed-off-by: hjet <hjet@users.noreply.github.com>
  * Update CHANGELOG.md
    Signed-off-by: hjet <hjet@users.noreply.github.com>
  * mod tidy
* Deprecate -ingester.ring.join-after (#1965)
  * Deprecate -ingester.ring.join-after
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Addressed review feedback
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Dashboards: disable gateway panels by default (#1955)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: rename 'playbooks' to 'runbooks' and move them to doc (#1970)
  * Docs: rename 'playbooks' to 'runbooks' and move them to doc
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Named runbooks folder as 'mimir-runbooks/' to make it easy to import in Grafana Labs internal infrastructure as code
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fix anchors check because they're case insensitive
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Preparation of e2eutils for Thanos indexheader unit tests. (#1982)
  We want to pull in the indexheader package from Thanos so that we can add some experimental alternative implementations of BinaryReader. In order to also pull in the unit tests for this package, we need the replacements for e2eutil.Copy and e2eutil.CreateBlock.
  This change does two things:
  1. Copy in e2eutil/copy.go and fix it up accordingly.
  2. Move CreateBlock into a package to avoid circular imports.
* Make propagation of forwarding errors configurable (#1978)
  * make propagation of forwarding errors optional
    Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
  * add test for disabled error propagation
    Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
  * leave error propagation enabled by default
    Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
  * update help
    Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
  * update docs
  * better wording
    Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
* Release the mimir-distributed-beta helm chart (#1948)
  Use the common workflow from the helm-chart repo.
  Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Copy Thanos block/indexheader package (#1983)
  * Copy thanos/pkg/block/indexheader.
  * Update provenance.
  * Fix linter error due to error variable name.
  * Use require instead of e2eutil.
  * Replace usage of e2eutil.Copy
  * Replace usage of e2eutil.CreateBlock with local version.
  * Replace use of Thanos indexheader with local copy.
  * Add faillint check for upstream indexheader.
  * Fix goleak ignore for NewReaderPool.
  * Update vendor directory.
* Prepare mimir beta chart release (#1995)
  * Rename chart back to mimir-distributed
    Apparently the helm option --devel is needed to trigger using beta versions. This should be enough protection against accidental use. Avoids renaming issues.
  * Version bump helm chart
    Do version bump to a beta version but nothing else until we double check that such beta chart cannot be accidentally selected with helm tooling.
  * Enable helm chart release from main branch
    Release process tested ok on test branch.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Bump version of helm chart (#1996)
  Test if helm release triggers correctly.
  Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Update gopkg.in/yaml.v3 (#1989)
  This updates to a version that contains the fix to CVE-2022-28948.
* Remove hardlinking in Shipper code. (#1969)
  * Remove hardlinking in Shipper code.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* [helm] use grpc round robin for distributor clients (#1991)
  * Use GRPC round-robin for gateway -> distributor requests
    Fixes https://github.com/grafana/mimir/issues/1987
    Update chart version and changelog
    Use the headless distributor service for the nginx gateway
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Fix binary_reader.go header text. (#1999)
  Mistakenly left two lines when updating the provenance for the file.
* Workaround to keep using old memcached bitnami chart for now (#1998)
  * Workaround to keep using old memcached bitnami chart for now
    See also: https://github.com/grafana/helm-charts/pull/1438
    Also clean up unused chart repositories from ct.yaml.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* [helm] add results cache (#1993)
  * [helm] Add query-frontend results cache
    Fixes https://github.com/grafana/helm-charts/issues/1403
  * Add PR to CHANGELOG
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
  * Fix README
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Disable distributor.extend-writes & ingester.ring.unregister-on-shutdown (#1994)
  Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
* Update CHANGELOG.md (#1992)
* [helm] Prepare image bump for 2.1 release (#2001)
  * Prepare image bump for 2.1 release
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
  * Fix README template to reference 2.1
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
  * Add nice link text to CHANGELOG
    Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
  * Update CHANGELOG.md
* Publish helm charts from release branches (#2002)
* Update Thanos with https://github.com/thanos-io/thanos/pull/5400. (#2006)
* Replace hardcoded intervals with $__rate_interval in dashboards (#2011)
  * Replace hardcoded intervals with $__rate_interval in dashboards
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
  * Add CHANGELOG.md entry
    Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Standardise error messages for distributor instance limits (#1984)
  * standardise error messages for distributor instance limits
  * Apply suggestions from code review
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
  * Apply suggestions from code review
    Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
  * apply code review suggestions to rest of doc for consistency
  * manually apply suggestion from code review
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
  Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
* Remove tutorials/ symlink (#2007)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add querier autoscaler support to jsonnet (#2013)
  * Add querier autoscaler support to jsonnet
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Fixed autoscaling.libsonnet import
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling (#2023)
  * Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
  * Shouldn't be an exported object
    Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Don't include external labels in blocks uploaded by Ingester (#1972)
  * Remove support for external labels.
  * Fixed comments.
  * Don't use TenantID label. Filter out the label during compaction.
  * CHANGELOG.md
  * Use public function from Thanos.
  * Use new UploadBlock function, move GrpcContextMetadataTenantID constant.
  * Rename tsdb2 import to mimir_tsdb.
  * Fix tests.
    Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
* Enhance MimirRequestLatency runbook with more advice (#1967)
  * Enhance MimirRequestLatency runbook with more advice
    Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
  Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Include helm-docs in build and CI (#2026)
  * Update the mimir build image and its build doc
    Dockerfile: Add helm-docs package to the image.
    how-to: Write down the requirements for build in more detail. Add information about build on Linux.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Expand make doc with helm-docs command
    This enables generating the helm chart README with the same make doc command as all other documentation.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Update docs/internal/how-to-update-the-build-image.md
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update contributing guides for the helm chart (#2008)
  * Update contributing guides for the helm chart
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
  * Turn off helm version increment check in CI
    This enables periodic releases, as opposed to requiring a version bump for a release at every PR.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add extraEnvFrom to all services and enable injection into mimir config (#2017)
  Add `extraEnvFrom` capability to all Mimir services to enable injecting secrets via environment variables.
  Enable `-config.expand-env=true` option in all Mimir services to be able to take secrets/settings from the environment and inject them into the Mimir configuration file.
  Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Docs: fix mimir-mixin installation instructions (#2015)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Docs: make documentation a first class citizen in CHANGELOG (#2025)
  Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Helm: add global.extraEnv and global.extraEnvFrom (#2031)
  * Helm: add global.extraEnv and global.extraEnvFrom
    Enables setting environment and env injection in one place for mimir + nginx.
    Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Upgrade alpine to 3.16.0 (#2028)
  * Upgrade alpine to 3.16.0
  * upgrade to alpine 3.16.0
  * upgrade alpine to 3.16.0
    Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
    Co-authored-by: Marco Pracucci <marco@pracucci.com>
    Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
    Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Helm: release our first weekly (#2033)
  This should be automated, but…
What this PR does
Enhance `MimirRequestLatency` runbook with more practical advice.
Which issue(s) this PR fixes or relates to
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`