Enhance MimirRequestLatency runbook with more advice #1967

Merged · 21 commits · Jun 3, 2022

Changes from 5 commits

Commits (21)
83dbd76
Enhance MimirRequestLatency runbook with more advice
aknuds1 May 31, 2022
4e8fd08
Merge remote-tracking branch 'origin/main' into chore/enhance-mimir-r…
aknuds1 May 31, 2022
ca12851
Merge remote-tracking branch 'origin/main' into chore/enhance-mimir-r…
aknuds1 Jun 1, 2022
77cff05
Remove part on OOM-ing
aknuds1 Jun 1, 2022
ef623c6
Tweak advice regarding queriers and query sharding
aknuds1 Jun 1, 2022
06d1f38
Update docs/sources/operators-guide/mimir-runbooks/_index.md
aknuds1 Jun 2, 2022
27472ff
Preparation of e2eutils for Thanos indexheader unit tests. (#1982)
stevesg Jun 1, 2022
90893c1
Make propagation of forwarding errors configurable (#1978)
replay Jun 1, 2022
9aea9b7
Release the mimir-distributed-beta helm chart (#1948)
krajorama Jun 1, 2022
a3ecb22
Copy Thanos block/indexheader package (#1983)
stevesg Jun 1, 2022
1252c06
Prepare mimir beta chart release (#1995)
krajorama Jun 1, 2022
67711ac
Bump version of helm chart (#1996)
krajorama Jun 1, 2022
b80e560
Revise query sharding advice
aknuds1 Jun 2, 2022
8fb0833
Fix link
aknuds1 Jun 2, 2022
00a45b4
Add example Memcached timeout query
aknuds1 Jun 2, 2022
9e1ce54
Merge remote-tracking branch 'origin/main' into chore/enhance-mimir-r…
aknuds1 Jun 2, 2022
2e7b272
Merge remote-tracking branch 'origin/main' into chore/enhance-mimir-r…
aknuds1 Jun 2, 2022
071a1a4
Update docs/sources/operators-guide/mimir-runbooks/_index.md
aknuds1 Jun 3, 2022
cec869f
Fix binary_reader.go header text. (#1999)
stevesg Jun 2, 2022
97e63ce
Merge remote-tracking branch 'origin/main' into chore/enhance-mimir-r…
aknuds1 Jun 3, 2022
5293bc6
Address feedback
aknuds1 Jun 3, 2022
8 changes: 8 additions & 0 deletions docs/sources/operators-guide/mimir-runbooks/_index.md
@@ -222,6 +222,14 @@ How to **investigate**:
- Check `Memcached Overview` dashboard
- If memcached eviction rate is high, then you should scale up memcached replicas. Check the recommendations on the `Mimir / Scaling` dashboard and make reasonable adjustments as necessary.
- If memcached eviction rate is zero or very low, then it may be caused by "first time" queries
- Cache query timeouts
- Check store-gateway logs and look for warnings about timed out Memcached queries
- If many Memcached queries are indeed timing out, consider whether the store-gateway Memcached timeout setting (`-blocks-storage.bucket-store.chunks-cache.memcached.timeout`) is high enough
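As an example of surfacing these timeouts from metrics rather than logs, a query along the following lines can help. The metric and label names (`thanos_memcached_operation_failures_total`, `reason="timeout"`) are assumptions based on the Thanos memcached client used by the store-gateway; verify them against your Mimir version.

```promql
# Rate of Memcached operations failing with a timeout, per cache and operation.
# Metric and label names are assumptions based on the Thanos memcached client.
sum by (name, operation) (
  rate(thanos_memcached_operation_failures_total{reason="timeout"}[1m])
)
```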
- If queries are waiting in queue due to busy queriers
Collaborator:

We're not saying how to check it. The Mimir / Queries dashboard has panels named "Queue length". The goal is to keep that queue length at 0 (except for a few sporadic spikes). If the queue length stays above 0 for some time, then we need to scale up queriers.

Contributor (author):

Done.
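A sketch of the check described in the comment above, assuming the query-scheduler is deployed and exposes the `cortex_query_scheduler_queue_length` gauge (when the query-frontend does the queueing, the equivalent gauge is `cortex_query_frontend_queue_length`); treat the metric names as assumptions to verify against your deployment:

```promql
# Number of queries waiting in queue, summed across query-scheduler replicas.
# Should hover around 0; a sustained value above 0 suggests scaling up queriers.
sum(cortex_query_scheduler_queue_length)
```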

- Consider scaling up the number of queriers if they're not auto-scaled; if they are auto-scaled, check the auto-scaling parameters
- If queries are not waiting in queue due to busy queriers
- Consider enabling query sharding if not already enabled, to increase query parallelism
aknuds1 marked this conversation as resolved.
- If query sharding is already enabled, consider increasing the total number of query shards (`query_sharding_total_shards`) for tenants submitting slow queries, so that their queries can be parallelized further
Contributor:

I seem to recall that tuning the number of shards isn't exactly as straightforward as it seems. Is there an existing doc we could link people to that describes how to pick a number of shards?

Collaborator:

All the documentation we have is at docs/sources/operators-guide/architecture/query-sharding/index.md. I think the main feedback here is to just increase it and see if it improves things. We could be more specific and say to consider doubling the query shards and checking whether that reduces high-cardinality query latency: if it doesn't, roll back.
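As a sketch of that "double it and see" approach, per-tenant shard counts can be raised through Mimir's runtime overrides configuration. The tenant ID and numbers below are hypothetical; only the `query_sharding_total_shards` limit name comes from the diff above.

```yaml
# Runtime overrides file: hypothetical tenant "tenant-a", doubling its query
# shards from 16 to 32 as an illustration. Roll back if high-cardinality query
# latency does not improve.
overrides:
  tenant-a:
    query_sharding_total_shards: 32
```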


#### Alertmanager
