Grafana dashboard shows "too many outstanding requests" after upgrade to v2.4.2 #5123
Comments
Hi, I resolved the problem on my end by increasing two default values under querier:. It's not perfect, but this error helped me understand the architecture better and better.
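The values themselves weren't captured above; a minimal sketch of what raising two querier-side defaults might look like (the key choice and numbers are assumptions, not the commenter's actual config):
querier:
  # the default is fairly low; raising it lets more split subqueries run concurrently per querier
  max_concurrent: 2048
frontend:
  # how many requests may wait in the per-tenant queue before "too many outstanding requests" is returned
  max_outstanding_per_tenant: 4096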
Hi, I solved my problem with:
My dashboard now completes in 5s :) It works for me because I had a lot of small requests, too many for my Docker Loki process. Reducing them was the solution.
See #5204.
For completeness, here's the needed config:
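The config block itself is missing here; judging from the values repeated later in the thread, it was presumably along these lines (numbers illustrative, not necessarily the poster's exact settings):
query_scheduler:
  max_outstanding_requests_per_tenant: 4096   # per-tenant queue size; the 2.4.2 default is much lower
frontend:
  max_outstanding_per_tenant: 4096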
This helped partially; I still see the error every now and then.
You can raise these limits. I'm still working on finding the right trade-off between memory usage and speed. Below is my current partial configuration, relevant to this specific issue.
This part seems to help. I never ran into this issue with 2.4.1. Something changed in 2.4.2; I hope they restore the default values to what they were before.
This worked for my setup, thanks!
I can also confirm that on v2.4.2 you will face this issue if you keep the new default values. Switching the values back to the old defaults from v2.4.1 solved my problem.
Bump, this is a serious issue. Please fix this, Loki team.
I'm not able to solve my problem using any of the above values/options on version 2.4.2. We rolled back our Loki to version 2.4.1 and this solved our issue. Let's wait for a fix from the Loki team.
2.5.0 also has this problem.
select {
case queue <- req:
	q.queueLength.WithLabelValues(userID).Inc()
	q.cond.Broadcast()
	// Call this function while holding a lock. This guarantees that no querier can fetch the request before function returns.
	if successFn != nil {
		successFn()
	}
	return nil
// The commented-out default case below is what increments discardedRequests and
// returns ErrTooManyRequests ("too many outstanding requests") when the per-tenant queue is full.
//default:
//	q.discardedRequests.WithLabelValues(userID).Inc()
//	return ErrTooManyRequests
}
After removing this part of the code, the problem was alleviated.
We got the same error with v2.5.0.
Is there an ETA for a fix?
I can confirm this issue exists after an upgrade to the newest version. I can't even roll back to 2.4.1; I should note that 2.4.1 uses v1beta tags and will not be available on GCP for much longer.
We also had a lot of "403 too many outstanding requests" on Loki 2.5.0 and 2.4.2.
@wuestkamp the issue is really that 2.4.1 has security issues and will soon be deprecated by newer k8s cluster versions.
So why is Grafana Labs not fixing this issue? I don't understand. Why is it so hard?
@benisai I wish I knew. Ensure you are only using this in an isolated network; the CVEs could lead to break-ins, and Grafana is a data pod with potentially lots of customer logs, etc. Don't endanger your company by running old versions.
Homelab only. But the issue still persists without a fix. Or is there a fix?
I'm too lazy to set up a configuration file, so I just downgraded to 2.4.1 (homelab).
That works for me with Ansible:
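The Ansible snippet itself wasn't captured; one way this might be wired up is a plain template task that renders a loki config containing the settings from this thread (template name, destination path, and handler below are assumptions):
- name: Render Loki config with raised queue limits
  ansible.builtin.template:
    src: loki-config.yml.j2      # hypothetical template holding the query_scheduler/frontend values
    dest: /etc/loki/config.yml   # adjust to wherever your Loki reads its config
  notify: restart loki           # assumes a matching handler exists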
Hi, any updates?
I increased both values:
query_scheduler:
  max_outstanding_requests_per_tenant: 4096
frontend:
  max_outstanding_per_tenant: 4096
query_range:
  parallelise_shardable_queries: true
limits_config:
  split_queries_by_interval: 15m
  max_query_parallelism: 32
The default values for …
@stefan-fast Thank you so much for your help.
This doesn't work with 2.6.2.
The cause of the issue is that parallelism has been enabled by default, but the number of queries that you can queue at the same time is limited to a low number by default. The best solution right now is to raise the maximum number of outstanding requests.
For simple deployments (single-binary or SSD mode), add the following configuration:
query_scheduler:
  max_outstanding_requests_per_tenant: 10000
If you deployed in microservices mode, use this config:
frontend:
  max_outstanding_per_tenant: 10000
…rrect explanation for how to disable. (grafana#6715)
**What this PR does / why we need it**: I noticed when responding to grafana#5123 that the docs did not correctly explain how to disable splitting queries by time. I searched through the code and confirmed `0` is the correct value to disable this feature.
Signed-off-by: Edward Welch <edward.welch@grafana.com>
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
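Per that docs fix, splitting queries by time can also be disabled outright rather than raising the queue limits; using the key shown elsewhere in this thread, that would be (value 0 per grafana#6715):
limits_config:
  split_queries_by_interval: 0   # 0 disables splitting queries by time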
Issue present in Loki 2.7.0 while using default values.
On 2.7.4 as well.
Same in 2.8.0.
None of the above configurations worked for me, version 2.8.0.
For me, the following were the only lines I changed from the base config in the Docker container, and they seem to work (so far):
I'm not throwing a huge amount at the server at the moment, but at least multiple panels in a dashboard load in a separate Grafana instance that's pointing at it.
It works for me.
The first one works for me in Loki v2.8.2 for the binary deployment.
Adding the configuration via the command line works for me in Loki v2.6.1 installed using Helm:
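The flags themselves weren't captured; passed as command-line arguments they would presumably look something like the compose-style sketch below (flag names are a best guess and should be checked against loki -help for your version):
# hypothetical container command override, not the commenter's actual invocation
command:
  - -config.file=/etc/loki/local-config.yaml
  - -query-scheduler.max-outstanding-requests-per-tenant=4096
  - -querier.max-outstanding-requests-per-tenant=4096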
The following seems to work for chart version 5.6.2 and app version 2.8.2 of the grafana/loki Helm chart.
loki:
  limits_config:
    split_queries_by_interval: 24h
    max_query_parallelism: 100
  query_scheduler:
    max_outstanding_requests_per_tenant: 4096
  frontend:
    max_outstanding_per_tenant: 4096
  # other stuff...
If you are not deploying Loki via Helm, I believe you have to set these values not under the "loki:" key but at the top level, directly in the config.
It seems that the "outstanding requests" settings are required to allow many requests to Loki, while the values in "limits_config" are the upper ceiling for what Grafana sends out to the datasource. Either "query_scheduler" or "frontend" is required depending on your setup, but just set both.
Depending on how complex your dashboards are, you might run into these limits and have to extend them. If the defaults won't be changed, I guess this issue can be closed.
@litvinav I'm highly against closing this; sensible defaults are something every piece of software should have. If you install an ingress, it also works out of the box and you can configure it on top of that. That makes adoption easier for beginners. To break the defaults, you only need to select about 5 datasets and you will get a 429 (this isn't a complex screen or anything).
So what is the final decision?
That's a big number relative to the default of 10. Might need to watch resource consumption.
None of the above solutions work; only reverting to 2.4.1 finally fixed this dreaded issue.
@msveshnikov Don't run old versions; you put your clusters at risk!
I have a good solution from another issue that was closed. Apparently the problem is with the parallel queries. The solution was found here: #4613 (comment)
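The linked comment isn't reproduced in this thread, but given the mention of parallel queries it presumably involves turning off sharded-query parallelism; a sketch using the key seen earlier in the thread (whether this matches the linked comment is an assumption):
query_range:
  parallelise_shardable_queries: false   # switch off the parallelism that fills the per-tenant queue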
This issue was opened on a new site with 100% working solutions.
Describe the bug
After upgrading to v2.4.2 from v2.4.1, none of the panels using Loki show any data. I have a dashboard with 4 panels that load data from Loki. I am able to see data ingested correctly with a Grafana Explore datasource query.
Environment
Using Loki with docker-compose and shipping Docker logs with the Loki driver.
loki.yml
Error on the Grafana panel: