LogQL request cancellation not propagated.
Problem.
When a LogQL request is cancelled by the client (LogCLI or Grafana), the cancellation is received by the Query Frontend, but it is not propagated to some of the downstream requests the Query Frontend started (mainly to the queriers).
This leads to extra resource consumption in Loki: even after the original request is cancelled, the queriers started for that request keep running.
NOTE: This can happen for any LogQL query.
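The failure mode can be sketched with a toy shell analogy (this is not Loki code): a "client" process is cancelled by `timeout`, but a background "querier" it spawned keeps running because the cancellation is never propagated to it.

```shell
# Toy analogy, not Loki code: a background "querier" that outlives its "client".
( sleep 3; echo "querier finished after the client was cancelled" ) &
WORKER=$!

# The "client": timeout cancels it after 1s (timeout exits with status 124).
timeout 1s sleep 10 || echo "client cancelled (exit $?)"

# The background job keeps running for roughly two more seconds.
wait "$WORKER"
```

This mirrors the bug: cancelling the parent alone does nothing to work it has already fanned out.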
Steps to reproduce the issue.
NOTE: Here I use LogCLI, for two reasons:
- Grafana sends two queries (the actual log query and, if enabled, a log volume query), which makes it hard to narrow down the resource consumption during that time interval; with LogCLI we can send only the single metric query that causes the issue.
- It is easy to change the request timeout value for investigation.
1. Make a `count_over_time` query via LogCLI.
We use `timeout 10s`, which closes the client connection after 10s, cancelling the request.
Here we use `date | md5sum | cut -d\- -f1` to put a random ID in the query (this makes it easy to find the exact query). The command prints the random ID; copy it so you can search for that query and find its traces.
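Step 1 might look like the following sketch. The actual `logcli` invocation is commented out, and its address and label matcher are placeholders I introduced, not values from the issue:

```shell
# Generate a random ID to embed in the query (a 32-char hex string).
RANDOM_ID=$(date | md5sum | cut -d\- -f1 | tr -d ' \n')
echo "$RANDOM_ID"

# Build the metric query; the {job="example"} matcher is a placeholder.
QUERY="count_over_time({job=\"example\"} |= \"${RANDOM_ID}\" [1m])"
echo "$QUERY"

# Hypothetical invocation against a Loki instance (--addr is a placeholder):
# timeout 10s logcli query --addr=http://localhost:3100 "$QUERY"
```

Because the filter `|= "${RANDOM_ID}"` is unique per run, the query is trivial to locate later in logs and traces.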
2. Search for that query in Loki.
Search for the query we made, on Grafana Explore or via LogCLI, filling in the `<random-id>` copied from the previous step.
3. Tempo traces.
Optionally see the traces of the requests.
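As a sketch of the step-2 search: the job label and the address below are assumptions about the deployment, and the ID is a placeholder for the one printed in step 1.

```shell
# Placeholder for the random ID printed in step 1.
RANDOM_ID="d41d8cd98f00b204e9800998ecf8427e"

# Assumed selector -- adjust the job label to your Loki deployment:
SEARCH_QUERY="{job=\"loki/query-frontend\"} |= \"${RANDOM_ID}\""
echo "$SEARCH_QUERY"

# Hypothetical invocation (--addr is a placeholder):
# logcli query --addr=http://localhost:3100 "$SEARCH_QUERY"
```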
You can see it takes more than 10s (something was still running after the cancellation).
Also from the traces, you can see that the Query Frontend received the cancellation and correctly returned a 499 status at 10s.
So the problem is: even though the Query Frontend received the cancellation and responded correctly with 499 at 10s, the complete request cycle ran for more than 15s (this went up to 45s in some clusters, depending on traffic).