loki returned 500 and "rpc error: code = Canceled desc = context canceled" when handling large data query #3244
Comments
I updated the query from Grafana. It seems that even when the total returned bytes dropped, it still reports the same error as long as it's a "long query".
ts=2021-01-27T05:13:41.095467378Z caller=spanlogger.go:53 org_id=fake traceID=1312e304681820a1 method=query.Exec level=debug Ingester.TotalReached=1 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Store.TotalChunksRef=414 Store.TotalChunksDownloaded=200 Store.ChunksDownloadTime=1.008968582s Store.HeadChunkBytes="0 B" Store.HeadChunkLines=0 Store.DecompressedBytes="389 MB" Store.DecompressedLines=287565 Store.CompressedBytes="37 MB" Store.TotalDuplicates=0
One more strange thing: I found this error in the Loki logs, and it seems I hit it every time the query duration gets close to 1m. Is that configurable? I've already updated the gRPC settings and the querier, but nothing changes this magic number. Much appreciated if anyone can shed some light here. Thanks!
query_timeout can configure it.
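For context, the querier's query_timeout defaults to 1m in Loki 2.x, which lines up with the ~1m cutoff observed above. Below is a minimal sketch of the relevant settings, assuming a Loki 2.x YAML config; the values are illustrative, not recommendations:

querier:
  # Default is 1m; raise it so long range queries are not cancelled mid-flight
  query_timeout: 5m

server:
  # Keep the HTTP server timeouts at or above query_timeout, otherwise the
  # connection is torn down first and the query still ends in "context canceled"
  http_server_read_timeout: 5m
  http_server_write_timeout: 5m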
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Hi, any updates?
Hi! I was having the same problem and opted to change the way queries were being made. By default, parallel queries are active, so one big query is split into smaller ones that all run in parallel. At first I was getting the "too many outstanding requests" error, but after raising the maximum number of parallel queries the dashboards took a while to load, yet they did load. Sometimes it still failed with this issue's "context canceled" error.
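For reference, a minimal sketch of the knobs that control query splitting and parallelism, assuming a Loki 2.x config with the query frontend in use (option placement varies between versions; values are illustrative):

query_range:
  # Split one long range query into many shorter sub-queries
  split_queries_by_interval: 30m
  parallelise_shardable_queries: true

frontend:
  # Raising this is what relieves the "too many outstanding requests" error
  max_outstanding_per_tenant: 2048

limits_config:
  # How many of the split sub-queries may run at the same time
  max_query_parallelism: 32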
Describe the bug
Loki logged "rpc error: code = Canceled desc = context canceled" when more than 1.8 GB of data was fetched and the query took longer than 1m.
To Reproduce
Steps to reproduce the behavior:
Run the query; the full {} term can be referenced at the bottom, in the terminal output.
Expected behavior
Grafana can fetch and visualize a large amount of data (10+ GB?) from Loki.
Environment:
Infrastructure: Azure Kubernetes Service (AKS); Loki and Grafana are deployed there, with SSD used for data storage
Deployment tool: Helm
Screenshots, Promtail config, or terminal output
Loki logs:
ts=2021-01-27T04:50:13.279487527Z caller=spanlogger.go:53 org_id=fake traceID=6d3a3617a9d10b65 method=query.Exec level=debug Ingester.TotalReached=2 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=0 Ingester.TotalLinesSent=0 Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Store.TotalChunksRef=1490 Store.TotalChunksDownloaded=800 Store.ChunksDownloadTime=137.986708ms Store.HeadChunkBytes="0 B" Store.HeadChunkLines=0 Store.DecompressedBytes="1.8 GB" Store.DecompressedLines=1349743 Store.CompressedBytes="174 MB" Store.TotalDuplicates=0
ts=2021-01-27T04:50:13.279600928Z caller=spanlogger.go:53 org_id=fake traceID=6d3a3617a9d10b65 method=query.Exec level=debug Summary.BytesProcessedPerSecond="26 MB" Summary.LinesProcessedPerSecond=19645 Summary.TotalBytesProcessed="1.8 GB" Summary.TotalLinesProcessed=1349743 Summary.ExecTime=1m8.703279889s
level=info ts=2021-01-27T04:50:13.279953831Z caller=metrics.go:83 org_id=fake traceID=6d3a3617a9d10b65 latency=slow query="(sum by (path) (sum_over_time({filename=~"/var/log/nginx/json_access.log",host=~"10.0.2.", tenant_id=~"sre-tj06"} | json | status = 200 | line_format "{{.request_uri}}" | regexp "(?P<path>^\\/([^/])\\/)" | line_format "{{.path}} {{.upstream_response_time}}" | unwrap upstream_response_time | __error__="" [5m]))) / sum by (path)(count_over_time({filename=~"/var/log/nginx/json_access.log",host=~"10.0.2.", tenant_id=~"sre-tj06"} | json | line_format "{{.request_uri}}" | regexp "(?P<path>^\\/([^/])\\/)" | path != "" | path != "\\" | __error__="" [5m]))" query_type=metric range_type=range length=48h0m1s step=2m0s duration=1m8.703279889s status=500 throughput=26MB total_bytes=1.8GB
level=warn ts=2021-01-27T04:50:13.280160032Z caller=logging.go:71 traceID=6d3a3617a9d10b65 msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1359&query=(sum%20by%20(path)%20(sum_over_time(%7Bfilename%3D~%22%2Fvar%2Flog%2Fnginx%2Fjson_access.log%22%2Chost%3D~%2210.0.2.%22%2C%20tenant_id%3D~%22sre-tj06%22%7D%20%7Cjson%20%7C%20status%20%3D%20200%20%7C%20line_format%20%22%7B%7B.request_uri%7D%7D%22%20%7Cregexp%20%22(%3FP%3Cpath%3E%5E%5C%5C%2F(%5B%5E%2F%5D)%5C%5C%2F)%22%20%7C%20line_format%20%22%7B%7B.path%7D%7D%20%7B%7B.upstream_response_time%7D%7D%22%20%20%7C%20unwrap%20upstream_response_time%20%7C%20__error__%3D%22%22%20%5B5m%5D)))%20%2F%20sum%20by%20(path)(count_over_time(%7Bfilename%3D~%22%2Fvar%2Flog%2Fnginx%2Fjson_access.log%22%2Chost%3D~%2210.0.2.%22%2C%20tenant_id%3D~%22sre-tj06%22%7D%20%7C%20json%20%7C%20line_format%20%22%7B%7B.request_uri%7D%7D%22%20%7Cregexp%20%22(%3FP%3Cpath%3E%5E%5C%5C%2F(%5B%5E%2F%5D)%5C%5C%2F)%22%7C%20path%20!%3D%20%22%22%20%7C%20path!%3D%20%22%5C%5C%22%20%7C%20__error__%3D%22%22%20%5B5m%5D))&start=1611550083000000000&end=1611722884000000000&step=120 (500) 1m8.704357997s Response: "rpc error: code = Canceled desc = context canceled\n" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-US,en;q=0.5; User-Agent: Grafana/7.3.5; X-Forwarded-For: 222.126.245.76, 10.244.0.110, 10.244.0.110; X-Grafana-Org-Id: 1; X-Real-Ip: 222.126.245.76; X-Request-Id: e87774979c4ef03dfd79c99361bcf97f; X-Scheme: https; "