-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix request cancellation when frontend worker is busy #740
Comments
colega
added a commit
that referenced
this issue
Jan 13, 2022
With previous implementation, if worker was busy talking to scheduler, we didn't push the cancellation, keeping that query running. When cancelling a query, all its subqueries are cancelled at the same time, so this was most likely happening all the time (first subquery scheduled on this worker was canceled, the rest were not because worker was busy cancelling the first one). Also removed the `<-ctx.Done()` escape point when waiting for the enqueueing ACK and modified the enqueueing method to ensure that it always responds something. Fixes: #740 Inspired by: grafana/loki#5113 Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This was referenced Jan 13, 2022
pstibrany
added a commit
that referenced
this issue
Jan 14, 2022
* Increase scheduler worker cancellation chan cap With previous implementation, if worker was busy talking to scheduler, we didn't push the cancellation, keeping that query running. When cancelling a query, all its subqueries are cancelled at the same time, so this was most likely happening all the time (first subquery scheduled on this worker was canceled, the rest were not because worker was busy cancelling the first one). Also removed the `<-ctx.Done()` escape point when waiting for the enqueueing ACK and modified the enqueueing method to ensure that it always responds something. Fixes: #740 Inspired by: grafana/loki#5113 Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Remove comment about chan memory usage Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com> * Update test comment Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com> * Add resp.Error to the log when response is unknown Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Log the entire uknown response Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
When we try to cancel an enqueued request:
mimir/pkg/frontend/v2/frontend.go
Lines 227 to 236 in e2e2e10
We have a default clause which we will follow if worker is busy sending or cancelling another request:
http://github.com/grafana/mimir/blob/e2e2e100ec14e7707d1733908e61c83c8b87ef40/pkg/frontend/v2/frontend_scheduler_worker.go#L285-L286
So sometimes the requests are processed in the queriers even when the client has canceled the request.
This is the same as fixed in loki in grafana/loki#5113
We should apply a similar fix.
To Reproduce
See test in grafana/loki#5113
Expected behavior
All upstream requests should be canceled when downstream cancels the originating request.
The text was updated successfully, but these errors were encountered: