Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle cancellations before requests #742

Closed

Conversation

colega
Copy link
Contributor

@colega colega commented Jan 13, 2022

What this PR does:

When scheduler worker is handling requests, if there's certain pressure on the loop, we can keep enqueueing requests instead of handling the cancellations. However, when there's that pressure on the loop, it's more important than never to properly cancel the requests.

This checks the cancellations channel first, and then checks the requests channel too.

Related to #741

Which issue(s) this PR fixes:

None, relates to #740

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

With previous implementation, if worker was busy talking to scheduler,
we didn't push the cancellation, keeping that query running.

When cancelling a query, all its subqueries are cancelled at the same
time, so this was most likely happening all the time (first subquery
scheduled on this worker was canceled, the rest were not because worker
was busy cancelling the first one).

Also removed the `<-ctx.Done()` escape point when waiting for the
enqueueing ACK and modified the enqueueing method to ensure that it
always responds something.

Fixes: #740
Inspired by: grafana/loki#5113

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
When scheduler worker is handling requests, if there's certain pressure
on the loop, we can keep enqueueing requests instead of handling the
cancellations. However, when there's that pressure on the loop, it's
more important than never to properly cancel the requests.

This checks the cancellations channel first, and then checks the
requests channel too.

Related to #741

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega
Copy link
Contributor Author

colega commented Jan 13, 2022

While implementing #741 I thought about this. WDYT @cyriltovena ?

@pstibrany
Copy link
Member

I think the problem is that if goroutine isn't waiting for cancellation at the exact moment when sender tries to send the cancellation, sender just gives up. I think we could instead add some buffer to cancellation channel or make it wait a bit longer. I'm not sure changing receiving part is the way to go, because if receiver (worker goroutine) is busy (eg. forwarding some request), sender will not send the cancellation request.

@colega
Copy link
Contributor Author

colega commented Jan 13, 2022

@pstibrany, yes, I already did that in #741. PTAL

This is just an extra I wanted to discuss. Also not sure about it, that's why it's draft and not included there.

Base automatically changed from increase-frontend-scheduler-worker-cancellation-channel-capacity to main January 14, 2022 08:34
@colega colega closed this Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants