-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes some bugs in the limit connection feature #2964
Conversation
There are two places which try to release the next waiter, when the ongoing connection is being canceled and we call explicitly to wake up a waiter or when the connection is being released. In both situations we call the same method `_release_waiter` that will iterate per all of the queues till found one that has a waiter that is not already canceled. This iteration will also check that the limit and the limit per host is not overcomed. The logic to check how many available connections/free slots there are has been placed behind the `_available_connections` function. So far this PR fixes: - Take into account the limit and limit per host at any time that we try to wake up a waiter. No availables, no wake up. - Dont give up till reach a waiter that is not finished. - Iterate always through all of the different hosts. What is missing: - More coverage - Shuffle the keys to give always the same chances to all of the hosts?
Having the concern of #1821, no degradation perceived with this PR imit_semaphore:
Responses: Counter({200: 9999})
Took: 20.797451972961426 seconds
limit_builtin:
Responses: Counter({200: 9999})
Took: 20.224246501922607 seconds |
Also worth it if @fafhrd91 can take a look. |
Codecov Report
@@ Coverage Diff @@
## master #2964 +/- ##
==========================================
+ Coverage 97.98% 98.16% +0.17%
==========================================
Files 40 41 +1
Lines 7592 7842 +250
Branches 1323 1392 +69
==========================================
+ Hits 7439 7698 +259
+ Misses 51 47 -4
+ Partials 102 97 -5
Continue to review full report at Codecov.
|
I'm at a conference, sorry. |
will try to carve out some time too later today, was working on a bunch of aiobotocore PRs that just landed, am hoping to switch to that and this fix in prod for the perf improvement. |
btw I was thinking, should a "FIFO semaphore" abstraction be refactored from this? I think it would make the main code clearer. |
@thehesiod What "FIFO semaphore" abstraction is? |
aiohttp/connector.py
Outdated
# remove a waiter even if it was cancelled, normally it's | ||
# removed when it's notified | ||
try: | ||
waiters.remove(fut) | ||
except ValueError: # fut may no longer be in list | ||
pass | ||
|
||
if not waiters: | ||
if key in self._waiters and not self._waiters[key]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the case when waiters is not self._waiters[key]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing, fixed!
aiohttp/connector.py
Outdated
# Having the dict keys ordered this avoids to iterate | ||
# at the same order at each call. | ||
queues = list(self._waiters.keys()) | ||
random.shuffle(queues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need shuffling?
The operation is not free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that hurts me. But without the shuffle the iteration of the queues will be done always in the same order, having chances of starvation for certain queues.
We could try to get rid of that operation, but it will imply some important changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of @thehesiod of having a "true" FIFO would get rid of this operation, but my gut feeling says that this FIFO queue will have enough footprint - has to take care of the limit per host - to make it slower than just a call shuffle
.
PD: a shuffle operation takes around 1 microsecond for a list of 10 elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiters
is a deque
.
deque.rotate()
is cheap. Can we use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember that _waiters
is the dictionary that has the queues. So, unfortunately we can't use it.
We might implement the FIFO on top a unique deque
using the rotate
method as a way to skip a waiter that can't be deallocated because of the limit per host restriction. I didn't know but the slice of the head element is O(1), so the check and later the skip can be done without losing the deque properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. The PR's code is ok to me.
aiohttp/connector.py
Outdated
waiter = waiters.popleft() | ||
if not waiter.done(): | ||
waiter.set_result(None) | ||
found = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't break
enough? It saves a check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The break
does not work. Anyway changed to avoid the found
check duplicating some code.
@asvetlov basically what this code is trying to do is be a semaphore on the connectors, with the caveat that the waiters are released in order. So why I said a FIFO Semaphore :) |
@thehesiod got you |
aiohttp/connector.py
Outdated
|
||
return | ||
|
||
if not waiters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition is always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it would be nice to have a test case where waiters become non-empty after releasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it doesn't make sense right now let me get rid of this part of the code and I will check the tests.
# remove a waiter even if it was cancelled, normally it's | ||
# removed when it's notified | ||
try: | ||
waiters.remove(fut) | ||
except ValueError: # fut may no longer be in list | ||
pass | ||
|
||
raise e | ||
finally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asvetlov can you check this new change? My concern with the previous implementation was because having a del self._waiters[key]
in two places was a tick bomb and hard to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
@asvetlov any clue about how can I increase the coverage? |
I think how everything is perfect. |
thank you, @pfreixes |
Thanks to you, any chance to get a new bugfix release? |
sorry I didn't have a chance to help review, been in a killer time sink project |
@thehesiod no problem. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs. |
There are two places which both try to release the next waiter. When the ongoing connection is being canceled calling explicitly to wake up a waiter and when the connection is being released, either because was called explicitly or implicitly due to the close of a response.
In both situations, we call the same method
_release_waiter
that will iterate per all of thequeues till found one that has a waiter that is not already canceled.
This iteration will also check that the limit and the limit per host is not overcome. The logic to check how many available connections/free slots there are has been placed behind the new
_available_connections
function.So far this PR fixes:
to wake up a waiter. No connection available, no wake-up.