Add BufferedProtocol support #2033
Conversation
Codecov Report
@@             Coverage Diff             @@
##             master     #2033      +/-  ##
=============================================
- Coverage    92.070%   91.707%   -0.363%
=============================================
  Files            33        33
  Lines          3102      3123       +21
  Branches        542       544        +2
=============================================
+ Hits           2856      2864        +8
- Misses          166       180       +14
+ Partials         80        79        -1
Continue to review full report at Codecov.
for reference: https://bugs.python.org/issue32251
sanic/server.py (outdated)

    self._time = current_time()
    self.recv_buffer += data

    if len(self.recv_buffer) > self.app.config.REQUEST_BUFFER_SIZE:
After playing with this new code, I wonder if this line should be
if len(self.recv_buffer) >= self.app.config.REQUEST_BUFFER_SIZE
Here's one thing I observed:
- Set REQUEST_BUFFER_SIZE to 128 bytes.
- BufferedProtocol chooses self._buffer with size 128 (self.recv_buffer is unsized/unlimited).
- Send a request to the app with a 384-byte request size.
- buffer_updated is called with nbytes 128.
- 128 bytes is written to recv_buffer.
- if nbytes > REQUEST_BUFFER_SIZE is False, because 128 is not greater than 128.
- data_received is set, and HTTP headers are read.
- The HTTP protocol calls _receive_more(), which tries to unpause the transport, but it's not paused.
- buffer_updated() is called again with nbytes 128.
- A new 128 bytes of data is written to self.recv_buffer.
- len(self.recv_buffer) is now 256, so self.transport.pause_reading() is now executed.
- data_received is set, and further HTTP headers are read.
- The HTTP protocol calls _receive_more() again, which unpauses the transport.
- This repeats until all of the request is read and parsed by the HTTP protocol.

So I'm wondering: should the transport be paused every time the recv_buffer size matches REQUEST_BUFFER_SIZE, or only when it's greater than that?
What effect does pausing the transport even have?
The point of pausing would be to allow it to drain. It is more of an issue with streaming than with sending complete responses. It is a protection from exhausting memory limits.
With that said, you are probably correct that it should be >=.
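The pause behavior under discussion can be sketched in a few lines. This is a minimal illustration, not Sanic's actual code: FakeTransport, SketchProtocol, and the hardcoded REQUEST_BUFFER_SIZE are hypothetical names invented for this example.

```python
REQUEST_BUFFER_SIZE = 128  # assumed config value from the example above

class FakeTransport:
    def __init__(self):
        self.paused = False

    def pause_reading(self):
        self.paused = True

    def resume_reading(self):
        self.paused = False

class SketchProtocol:
    def __init__(self, transport):
        self.transport = transport
        self._buffer = bytearray(REQUEST_BUFFER_SIZE)
        self.recv_buffer = bytearray()

    def buffer_updated(self, nbytes):
        self.recv_buffer += self._buffer[:nbytes]
        # With ">" the first full 128-byte read does NOT pause, so a
        # second chunk arrives before pause_reading() ever runs; with
        # ">=" the transport pauses immediately on the first full buffer.
        if len(self.recv_buffer) >= REQUEST_BUFFER_SIZE:
            self.transport.pause_reading()

proto = SketchProtocol(FakeTransport())
proto.buffer_updated(128)
assert proto.transport.paused  # paused after the very first full read
```

With the `>` comparison from the diff, the same call sequence would leave the transport unpaused until a second read pushed the buffer to 256 bytes, matching the behavior observed above.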
sanic/app.py (outdated)

    @@ -3,7 +3,14 @@
    import os
    import re

    from asyncio import CancelledError, Protocol, ensure_future, get_event_loop

    try:
Maybe we can have this in a compat.py module, so we don't need the try/except blob in multiple places.
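The suggested pattern could look roughly like this. A hypothetical sketch only (the alias name BaseProtocol is invented here, not taken from the PR): the version check runs once in the compat module, and everything else imports the result.

```python
# Hypothetical compat-module pattern: do the version check once here,
# and import BaseProtocol everywhere else instead of repeating try/except.
try:
    from asyncio import BufferedProtocol as BaseProtocol  # Python 3.7+
except ImportError:
    from asyncio import Protocol as BaseProtocol  # older Pythons

# elsewhere in the codebase:
#     from sanic.compat import BaseProtocol
assert isinstance(BaseProtocol, type)
```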
    # Python 3.7+
    # -------------------------------------------- #

    def get_buffer(self, sizehint=-1):
What is sizehint used for here?
sizehint is the recommended minimum size for the returned buffer. It is acceptable to return smaller or larger buffers than what sizehint suggests. When set to -1, the buffer size can be arbitrary. It is an error to return a buffer with a zero size.
We are ignoring it here.
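For context, the callback pair works roughly like this. A minimal sketch (the class name Sketch and the simulated fill are invented for illustration; this is not the PR's implementation): one fixed-size buffer is handed out by get_buffer, ignoring sizehint as described, and buffer_updated copies out what the event loop wrote.

```python
import asyncio

class Sketch(asyncio.BufferedProtocol):
    def __init__(self, size=65536):
        self._buffer = bytearray(size)
        self.recv_buffer = bytearray()

    def get_buffer(self, sizehint=-1):
        # sizehint is only a recommendation; returning a larger or
        # smaller (but never zero-length) buffer is acceptable.
        return self._buffer

    def buffer_updated(self, nbytes):
        # the event loop has written nbytes bytes into self._buffer
        self.recv_buffer += self._buffer[:nbytes]

s = Sketch()
buf = s.get_buffer(-1)
buf[:5] = b"hello"   # simulate the loop filling the buffer
s.buffer_updated(5)
assert bytes(s.recv_buffer) == b"hello"
```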
I'm going to do some further testing on this, to determine whether or not to recommend including it in 21.3.
Awesome 🤘
Ok, like last time I tested it, the results are again inconclusive. Speeds are basically the same whether using BufferedProtocol or Protocol. So similar that I thought there was something wrong with my testing methodology, but all seems to be good.

I'm testing a basic hello-world Sanic application, with one server worker (no multiprocessing). My matrix was (BufferedProtocol, Protocol) x (with_uvloop, without_uvloop) x (with_restriction, without_restriction), giving a total of 8 separate tests. Every test gave 4400-4500 requests per second. Even with_uvloop and without_uvloop made no difference in my testing on my laptop with Python 3.8. Repeating all 8 tests above when restricting

So I don't know. I like the BufferedProtocol implementation; it's clean and correct and works well. It should be faster, but in the Sanic server it doesn't provide any speed advantage.
Did you try with a large payload? What happens if the response is a large blob? Maybe throw in a couple of tests at different sizes? However, these results pretty much confirm exactly my own results. And I agree with your conclusion. It should be better; it seems like it. But I think @Tronic's streaming solution pretty much doubles up the effort, so we are already seeing that performance increase. So the BufferedProtocol is doing the same thing, except with two methods.
Code Climate has analyzed commit 9e8520a and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 93.7% (86% is the threshold). This pull request will bring the total coverage in the repository to 94.3% (0.0% change). View more on Code Climate.
Continuing the conversation from #2183.
One thing that probably needs to be reexamined, if we are going this direction, is whether we want to continue writing to @Tronic Thoughts?
    self.recv_buffer = bytearray()
    self.recv_buffer = self.protocol.recv_buffer = bytearray()
This is still incorrect, and in general you should not recreate the buffer. Use del self.recv_buffer[:2] or so to keep the existing buffer but consume the data that you want removed (note: as discussed earlier, removing all or the first two bytes here might be invalid anyway).
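The difference between consuming in place and rebinding can be shown directly. A small illustration (the names buf and alias are invented for it): del mutates the existing bytearray, so every other reference sees the consumed state, whereas assigning a new bytearray leaves those references pointing at stale data.

```python
buf = bytearray(b"\r\nGET / HTTP/1.1")
alias = buf                          # e.g. another object holding the buffer

del buf[:2]                          # consume the leading CRLF in place
assert alias == bytearray(b"GET / HTTP/1.1")   # alias sees the change

buf = bytearray()                    # rebinding creates a NEW object...
assert alias == bytearray(b"GET / HTTP/1.1")   # ...leaving the alias stale
```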
I still am not sure I see how. If we do not, and even if we account for the http loop, you end up with \r\n as the first two bytes on the next request.
    def buffer_updated(self, nbytes: int) -> None:
        data = self._buffer[:nbytes]

        self._time = current_time()
        self.recv_buffer += data
Shouldn't the get_buffer function be returning a memoryview into recv_buffer directly, rather than a separate _buffer, to avoid this copying of data here?
However, since we are concurrently receiving more data (so we cannot alter the buffer) and handling a request (which needs to consume bytes from the buffer), this cannot easily work. Which begs the question: why use BufferedProtocol in the first place?
Yes, this was just my proof of concept to see if the BufferedProtocol would play more nicely with streaming requests. We probably need to consume the data somehow from the object returned by get_buffer.

I am thinking perhaps what we might want to do is somehow sync on buffer_updated by grabbing the nbytes value, and then using that as recv_buffer instead of having that object. It is sort of duplicative at this point. This is a larger refactor though.
This could offer performance benefits if a single buffer was used. In this case, the tricky part is that at some point the consumed bytes need to be actually removed from the buffer and both memoryviews recreated accordingly, so that everything appears intact. I suppose that this update could be done within the
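The "tricky part" can be demonstrated concretely. A sketch under assumed names (storage, view, and the 7-byte consume are invented for illustration): CPython refuses to resize a bytearray while a memoryview is exported over it, so the view has to be released, the bytes consumed, and the view recreated.

```python
storage = bytearray(b"HEADERSbody")
view = memoryview(storage)

try:
    del storage[:7]              # resize attempt while a view is live
except BufferError:
    pass                         # refused: existing exports of data

view.release()                   # drop the export first
del storage[:7]                  # now consuming the bytes succeeds
view = memoryview(storage)       # recreate the view over what remains
assert bytes(view) == b"body"
```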
    self._buffer = memoryview(
        bytearray(self.app.config.REQUEST_BUFFER_SIZE)
    )
This is a waste of RAM. The buffer should be kept as small as possible, not always the maximum permitted size. This makes a difference when handling a lot of parallel requests.
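The rough arithmetic behind that concern, with assumed numbers (a 64 KiB buffer and 10,000 concurrent connections are illustrative values, not figures from the PR):

```python
# Preallocating the maximum permitted size per connection adds up fast
# when many connections are open at once.
REQUEST_BUFFER_SIZE = 64 * 1024      # assumed: 65536 bytes per connection
connections = 10_000                  # assumed: concurrent connections

total = connections * REQUEST_BUFFER_SIZE
assert total == 655_360_000           # ~625 MiB held before any data arrives
print(total / (1024 * 1024))          # MiB preallocated in total
```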
I am not sure where this value is coming from, but it is the size limit that get_buffer seems to want 🤔
>> get_buffer sizehint=65536
The sizehint is hardcoded in libuv (and probably in a similar manner in plain asyncio). We can safely ignore the hint because libuv will then simply call recv_into
with a smaller buffer. For the sake of simplicity, it might be best to keep our buffer hardcoded to 64 KiB as well, at least until it can be evaluated whether BufferedProtocol offers those performance benefits that we want.
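The "recv_into with a smaller buffer" behavior can be seen with a plain socket pair. A small demonstration (the 100-byte payload and 16-byte buffer are arbitrary values chosen for it): recv_into never writes past the end of the buffer it is given, no matter how much data is pending.

```python
import socket

a, b = socket.socketpair()
a.sendall(b"x" * 100)                # 100 bytes pending on the socket
buf = bytearray(16)                  # deliberately smaller buffer
n = b.recv_into(buf)
assert n == 16                       # only one buffer's worth is read
a.close()
b.close()
```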
Resolves #1873

This makes use of asyncio.BufferedProtocol.