Add BufferedProtocol support #2033

Closed
wants to merge 9 commits into from

Conversation

ahopkins
Member

Resolves #1873

This makes use of the asyncio.BufferedProtocol.
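
For context, here is a minimal sketch (not Sanic's implementation) of the BufferedProtocol interface this builds on: instead of allocating a new bytes object and calling data_received(), the event loop asks the protocol for a writable buffer via get_buffer(), fills it with recv_into(), and reports the byte count via buffer_updated().

import asyncio

class EchoBuffered(asyncio.BufferedProtocol):
    def __init__(self) -> None:
        self._buffer = bytearray(65536)  # reused for every read

    def connection_made(self, transport: asyncio.Transport) -> None:
        self.transport = transport

    def get_buffer(self, sizehint: int) -> bytearray:
        # sizehint is only a recommendation; any non-empty writable buffer is valid
        return self._buffer

    def buffer_updated(self, nbytes: int) -> None:
        # nbytes of fresh data now sit at the start of self._buffer
        self.transport.write(bytes(self._buffer[:nbytes]))

    def eof_received(self) -> bool:
        return False  # let the transport close the connection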

@codecov

codecov bot commented Feb 17, 2021

Codecov Report

Merging #2033 (12d15dc) into master (b461cc3) will decrease coverage by 0.363%.
The diff coverage is 78.261%.

Impacted file tree graph

@@              Coverage Diff              @@
##            master     #2033       +/-   ##
=============================================
- Coverage   92.070%   91.707%   -0.363%     
=============================================
  Files           33        33               
  Lines         3102      3123       +21     
  Branches       542       544        +2     
=============================================
+ Hits          2856      2864        +8     
- Misses         166       180       +14     
+ Partials        80        79        -1     
Impacted Files Coverage Δ
sanic/app.py 91.593% <60.000%> (-0.371%) ⬇️
sanic/server.py 86.096% <83.333%> (-1.859%) ⬇️
sanic/websocket.py 78.022% <0.000%> (-4.396%) ⬇️
sanic/handlers.py 97.938% <0.000%> (+2.062%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b461cc3...12d15dc. Read the comment docs.

@ahopkins
Member Author

for reference: https://bugs.python.org/issue32251

sanic/server.py Outdated
self._time = current_time()
self.recv_buffer += data

if len(self.recv_buffer) > self.app.config.REQUEST_BUFFER_SIZE:
Member

After playing with this new code, I wonder if this line should be

if len(self.recv_buffer) >= self.app.config.REQUEST_BUFFER_SIZE

Here's one thing I observed:

  • Set REQUEST_BUFFER_SIZE to 128 bytes.
  • BufferedProtocol chooses self._buffer with size 128 (self.recv_buffer is unsized/unlimited).
  • Send a request to the app with a 384-byte request size
  • buffer_updated is called with nbytes 128
  • 128 bytes are written to recv_buffer
  • if nbytes > REQUEST_BUFFER_SIZE is False, because 128 is not greater than 128
  • data_received is set, and http headers are read
  • http protocol calls _receive_more(), tries to unpause the transport, but it's not paused
  • buffer_updated() is called again with nbytes 128
  • a new 128 bytes of data is written to self.recv_buffer
  • len(self.recv_buffer) is now 256, so self.transport.pause_reading() is now executed.
  • data_received is set, and further http headers are read
  • http protocol calls _receive_more() again which unpauses the transport.
  • repeats until all of the request is read and parsed by the HTTP protocol

So I'm wondering, should the transport be paused every time the recv_buffer size matches REQUEST_BUFFER_SIZE, or only when it's greater than that?
What effect does pausing the transport even have?

Member Author

The point of pausing would be to allow for it to drain. It is more of an issue with streaming than with sending complete responses. It is a protection against exhausting memory limits.
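
A minimal sketch of that flow control, assuming the attribute names used in this PR (recv_buffer, transport, app.config.REQUEST_BUFFER_SIZE); this is illustrative, not the actual Sanic code:

class FlowControl:
    def _check_backpressure(self) -> None:
        # Stop reading from the socket once the buffer reaches the limit, so a
        # slow consumer (e.g. a streaming handler) cannot grow it without bound.
        if len(self.recv_buffer) >= self.app.config.REQUEST_BUFFER_SIZE:
            self.transport.pause_reading()

    def _receive_more(self) -> None:
        # Called once the HTTP layer has consumed data and wants the next chunk.
        self.transport.resume_reading()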

Member Author

With that said, you are probably correct that it should be >=.

sanic/app.py Outdated
@@ -3,7 +3,14 @@
import os
import re

from asyncio import CancelledError, Protocol, ensure_future, get_event_loop

try:
Member

Maybe we can have this in a compat.py module, so we don't need this try/except blob in multiple places.
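
A possible shape for that, sketched with a hypothetical alias name (ProtocolBase); asyncio.BufferedProtocol is available from Python 3.7, so the fallback only matters on older interpreters:

# hypothetical sanic/compat.py content
try:
    from asyncio import BufferedProtocol as ProtocolBase  # Python 3.7+
except ImportError:  # pragma: no cover
    from asyncio import Protocol as ProtocolBase

Call sites would then import ProtocolBase from sanic.compat instead of repeating the try/except.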

# Python 3.7+
# -------------------------------------------- #

def get_buffer(self, sizehint=-1):
Member

What is sizehint used for here?

Member Author

sizehint is the recommended minimum size for the returned buffer. It is acceptable to return smaller or larger buffers than what sizehint suggests. When set to -1, the buffer size can be arbitrary. It is an error to return a buffer with a zero size.

Source

Member Author

We are ignoring it here.

@ashleysommer
Member

I'm going to do some further testing on this, to determine whether or not to recommend including it in 21.3.

@ahopkins
Member Author

ahopkins commented Mar 4, 2021

Awesome 🤘

@ashleysommer
Member

OK, like last time I tested it, the results are again inconclusive.

Speeds are basically the same whether using BufferedProtocol or Protocol. They are so similar that I thought there was something wrong with my testing methodology, but everything seems to be fine.

I'm testing a basic hello-world Sanic application, with one server worker (no multiprocessing).
I tried different scenarios too, specifically with uvloop and without uvloop, and with an artificial receive-buffer restriction (to trigger stream pause and resume) and without a restriction.
Using wrk with 8 connections across 8 threads to hammer the server for 30 seconds per test.

My matrix was (BufferedProtocol, Protocol) × (with_uvloop, without_uvloop) × (with_restriction, without_restriction), giving a total of 8 separate tests.

Every test gave 4400-4500 requests per second. Even running with or without uvloop made no difference in my testing on my laptop with Python 3.8.

I repeated all 8 tests above while restricting wrk to 1 connection at a time on 1 thread. Again, the results were all within the margin of error, at 2200-2300 requests per second.

So I don't know. I like the BufferedProtocol implementation; it's clean and correct and works well. It should be faster, but in the Sanic server it doesn't provide any speed advantage.

@ahopkins
Member Author

ahopkins commented Mar 5, 2021

Did you try with a large payload? What happens if the response is a large blob? Maybe throw in a couple of tests at different sizes?

However, these results pretty much confirm my own. And I agree with your conclusion: it seems like it should be better. But I think @Tronic's streaming solution already doubles up on this effort, so we are already seeing that performance increase. So the BufferedProtocol is doing the same thing, except with two methods.

@ahopkins ahopkins marked this pull request as draft March 15, 2021 10:05
@ahopkins ahopkins added this to the v21.6 milestone Mar 15, 2021
Base automatically changed from master to main March 23, 2021 00:47
@ahopkins ahopkins modified the milestones: v21.6, v21.9 Jun 20, 2021
@codeclimate

codeclimate bot commented Jul 8, 2021

Code Climate has analyzed commit 9e8520a and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 93.7% (86% is the threshold).

This pull request will bring the total coverage in the repository to 94.3% (0.0% change).

View more on Code Climate.

@ahopkins
Member Author

ahopkins commented Jul 8, 2021

Continuing the conversation from #2183.

BufferedProtocol seems to work for no body, single body, and chunked body requests when sent individually or pipelined.

One thing that probably needs to be reexamined, if we are going in this direction, is whether we want to continue writing to recv_buffer the same way, or use a different sync primitive for control.

@Tronic Thoughts?

self.recv_buffer = bytearray()
self.recv_buffer = self.protocol.recv_buffer = bytearray()
Member

@Tronic Tronic Jul 8, 2021


This is still incorrect, and in general you should not recreate the buffer. Use del self.recv_buffer[:2] or so to keep the existing buffer but consume the data that you want removed (note: as discussed earlier, removing all or the first two bytes here might be invalid anyway).
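
A standalone illustration of the difference (not the PR's code): deleting a slice consumes bytes in place and keeps every existing reference to the buffer valid, whereas rebinding the attribute creates a new object that other holders of the old buffer never see.

buf = bytearray(b"\r\nGET / HTTP/1.1\r\n")
alias = buf                    # e.g. the protocol holding the same buffer object

del buf[:2]                    # consume the leading CRLF in place
assert alias is buf
assert bytes(buf).startswith(b"GET")

buf = bytearray()              # rebinding creates a brand-new object
assert alias is not buf        # the old holder still sees the stale data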

Member Author

I am still not sure I see how. If we do not, then even if we account for the HTTP loop, you end up with \r\n as the first two bytes of the next request.

Comment on lines +331 to +335
def buffer_updated(self, nbytes: int) -> None:
data = self._buffer[:nbytes]

self._time = current_time()
self.recv_buffer += data
Member

Shouldn't the get_buffer function return a memoryview into recv_buffer directly, rather than a separate _buffer, to avoid copying the data here?

Member

However, since we are concurrently receiving more data (so we cannot alter the buffer) and handling a request (which needs to consume bytes from the buffer), this cannot easily work. Which begs the question: why use BufferedProtocol in the first place?

Member Author

@ahopkins ahopkins Jul 8, 2021


Yes, this was just my proof of concept to see if the BufferedProtocol would play more nicely with streaming requests. We probably need to consume the data somehow from the object returned by get_buffer.

I am thinking that perhaps what we want to do is somehow sync on buffer_updated by grabbing the nbytes value, and then use that instead of keeping the separate recv_buffer object, which is somewhat duplicative at this point. That is a larger refactor, though.

@Tronic
Member

Tronic commented Jul 8, 2021

This could offer performance benefits if a single buffer were used. In that case get_buffer needs to return a memoryview of a location just past the end of the space already used, and recv_buffer should be another memoryview from the currently unconsumed position (zero if no bytes currently in the buffer have been consumed yet) to the end of the data received (i.e. the same position where the memoryview of get_buffer begins).

The tricky part is that at some point the consumed bytes need to actually be removed from the buffer and both memoryviews recreated accordingly, so that everything appears intact. I suppose that this update could be done within the get_buffer function.
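
A rough sketch of that scheme (illustrative names, not the PR's code): get_buffer compacts away the consumed prefix and returns the free tail as a writable view, while the unconsumed bytes are exposed as a separate view. As noted above, any views handed out earlier become stale once compaction happens, which is the tricky part.

class SingleBuffer:
    def __init__(self, capacity: int = 65536) -> None:
        self._data = bytearray(capacity)
        self._consumed = 0  # bytes the HTTP layer has already processed
        self._filled = 0    # total bytes received into the buffer

    def get_buffer(self, sizehint: int = -1) -> memoryview:
        # Compact: shift unconsumed bytes to the front so the tail is reusable.
        # Equal-length slice assignment does not resize the bytearray, so it is
        # allowed even while views exist, but older views now see stale data.
        if self._consumed:
            unread = self._filled - self._consumed
            self._data[:unread] = self._data[self._consumed:self._filled]
            self._filled, self._consumed = unread, 0
        # A real implementation must never return a zero-size buffer; the
        # full-buffer case would need to pause reading instead.
        return memoryview(self._data)[self._filled:]

    def buffer_updated(self, nbytes: int) -> None:
        self._filled += nbytes

    @property
    def recv_view(self) -> memoryview:
        # Unconsumed data, from the consume position to the end of received data.
        return memoryview(self._data)[self._consumed:self._filled]

    def consume(self, nbytes: int) -> None:
        self._consumed += nbytes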

Comment on lines +306 to +308
self._buffer = memoryview(
bytearray(self.app.config.REQUEST_BUFFER_SIZE)
)
Member

This is a waste of RAM. The buffer should be kept as small as possible, not always the maximum permitted size. This makes a difference when handling a lot of parallel requests.

Member Author

I am not sure where this value is coming from, but it is the size limit that get_buffer seems to want 🤔

>> get_buffer sizehint=65536

Member

The sizehint is hardcoded in libuv (and probably in a similar manner in plain asyncio). We can safely ignore the hint because libuv will then simply call recv_into with a smaller buffer. For the sake of simplicity, it might be best to keep our buffer hardcoded to 64 KiB as well, at least until it can be evaluated whether BufferedProtocol offers those performance benefits that we want.

@ahopkins ahopkins modified the milestones: v21.9, Future Release Aug 8, 2021
@ahopkins ahopkins closed this Sep 29, 2021
@ahopkins ahopkins deleted the bufferedprotocol branch February 23, 2022 08:03
Successfully merging this pull request may close these issues.

The AsyncIO Server should use Python's 3.7 BufferedProtocol when possible ?
4 participants