Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Starlette compression Middleware on top of Cramjam #1

Merged
merged 16 commits into from
Sep 8, 2021
Merged

Conversation

vincentsarago
Copy link
Member

No description provided.

@vincentsarago
Copy link
Member Author

Fails tests for multiple reasons.

Locally the non-streamed responses works but I'm having hard time figuring how to deal with streaming response.

cc @kylebarron

@vincentsarago vincentsarago changed the title WIP [WIP] Create Starlette compression Middleware on top of Cramjam May 26, 2021
@vincentsarago
Copy link
Member Author

Status: streaming response works for GZIP but not for the other compression 🤷‍♂️ I have no idea what's going on 😭

@milesgranger
Copy link
Contributor

@vincentsarago
I think this is because with GZIP, it's typically okay to have multiple streams/members within a stream. Deflate and Brotli on the other hand do not. Having multiple streams in a single buffer will result in both, just decoding the first stream and acting like everything is fine.

ie

>>> import brotli
>>> out1 = brotli.compress(b'foo')
>>> out2 = brotli.compress(b'bar')
>>> brotli.decompress(out1 + out2)
b'foo'  # Only decodes the first stream in the buffer

>>> import gzip
>>> out1 = gzip.compress(b'foo')
>>> out2 = gzip.compress(b'bar')
>>> gzip.decompress(out1 + out2)
b'foobar'  # gzip happily does both streams
>>> 

# cramjam should expose functionality similar to brotli.Compressor:
>>> compressor = brotli.Compressor()
>>> compressor.compress(b'foo')
b''
>>> compressor.compress(b'bar')
b''
>>> out = compressor.finish()
>>> brotli.decompress(out)
b'foobar'
>>>

I'll start work in cramjam to support exporting Compressor objects that can accept being written multiple times. Because cramjam.brotli.compress expects the full buffer which is to be compressed.

In the mean time, you could use cramjam.Buffer and write to that during the streaming of data and after the last byte, de/compress the cramjam.Buffer object; it'll be a bit more performant than concatenating bytes but of course won't compress anything between writes.

>>> buf = cramjam.Buffer()
>>> buf.write(b'foo')
3
>>> buf.write(b'bar')
3
>>> buf.seek(0)
0
>>> out = cramjam.brotli.compress(buf)
>>> out
cramjam.Buffer(len=10)
>>> bytes(cramjam.brotli.decompress(out))
b'foobar'
>>> 

I don't exactly have a lot of time to add this stuff, but there is some time. 🙂
Sorry for the headache, hopefully, this helps.

@vincentsarago
Copy link
Member Author

🤯 Thanks for jumping on this @milesgranger, it's really appreciated 🙏

I also don't have time right now but those are precious information 👼

@vincentsarago
Copy link
Member Author

sorry I'm just re-reading this @milesgranger. You are saying that Brotli and Deflate DO NOT support multiple streams?

so even if you make it work in cramjam it won't be good enough for the project because we need to make sure our response can be decoded by any client.

@vincentsarago
Copy link
Member Author

🤔 it's interesting because streaming response works with brotli-asgi, it just calls a .flush and .finish method https://github.com/fullonic/brotli-asgi/blob/deb8a927b2328132b56f514b897f2aa8e80e716a/brotli_asgi/__init__.py#L156-L158

@milesgranger
Copy link
Contributor

milesgranger commented Sep 2, 2021

Exactly right. You'll notice that cramjam doesn't expose any object with flush/finish methods. These tie off/finish the end of a stream. In cramjam's case, the entire buffer is expected, or to put it another way, it is stateless; the whole buffer is expected and treated as the output stream. Implementations like brotli have both brotli.compress as well as brotli.Compressor. At present, one cannot incrementally add to the stream like one can with brotli.Compressor for example. Other implementations like python-snappy use snappy.stream_compress which expects buffer protocol input/output objects, but the effect is the same.

I worked on it a bit last night and will likely be finished within the week.

@vincentsarago
Copy link
Member Author

Thanks now I understand :D

@milesgranger
Copy link
Contributor

I've merged in and made a pre-release for cramjam with a Compressor object similar to brotli's implementation.
When you have time, could you give it a go and let me know if you have any trouble?

pip install cramjam==2.4.0-rc1

I'll update docs later, but for now this is an example:

>>> import cramjam
>>> compressor = cramjam.brotli.Compressor()
>>> compressor.compress(b'foo')
3
>>> compressor.compress(b'bar')
3
>>> compressed = compressor.finish()
>>> bytes(compressed)
b'\x8b\x02\x80foobar\x03'
>>> decompressed = cramjam.brotli.decompress(compressed)
>>> bytes(decompressed)
b'foobar'
>>> 

@vincentsarago
Copy link
Member Author

that's awesome @milesgranger
To be honest I have still difficulties to understand how to use this new method within the middleware we are building.

if we look at https://github.com/fullonic/brotli-asgi/blob/deb8a927b2328132b56f514b897f2aa8e80e716a/brotli_asgi/__init__.py#L144-L146 and https://github.com/fullonic/brotli-asgi/blob/deb8a927b2328132b56f514b897f2aa8e80e716a/brotli_asgi/__init__.py#L156-L158 it seems that the brotlimethod (compress) returns the compressed object (while the Compressor returns a Integer)

Again, ASGI streaming response are not something I master and I'm quite confused but if we look at the brotli asgi I think what it does is

  1. compress some bytes
  2. call a .flush() method (part of the Compressor object)
  3. if there are no more body to return: call a .finish() method.

I absolutely don't know what the flush nor the finish() method do 🤷

@milesgranger
Copy link
Contributor

No worries, let's see if we can get this squared away.

I don't expose .flush b/c one, IMO the flush here in brotli, does something somewhat unexpected. .flush should push everything in any intermediate buffer into the encoder/compressor. In brotli's case, it looks like it just outputs whatever has been compressed thus far. Then .finish does the final call to append the ending bytes to signify the end of the stream: \x03
Example:

>>> import brotli
>>> compressor = brotli.Compressor()
>>> compressor.compress(b'foo')
b''
>>> compressor.compress(b'bar')
b''
>>> compressor.flush()
b'\x8b\x02\x80foobar'
>>> out = compressor.finish()
>>> out
b'\x03'

Meanwhile, in the Rust implementation, the flush method returns a Result<()>, the () indicating there is no output, but rather indicating as mentioned before, any intermediate bytes have been written to the encoder. In cramjam finish returns the completed stream.

ie.

>>> import cramjam
>>> compressor = cramjam.brotli.Compressor()
>>> compressor.compress(b'foo')
3
>>> compressor.compress(b'bar')
3
>>> out = compressor.finish()
>>> bytes(out)
b'\x8b\x02\x80foobar\x03'

Note! It seems brotli does exactly this as well, should you just call finish

>>> compressor = brotli.Compressor()
>>> compressor.compress(b'foo')
b''
>>> compressor.compress(b'bar')
b''
>>> compressor.finish()
b'\x8b\x02\x80foobar\x03'
>>> 

In your case, it seems you'd just not need to call .flush. Perhaps there is a good reason I'm not aware of for doing this, but if you want the same behavior, it appears you'll just have to be aware that the finish on cramjam (and with brotli, if not calling flush) will give the entire compressed stream, beginning to end.

@milesgranger
Copy link
Contributor

I took a closer look and see that you would need access to what is currently compressed at arbitrary points, and flush, although not working the way I would expect it to, it accomplishes this for you well. I can see I at least need to add some way to reference the current buffer as well as flush. Basically seems I've implemented the streaming input but you also need streaming output; seems obvious now, sorry about that. :-)

@vincentsarago
Copy link
Member Author

❤️ no worries @milesgranger and thanks for your time I really appreciate 🙏

@vincentsarago
Copy link
Member Author

@milesgranger thanks for milesgranger/cramjam#68 🙏
let me see how it goes 😄

@milesgranger
Copy link
Contributor

My pleasure, also see #2 and see if that helps. :-)

@vincentsarago
Copy link
Member Author

🤦 I didn't get the notification for #2 Thanks a lot 😄

@vincentsarago vincentsarago changed the title [WIP] Create Starlette compression Middleware on top of Cramjam Create Starlette compression Middleware on top of Cramjam Sep 8, 2021
@vincentsarago vincentsarago merged commit 170d6d8 into main Sep 8, 2021
@vincentsarago vincentsarago deleted the dev branch October 28, 2022 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants