Add header enabling gzip downloads #3380
@tartavull Can you give a code snippet that doesn't work the way you expect (and explain what you do expect)? |
Upload a compressed file
Downloading file
I expect that when the http request is made it has the header |
Sending |
I haven't seen problems when retrieving uncompressed files, but I haven't tested other compression formats. |
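For reference, a minimal sketch of a GET request that advertises gzip support, built with the standard library only and never actually sent. The bucket and object names in the URL are hypothetical placeholders, and this is not the library's own code path:

```python
import urllib.request

# Hypothetical object URL; the bucket and object names are placeholders.
url = "https://storage.googleapis.com/example-bucket/example-object"

# Build (but do not send) a GET request that advertises gzip support.
request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})

# urllib normalizes stored header names with str.capitalize().
accept_encoding = request.get_header("Accept-encoding")
print(request.get_method(), accept_encoding)
```

A server that honors the header may answer with `Content-Encoding: gzip` and a compressed body; otherwise it returns the bytes unencoded.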
@dhermes Any concerns about this PR? I have no problem adding an |
I'd like to see what @thobrla or someone else from the Storage team says. I'm just not sure if it makes sense. |
It's reasonable (even preferable) to include this header, but it is a substantial semantic change. How does the library handle gzipped bytes in the response? Is it up to the caller to decompress them? |
https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2 takes care of decompression, no action required by the caller |
As does |
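A rough sketch of the kind of transparent decoding being described, assuming the transport inspects the `Content-Encoding` response header. `decode_body` is a hypothetical helper written for illustration, not a function from any of the libraries mentioned in this thread:

```python
import gzip


def decode_body(headers, body):
    """Return the payload bytes, undoing gzip only when the server declares it.

    Hypothetical helper for illustration; not an API of any library named here.
    """
    encoding = headers.get("content-encoding", "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


original = b"compressible text " * 1000
wire_bytes = gzip.compress(original)

# Response declared as gzip-encoded: the caller sees the original bytes.
decoded = decode_body({"content-encoding": "gzip"}, wire_bytes)

# No Content-Encoding header: the bytes are passed through untouched.
passthrough = decode_body({}, wire_bytes)

print(len(wire_bytes), len(decoded), len(passthrough))
```

Under this assumption the caller never sees compressed bytes unless the object was stored without a gzip content encoding, in which case the body is returned exactly as stored.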
Seems fine, then. The caller has always gotten uncompressed bytes and they'll continue to get uncompressed bytes. Out of curiosity, how were mid-download connection breaks handled for content-encoding:gzip objects previously? |
I think you will now get an |
@tartavull Can you update the unit tests that fail as a result of your change? Once that is done, we can accept this. |
8d02bcc to bc031ec
Do you happen to know how
is related to the commit changes? |
@tartavull Sorry for being quiet here for way too long.
|
GET requests now contain the header accept-encoding:gzip. This improves download performance for compressible strings which were uploaded as gzips. The caller is not required to do any decompression because decompression is handled by the library. Confusing `IOError: CRC check failed` exceptions will be raised in the case of mid-download connection breaks.
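To illustrate the error named in the commit message, here is a small standard-library sketch that damages the CRC trailer of a gzip payload and then tries to decompress it. It only simulates one way a damaged download can show up; how the exception surfaces through a real transport may differ, and on Python 3 `IOError` is an alias of `OSError`:

```python
import gzip

payload = gzip.compress(b"compressible text " * 1000)

# The last 8 bytes of a gzip member are CRC32 then ISIZE; flip one CRC byte.
damaged = bytearray(payload)
damaged[-8] ^= 0xFF

try:
    gzip.decompress(bytes(damaged))
    error = None
except OSError as exc:  # IOError is an alias of OSError on Python 3
    error = exc

print(type(error).__name__, error)
```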
@dhermes looking forward to discussing them. |
@tartavull Do you have some examples? Or did you mean something else? |
Ah I see. Let me play around a bit with this to try to "break things" / investigate the raw payloads. In the meanwhile, you can check out the underlying library used for uploads (docs and source). You can pass in custom headers to an upload so this perf optimization would be usable immediately for you. |
Any luck breaking things? |
@dhermes Is there anything I can do to get this merged? |
Sorry @tartavull it fell off my plate of things to do! Really bad of me, eek. I want to test this PR on real use cases before merging. In particular I'd like to test on two files:
and maybe some other cases I haven't thought of? I just want to make sure this "does the right thing". |
@dhermes I understand. Let me know if you need any help from me. |
Poke. |
This PR modifies a single line of code, and it reduces the traffic to Google Cloud Storage for all your users downloading compressed files. For our particular case that's a 250x reduction in bytes downloaded. |
@dhermes At this point I want to just merge this. Giving you a chance to throw up a flag, but this should really get in. |
@lukesneeringer If you feel it's the right move, go for it. I worry that it is partially incorrect, e.g. if the thing stored is gzipped then it will accidentally do the wrong thing. |
I am not particularly worried. This is a really common pattern. |
The `_make_transport` method is now spelled `_get_transport`.
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
[ci skip]
Thanks :) |
Not really intended to be merged.
I'm just wondering if there is any other way of downloading gzip files that has the correct content-encoding.