
Add header enabling gzip downloads #3380

Merged
merged 6 commits into from
Oct 13, 2017

Conversation

tartavull

Not really intended to be merged.
I'm just wondering if there is any other way of downloading gzip files that have the correct content-encoding.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 6, 2017
@dhermes
Contributor

dhermes commented May 6, 2017

@tartavull Can you give a code snippet that doesn't work the way you expect (and explain what you do expect)?

@tartavull
Author

Uploading a compressed file:

    import gzip
    from StringIO import StringIO  # Python 2; use io.BytesIO on Python 3

    def _compress(content):
        stringio = StringIO()
        gzip_obj = gzip.GzipFile(mode='wb', fileobj=stringio)
        gzip_obj.write(content)
        gzip_obj.close()
        return stringio.getvalue()

    content = 'highly compressible string'
    blob.upload_from_string(_compress(content))
    blob.content_encoding = "gzip"
    blob.patch()

Downloading the file:

    assert content == blob.download_as_string()

I expect the HTTP request to include the header accept-encoding: gzip;
otherwise the server will decompress the blob and send something much larger.
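The size gap described here is easy to reproduce locally. A minimal sketch using only the stdlib gzip module (the variables are illustrative stand-ins for the stored blob, not part of the library):

```python
import gzip

# Stand-in for the blob: what the bucket stores when content_encoding is
# "gzip", vs. what a client receives if the server inflates it in transit.
content = b"highly compressible string" * 10_000
stored = gzip.compress(content)

ratio = len(content) / len(stored)
print(f"raw: {len(content)} bytes, stored: {len(stored)} bytes (~{ratio:.0f}x)")
```

Without Accept-Encoding: gzip on the GET, the full `content`-sized payload travels over the wire even though the bucket only holds the small `stored` payload.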

@lukesneeringer
Contributor

Sending Accept-Encoding: gzip does not seem like a hack to me. That is the entire point of the Accept-Encoding header. I would be willing to accept this, so long as it does not have problematic side effects (e.g. causing trouble for other compression formats or uncompressed files).

@tartavull
Author

I haven't seen problems when retrieving uncompressed files, but I haven't tested other compression formats.

@lukesneeringer
Contributor

@dhermes Any concerns about this PR? I have no problem adding an Accept-Encoding header.

@dhermes
Contributor

dhermes commented May 9, 2017

I'd like to see what @thobrla or someone else from the Storage team says. I'm just not sure if it makes sense.

@thobrla

thobrla commented May 9, 2017

It's reasonable (even preferable) to include this header, but it is a substantial semantic change. How does the library handle gzipped bytes in the response? Is it up to the caller to decompress them?

@tartavull
Author

https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2 takes care of decompression, no action required by the caller

@dhermes
Contributor

dhermes commented May 9, 2017

As does urllib3 / requests
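The transparent decoding those HTTP clients perform boils down to checking the response's Content-Encoding header before handing bytes to the caller. A toy version of that step (the `decode_body` helper is hypothetical, not an API of urllib3, requests, or httplib2):

```python
import gzip

def decode_body(headers, raw_body):
    # Toy version of what urllib3/requests and httplib2 do internally:
    # inflate the body when the response says Content-Encoding: gzip.
    if headers.get("content-encoding", "").lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body

payload = b"highly compressible string" * 1_000
wire = gzip.compress(payload)

assert decode_body({"content-encoding": "gzip"}, wire) == payload
assert decode_body({}, payload) == payload
```

Either way, the caller ends up with the same uncompressed bytes; only the size of the bytes on the wire changes.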

@thobrla

thobrla commented May 9, 2017

Seems fine, then. The caller has always gotten uncompressed bytes and they'll continue to get uncompressed bytes.

Out of curiosity, how were mid-download connection breaks handled for content-encoding:gzip objects previously?

@tartavull
Author

tartavull commented May 10, 2017

I think you will now get an IOError: CRC check failed exception
from https://github.com/python-git/python/blob/master/Lib/gzip.py#L316
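(On Python 3 the same failure surfaces as gzip.BadGzipFile, a subclass of OSError.) A quick sketch that forces the CRC mismatch by corrupting the gzip trailer, simulating a damaged download:

```python
import gzip

data = gzip.compress(b"some payload" * 100)

# The last 8 bytes of a gzip stream are the CRC32 and uncompressed size;
# flipping bits in the CRC field simulates a corrupted download.
broken = data[:-8] + bytes([data[-8] ^ 0xFF]) + data[-7:]

error_message = ""
try:
    gzip.decompress(broken)
except OSError as exc:  # IOError on Python 2, gzip.BadGzipFile on Python 3
    error_message = str(exc)

print(error_message)
```

A connection break that truncates the stream instead raises a similar "file ended before the end-of-stream marker" error, so callers see an exception either way rather than silently corrupt bytes.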

@lukesneeringer
Contributor

@tartavull Can you update the unit tests that fail as a result of your change? Once that is done, we can accept this.

@tseaver tseaver added api: storage Issues related to the Cloud Storage API. in progress type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels May 15, 2017
@tartavull tartavull force-pushed the master branch 3 times, most recently from 8d02bcc to bc031ec May 17, 2017 17:14
@tartavull
Author

Do you happen to know whether this failure:

Warning, treated as error:
/var/code/gcp/docs/logging-usage.rst.rst:25:Over dedent has detected
nox > Command bash ./test_utils/scripts/update_docs.sh failed with exit code 1
nox > Session docs failed. :(
Exited with code 1

is related to my commit changes?

@dhermes
Contributor

dhermes commented May 17, 2017

@tartavull Sorry for being quiet here for way too long.

  • No, the update_docs.sh failure is not your fault; it's related to the 1.6.1 release of Sphinx. Just rebase your branch on master and everything will be OK.
  • I'd really love to discuss some snippets of code that don't work as you expect with the current implementation, and then contrast that with how the same code behaves with the change in this PR.

GET requests now contain the header accept-encoding:gzip.
This improves performance for compressible strings which
were uploaded as gzips.

The caller is not required to do any decompression
because decompression is handled by the library.

Confusing `IOError: CRC check failed` exceptions will be
raised in the case of mid-download connection breaks.
@tartavull
Author

@dhermes looking forward to discussing them.

@dhermes
Contributor

dhermes commented May 17, 2017

@tartavull Do you have some examples? Or did you mean something else?

@tartavull
Author

tartavull commented May 17, 2017

I misunderstood you. I don't have any snippets that this PR fixes; it is just a huge performance improvement, which is why I tagged the commit with "perf".
As shown in the snippet above, there are no changes in the output to the caller.

neuroglancer
To provide you with a real use case: we store large 3D stacks of images in small 3D chunks.
These chunks are sized 512x512x64 and are of type uint64. Each uncompressed chunk is 128 MB, but the voxels are mostly the same value, so each chunk compresses to around 500 KB.
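Those numbers check out back-of-the-envelope. A sketch using a uniform buffer as a stand-in for the mostly-constant image chunks (real image data won't compress quite this well):

```python
import gzip

voxels = 512 * 512 * 64
chunk = b"\x07" * (voxels * 8)  # uint64 -> 8 bytes per voxel

# 512 * 512 * 64 * 8 bytes = 128 MiB uncompressed, as stated above.
assert len(chunk) == 128 * 1024 * 1024

compressed = gzip.compress(chunk, compresslevel=1)
print(f"compressed to {len(compressed)} bytes "
      f"(~{len(chunk) // len(compressed)}x smaller)")
```

With ratios in this range, downloading the stored gzip bytes instead of server-inflated bytes is the difference between ~500 KB and 128 MB per chunk fetched.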

@dhermes
Contributor

dhermes commented May 17, 2017

Ah, I see. Let me play around with this a bit to try to "break things" and investigate the raw payloads.

In the meantime, you can check out the underlying library used for uploads (docs and source). You can pass custom headers to an upload, so this perf optimization would be usable for you immediately.

@tartavull
Author

Any luck breaking things?

@tartavull
Author

@dhermes Is there anything I can do to get this merged?

@dhermes
Contributor

dhermes commented Jun 27, 2017

Sorry @tartavull it fell off my plate of things to do! Really bad of me, eek.

I want to test this PR on real use cases before merging. In particular I'd like to test on two files:

  • File A: Plain text
  • File B: File A, but gzip-ed locally

and maybe some other cases I haven't thought of?

I just want to make sure this "does the right thing".

@tartavull
Author

@dhermes I understand. Let me know if you need any help from me.

@lukesneeringer
Contributor

Poke.

@tartavull
Author

This PR modifies a single line of code, and it reduces traffic to Google Cloud Storage for all of your users who download compressed files. For our particular case that's a 250x reduction in bytes downloaded.
But 90 days after it was created, it still hasn't been merged.

@lukesneeringer
Contributor

@dhermes At this point I want to just merge this. Giving you a chance to throw up a flag, but this should really get in.

@dhermes
Contributor

dhermes commented Aug 7, 2017

@lukesneeringer If you feel it's the right move, go for it.

I worry that it is partially incorrect, e.g. if the thing stored is already gzipped then it will accidentally do the wrong thing.

@lukesneeringer
Contributor

I am not particularly worried. This is a really common pattern.

Luke Sneeringer and others added 3 commits August 8, 2017 10:46
@googlebot

So there's good news and bad news.

👍 The good news is that everyone who needs to sign a CLA (the pull request submitter and all commit authors) has done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Oct 13, 2017
@tseaver tseaver self-requested a review October 13, 2017 14:13
@tseaver tseaver changed the title A hack for enabling gzip downloads Add header enabling gzip downloads Oct 13, 2017
@tseaver tseaver merged commit 8762c34 into googleapis:master Oct 13, 2017
@tartavull
Author

Thanks :)
