aiohttp decompresses 'Content-Encoding: gzip' responses #4462
Comments
See #1992.
I'm against, because it predates RFC. I don't see any real case where we should not decompress automatically. The `Content-Encoding` header is not actually related to the data contents; it is connected with the way data is transferred over HTTP. HTTP servers that serve data that is supposed to be used in compressed form should not set `Content-Encoding`. I'm closing the issue. Please reopen if you don't think so.
There is a real case: where you're downloading a gzip file and want to store the gzip file. I think this is already supported, though, if you follow the issue I linked to.
@socketpair I'm not sure what you mean by "it predating RFC". Please note that I'm not against having an option to decompress bodies with a compression CE, since there absolutely are valid use cases for that. I'm arguing that this decompression should not happen by default.

@thehesiod Without having tested it, according to the comments in your PR #2110, that disables decoding of both content and transfer encoding. While I'm sure there are use cases for that, in my opinion only content encoding should be disabled.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

> The Content-Encoding entity header is used to compress the media-type. When present, its value indicates which encodings were applied to the entity-body. It lets the client know how to decode in order to obtain the media-type referenced by the Content-Type header.

If someone wants to serve `.tar.gz`, they MUST NOT use `Content-Encoding: gzip`. For example:
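A sketch of such a response with aiohttp's server API (file name and route are hypothetical):

```python
# Sketch (hypothetical file/route): serve a pre-gzipped tarball as opaque
# bytes, with a gzip media type and *no* Content-Encoding, so clients
# store the .tar.gz exactly as-is.
from aiohttp import web

async def tarball(request):
    with open("archive.tar.gz", "rb") as f:  # hypothetical file
        data = f.read()
    return web.Response(body=data, content_type="application/gzip")

app = web.Application()
app.router.add_get("/archive.tar.gz", tarball)
web.run_app(app)
```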
I would revert #2110. I don't see any reason to have such an option. If the S3 server (or its settings) is wrong, why should we care? For example, this is a hack: you upload pre-gzipped contents, instruct S3 to add the `Content-Encoding` header, and then expect a browser to decompress. It works, everything is OK. But if you want to download the gzipped contents (you possibly need that for the S3 protocol), you have to force the server not to send the headers you set up before, because these headers are for end users, not for S3-aware programs. S3-aware programs should use the API documented at https://botocore.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
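For instance, a sketch with boto3 (bucket and key hypothetical): the S3 API returns the stored bytes untouched, and `ContentEncoding` comes back as plain metadata.

```python
# Sketch (hypothetical bucket/key): S3-aware code fetches objects via the
# S3 API, which returns the stored bytes as-is; ContentEncoding is metadata.
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="archive.tar.gz")
data = obj["Body"].read()          # exact bytes as uploaded
print(obj.get("ContentEncoding"))  # e.g. "gzip"; nothing was decoded
```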
No, serving a `.tar.gz` with `Content-Type: application/x-tar` and `Content-Encoding: gzip` is exactly how it should be done. From section 7.2.1 of RFC 2616 (emphasis mine):

> When an entity-body is included with a message, the data type of that body is determined via the header fields Content-Type and Content-Encoding. These define a two-layer, ordered encoding model:
>
> entity-body := Content-Encoding( Content-Type( data ) )
>
> Content-Type specifies the media type of the underlying data. Content-Encoding may be used to indicate any additional content codings applied to the data, usually for the purpose of data compression, **that are a property of the requested resource**.
Serving it with `Content-Type: application/x-gzip` and no `Content-Encoding` works too, but it loses the information that the underlying data is a tar archive.
@JustAnotherArchivist I've updated the comment, sorry for the long delay.
I think that S3 usage is okay, actually. The browser decompresses it if the file is used in a rendered page (rather than downloaded as a file).
Regarding the RFC wording: well, we may argue more and more. aiohttp has the `auto_decompress` option anyway.

P.S. The reason why I reopened the issue: I figured out that browsers (at least Firefox) DO NOT decompress such responses when they download the URL (rather than render it). @asvetlov I don't know what to do. Both points of view have pros and cons. The RFC is actually not clear enough.
@socketpair I agree with the original bug reporter; he is correct. This is a bug, and you should fix it.

This is a bug impacting Funtoo Linux via http://cdn.postfix.johnriley.me/mirrors/postfix-release/official/postfix-3.6.1.tar.gz. This is a CDN serving a gzip-compressed tarball. The artifact being served over HTTP is "postfix-3.6.1.tar.gz". The relevant header is:

Content-Encoding: gzip

aiohttp returns a 21 MB file, uncompressed. In our case, getting the exact binary data of the postfix-3.6.1.tar.gz file is essential. gzip here is not just a transport feature (aka Transfer-Encoding) to reduce bandwidth; we are requesting the contents of "the gzip-compressed file", and we need it to match perfectly what is on the origin server because we must compute the SHA512 digest of this file. So we need the original compressed version; we can't recompress it, because our gzip might use different settings and produce a different binary and thus a different SHA512.

What you should do by default is not transparently decompress based on Content-Encoding. CE is for data, TE is for transfer. End of story. If you need to support this, you could add a feature for auto-decompression of content-encoded gzip, which could be useful in some cases, but it should not be the default behavior. We must respect the distinct meanings of CE and TE, and the concepts of "content" and "transfer". When I download, I want the "content".

Upstream bug: https://bugs.funtoo.org/browse/FL-8477
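For a use case like this, a client-side sketch using aiohttp's `auto_decompress` session flag (the URL is the one from the comment above):

```python
# Sketch: fetch the raw, still-compressed bytes so the digest matches the
# origin file. auto_decompress=False disables Content-Encoding decoding.
import asyncio
import hashlib

import aiohttp

URL = "http://cdn.postfix.johnriley.me/mirrors/postfix-release/official/postfix-3.6.1.tar.gz"

async def main():
    async with aiohttp.ClientSession(auto_decompress=False) as session:
        async with session.get(URL) as resp:
            data = await resp.read()  # bytes exactly as sent on the wire
    print(hashlib.sha512(data).hexdigest())

asyncio.run(main())
```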
Looking into this more, it seems like the reality of HTTP servers and their handling of Content-Encoding is not always according to what we would hope, or in alignment with the HTTP specs. A workaround for our specific issue was to specify "Accept-Encoding: identity" in the request, which informed the CDN to send the file as-is without any Content-Encoding (it is already compressed on disk, so this is fine), thus avoiding the issue.
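In code, that workaround is just an extra request header; a sketch with a hypothetical URL:

```python
# Sketch: ask the server not to apply any content coding in the first
# place, so there is nothing for the client to transparently decode.
import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        headers = {"Accept-Encoding": "identity"}
        async with session.get("https://example.org/archive.tar.gz",
                               headers=headers) as resp:
            data = await resp.read()  # compressed file, byte-for-byte
    with open("archive.tar.gz", "wb") as f:
        f.write(data)

asyncio.run(main())
```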
I've used the following workaround on a server that wrongly guesses content encoding from the file extension:

```python
from aiohttp import web

app = web.Application()

async def on_response_prepare(request, response):
    # Unless compression was explicitly enabled for this response, drop
    # any guessed Content-Encoding header so clients get the raw bytes.
    if not response.compression:
        response.headers.popall("Content-Encoding", None)

app.on_response_prepare.append(on_response_prepare)
```
I think we cannot change our defaults, but we should provide an option to disable decoding on demand.
@wodny From my understanding, we are talking about decompressing incoming data, but in your snippet you are tweaking the outgoing response.
I probably should have elaborated more on that. My setup was symmetric, i.e. aiohttp on both server and client. Instead of modifying all the clients, I modified the server (as a workaround), which affected all the clients. If the server were built with another library, an analogous solution would probably exist.
There is seemingly a ton of ambiguity even in the spec (RFC 2616) itself.

https://datatracker.ietf.org/doc/html/rfc2616#section-14.41

OK, so transfer-codings and content-codings are different, independent things, except that the links in the second half of that section point to the content-codings section.

https://datatracker.ietf.org/doc/html/rfc2616#section-14.3

So if transfer codings are not the same as content codings, and `Accept-Encoding` only restricts the content-codings acceptable in a response, then a server that compresses on the fly in answer to `Accept-Encoding: gzip` has to label the result with `Content-Encoding: gzip`, even though that compression is conceptually a transfer matter. Something like `TE`/`Transfer-Encoding` would be the natural home for hop-by-hop compression. And so, as a practical matter, nobody ever supported gzip as a transfer coding; `chunked` is the only transfer coding in real use.
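One way to see this from Python (a sketch; hostname hypothetical, and `http.client` is used because it performs no transparent decompression):

```python
# Sketch: inspect what a server actually sends. Real servers answer
# Accept-Encoding with Content-Encoding; Transfer-Encoding stays
# "chunked" at most, never "gzip".
import http.client

conn = http.client.HTTPSConnection("example.org")
conn.request("GET", "/archive.tar.gz", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()
print(resp.getheader("Content-Encoding"))
print(resp.getheader("Transfer-Encoding"))
conn.close()
```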
IMO, the problem and solution as written in the OP here are not correct. It makes it seem like a client problem, but there is nothing wrong with decompressing a representation by default. And as pointed out, there is the `auto_decompress` option for the cases where you do not want that.

However, I do think it points to a change that should be made to the server. While there is understandably great freedom in the specs as to how a request should translate to a response, the response is supposed to be some "representation" of the requested resource. Currently, if the request is for a tarball, e.g. a `.tar.gz` file, the file is labeled with `Content-Encoding: gzip`, which invites clients to transparently undo the compression; it would be better to serve the archive as an opaque gzip media type with no content coding. Note this is exactly the response you get when downloading a repo tarball from GitHub, for example. As another data point, the same behavior shows up elsewhere.
Long story (not so) short

aiohttp automatically decompresses responses with a `Content-Encoding` header. I believe this is incorrect: while decompression may of course be useful, it should not be on by default.

The `Content-Encoding` header is similar to the `Transfer-Encoding` one, and the two are often confused. The difference is that TE is about the data transfer, i.e. the encoding for a particular connection and client, whereas CE is about the data itself. CE exists to allow specifying this encoding without losing the MIME type of the actual data.

As an example, an ideal response for a `.tar.gz` file download would have a `Content-Type: application/x-tar` header and a `Content-Encoding: gzip` one (rather than e.g. `Content-Type: application/x-gzip` without `Content-Encoding`). When downloading such a file, one would still expect to get the `.tar.gz` file, not an uncompressed `.tar`. This is why I think that aiohttp should not automatically decompress on a `Content-Encoding` header.

Steps to reproduce
Make a request for a file that has been gzipped, and read from the ClientResponse object (e.g. to write it to a file).
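A minimal sketch of those steps (URL hypothetical):

```python
# Sketch: download a pre-gzipped file and save the body; aiohttp hands
# back the *decompressed* bytes, so the saved file is a bare .tar.
import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://example.org/archive.tar.gz") as resp:
            body = await resp.read()
    with open("archive.tar.gz", "wb") as f:
        f.write(body)  # not byte-identical to the file on the server

asyncio.run(main())
```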
Expected behaviour
Get the gzipped file.
Actual behaviour
Returns the decompressed file.
Your environment
All versions are affected (oldest checked: 2.3.10, but presumably much older than that).
Additional notes
curl also used to have this bug in its HTTP/2 implementation. I haven't searched in detail whether it existed in the HTTP/1.1 one as well, but current versions do not decompress such responses.