Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compression headers #665

Closed
chaoran-chen opened this issue Feb 22, 2024 · 6 comments · Fixed by #679
Closed

Fix compression headers #665

chaoran-chen opened this issue Feb 22, 2024 · 6 comments · Fixed by #679
Assignees
Labels
bug Something isn't working

Comments

@chaoran-chen
Copy link
Member

chaoran-chen commented Feb 22, 2024

Currently, when we go to aggregated?compression=gzip, we get a response with

Content-Encoding: gzip
Content-Type: application/json

Similarly, if choosing ?compression=zstd, one get a response with a Content-Encoding: zstd. Opening in a browser, this causes an error:

image

I think that the headers are actually quite wrong. If someone requests a compressed file, then the content is not JSON (or TSV or FASTA) but it's a GZIP or ZSTD file. I.e., there shouldn't be a Content-Encoding and the Content-Type should be application/gzip or application/zstd (I checked that both are official IANA media types) or maybe just application/octet-stream.

@chaoran-chen chaoran-chen added the bug Something isn't working label Feb 22, 2024
@fengelniederhammer
Copy link
Contributor

Actually, after doing some research, I think that our implementation follows the specification/standards:

https://www.rfc-editor.org/rfc/rfc9110#name-content-encoding

Content-Encoding is primarily used to allow a representation's data to be compressed without losing the identity of its underlying media type.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type

The Content-Type representation header is used to indicate the original media type of the resource (prior to any content encoding applied for sending).

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

The Content-Encoding representation header lists any encodings that have been applied to the representation (message payload), and in what order. This lets the recipient know how to decode the representation in order to obtain the original payload format. Content encoding is mainly used to compress the message data without losing information about the origin media type.

The key point seems to be "without losing information about the origin media type".

Browsers AFAIK don't support zstd (also curl does not display the data because "binary data might mess with your terminal"). If we set the Content-Type to something else, then a browser probably still won't be able to display any meaningful data (I didn't try it though).

I'd argue that the current implementation is fine, so I'll move this to blocked until we decide that we still want to change it.

@chaoran-chen
Copy link
Member Author

chaoran-chen commented Feb 27, 2024

If a user does not set compression in the URL and has an Accept-Encoding header that includes e.g. gzip, then, by the HTTP specification, it means that the user can understand responses encoded with gzip. In other words, the user is willing to receive gzip-encoded data.

The logic for the compression query parameter is different. If the user sets compression to gzip, the resource/content that the user wants is a gzip file. The browser should never try to decode it because the resource/content that the user requested is the gzip and not an uncompressed file.

This is similar to our dataFormat query parameter vs the Accept HTTP header. If the dataFormat is TSV, then the user wants a TSV file and nothing else.

@chaoran-chen
Copy link
Member Author

The current implementation doesn't work, does it? The user should be able to go to a particular link in their browser and the browser should then download a GZIP or ZSTD compressed file in the sense that it puts a compressed file onto their disk.

As the feature is very important and needed for our other project, I'll put it back into prioritized.

@fengelniederhammer
Copy link
Contributor

So the feature is actually about downloading the file instead of showing it in the browser? This can already be achieved by setting compression=zstd&downloadAsFile=true.

Should we make downloadAsFile=true (or rather Content-Disposition: attachment in the response) implied by compression=zstd?
But I'm not sure whether this is a good idea. I'm quite sure that it's a bad idea for gzip, as browsers usually demand gzip compressed data. There it should be up to the user to decide whether to download or not. Doing it only for zstd might be surprising. Also, downloading as a file and compression are two different dimensions. If users wants to have both, they should specify both.

We could also simply mention in the docs that usually you want to downloadAsFile=true, when asking for zstd compressed files (at least when downloading via the browser - using curl or wget, this isn't an issue anyway).

@chaoran-chen
Copy link
Member Author

This can already be achieved by setting compression=zstd&downloadAsFile=true.

This doesn't work in (current and older) versions of Chrome:

Screenshot 2024-02-27 at 18 01 17

In future versions or if you enable zstd in Chrome's settings (https://caniuse.com/?search=zstd), it works.

But I'm not sure whether this is a good idea. I'm quite sure that it's a bad idea for gzip, as browsers usually demand gzip compressed data. There it should be up to the user to decide whether to download or not.

If a user asks for a gzipped TSV file (which is not the same as asking for a TSV file which may be encoded as a GZIP or not), then they should receive a GZIP file. Showing it in a decompressed format shall not be considered a right way to display the content. (A "correct" way could be to show the binary in a hex format but of course, no browser does that.)

I think that we should fully separate the query parameters from the Accept-Encoding header. A query defines what the content should be whereas Accept-Encoding is usually transparent to the user.

@fengelniederhammer
Copy link
Contributor

fengelniederhammer commented Feb 28, 2024

I just tried https://lapis.cov-spectrum.org/gisaid/v2/sample/aggregated?accessKey=9C[...]Pd&downloadAsFile=true&compression=zstd. It triggers a download, both in Firefox (123.0) and Chrome (122.0.6261.94).

As we figured out, it works in Chrome, because the Linux build has the zstd flag enabled and tries to decompress zstd. If the flag is disabled, Chrome will show a failed request (even though it was successful - ignoring the Content-Disposition header). IMO this is a Chrome issue.

As dicsussed, we will change it to overwrite the Content-Type header, if the compression parameter is set.

@fengelniederhammer fengelniederhammer self-assigned this Feb 28, 2024
fengelniederhammer added a commit that referenced this issue Feb 28, 2024
…ession property in the request was set #665

That way we lose information about the actual content of the request, but Chrome downloads zstd encoded data instead of failing to display it.
fengelniederhammer added a commit that referenced this issue Feb 28, 2024
…ession property in the request was set #665

That way we lose information about the actual content of the request, but Chrome downloads zstd encoded data instead of failing to display it.
fengelniederhammer added a commit that referenced this issue Feb 29, 2024
…e compression property in the request was set #665
fengelniederhammer added a commit that referenced this issue Feb 29, 2024
…e compression property in the request was set #665
fengelniederhammer added a commit that referenced this issue Feb 29, 2024
…ession property in the request was set #665

That way we lose information about the actual content of the request, but Chrome downloads zstd encoded data instead of failing to display it.
fengelniederhammer added a commit that referenced this issue Feb 29, 2024
…ession property in the request was set #665

That way we lose information about the actual content of the request, but Chrome downloads zstd encoded data instead of failing to display it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants