I believe you have misunderstood the intent of the Content-Encoding header (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding ). It refers to additional encoding applied to the payload for the transfer, which must be reversed upon receipt. In your case, the payload itself is pre-compressed; the compression was not applied for the transfer.
Comment from the aiohttp issue:
" The Content-Encoding header is not related to data contents actually. It is connected with a way data is transferred over HTTP."
Exactly.
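To make the point about reversing the encoding on receipt concrete, here is a minimal sketch using only the standard library. The `receive` function is a hypothetical stand-in for an HTTP client (it is not fsspec's or aiohttp's actual code), and the CSV text is made up. It only illustrates that, once the header is set, the application is handed plain CSV rather than a gzip file.

```python
import gzip

# Made-up sample data standing in for the real CSV.
csv_text = "id,value\n1,10\n2,20\n"

# The object as stored: a gzip file, like a ".csv.gz" key on S3.
stored_bytes = gzip.compress(csv_text.encode("utf-8"))


def receive(body: bytes, headers: dict) -> bytes:
    """Hypothetical client: reverse Content-Encoding before handing data over."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


def looks_gzipped(data: bytes) -> bool:
    """Check for the two-byte gzip magic number."""
    return data[:2] == b"\x1f\x8b"


# Header set: the client decodes, so the application sees plain CSV.
seen_with_header = receive(stored_bytes, {"Content-Encoding": "gzip"})

# Header absent: the application sees gzip bytes and must decompress itself.
seen_without_header = receive(stored_bytes, {})

# Treating already-decoded data as a gzip file fails.
try:
    gzip.decompress(seen_with_header)
    second_decode_failed = False
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    second_decode_failed = True
```

Under these assumptions, a reader that also expects a `.gz` file would be decompressing data that has already been decompressed once.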
Note that when you specify "gzip" to read_csv or file open, it is understood as file compression. There may be a difference between gzip (the file format) and gzip (the stream compression codec).
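As a rough illustration of how the gzip file format relates to the underlying compressed stream, the sketch below uses Python's `gzip` and `zlib` modules on a made-up payload. It shows that the gzip container carries its own header, and that `zlib` only accepts it when told to expect that header via `wbits`. This is about the standard library's behavior, not a claim about what pandas or fsspec do internally.

```python
import gzip
import zlib

# Made-up payload for illustration.
payload = b"id,value\n1,10\n2,20\n"

gzip_bytes = gzip.compress(payload)  # gzip container, as found in a .gz file
zlib_bytes = zlib.compress(payload)  # zlib container around the same DEFLATE data

# The gzip container starts with the magic bytes 0x1f 0x8b; zlib's does not.
gzip_has_magic = gzip_bytes[:2] == b"\x1f\x8b"
zlib_has_magic = zlib_bytes[:2] == b"\x1f\x8b"

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
from_gzip = zlib.decompress(gzip_bytes, wbits=16 + zlib.MAX_WBITS)

# With default wbits, zlib expects its own container and rejects gzip bytes.
try:
    zlib.decompress(gzip_bytes)
    default_accepts_gzip = True
except zlib.error:
    default_accepts_gzip = False
```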
What I still don't quite understand is the discrepancy between these two behaviors:
    import fsspec as fs
    import pandas as pd

    url = "https://convect-test-data.s3.us-west-2.amazonaws.com/tx_3_target_time_series.csv.gz"

    # this returns the correct, decompressed file content
    with fs.open(url, 'r') as f:
        print(f.readline())

    # this throws the "400 cannot decode content-encoding: gzip" error
    with fs.open(url, 'r') as f:
        df = pd.read_csv(f)
Problem
I have a pre-compressed (with gzip) CSV file on s3: https://convect-test-data.s3.us-west-2.amazonaws.com/tx_3_target_time_series.csv.gz
where the meta is set as
When reading it as a dataframe
throws the following error
While the following is fine
prints out the normal csv file content. It looks like the file is already decompressed when opened.
Reading using pandas's reader is also fine:
If I removed the content-encoding: gzip from the s3 meta, then the following is fine
Related issues
I have raised this issue to dask community: dask/dask#7959
I think this issue might also be related to aio-libs/aiohttp#4462
environment