df.to_csv ignores compression when provided with a file handle #21227

Closed
guinny65 opened this issue May 28, 2018 · 5 comments · Fixed by #21249
Labels: IO CSV (read_csv, to_csv), Output-Formatting (__repr__ of pandas objects, to_string)

Comments


guinny65 commented May 28, 2018

Code at issue

with open(outputfile, "w") as file_handle:
    df2.to_csv(file_handle, compression='gzip')

Problem description

The written file is not gzip compressed.

Expected Output

The output should be a gzip-compressed CSV file, similar to what is obtained when using:

df2.to_csv('/path/to/file.csv.gz', compression='gzip')
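A more direct way to confirm whether a written file is really gzip-compressed, rather than comparing file sizes, is to check for the two-byte gzip magic number (1f 8b) that RFC 1952 places at the start of every gzip member. A minimal sketch; is_gzip_file is a hypothetical helper for illustration, not part of pandas:

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes (RFC 1952)


def is_gzip_file(path):
    """Return True if the file at `path` begins with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Demo with throwaway files: one written through gzip, one as plain text.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "compressed.csv.gz")
plain_path = os.path.join(tmpdir, "plain.csv")

with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
    fh.write("a,b\n1,2\n")
with open(plain_path, "w", encoding="utf-8") as fh:
    fh.write("a,b\n1,2\n")

print(is_gzip_file(gz_path))     # True
print(is_gzip_file(plain_path))  # False
```

Run against the file produced by the handle-based call in the report, a check like this would be expected to return False.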

WillAyd commented May 29, 2018

Maybe related to #21144


minggli commented May 29, 2018

>>> import os
>>> import pandas as pd
>>> from pandas import *
>>>
>>> pd.__version__
'0.20.3'
>>>
>>> df = DataFrame(100 * [[123, 234, 435]])
>>>
>>> with open('test_compressed', 'w') as fh:
...     df.to_csv(fh, compression='gzip')
...
>>> fh_size = os.path.getsize('test_compressed')
>>> df.to_csv('test_compressed', compression='gzip')
>>> f_size = os.path.getsize('test_compressed')
>>>
>>> os.remove('test_compressed')
>>> assert fh_size == f_size
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

This looks like existing behaviour, dating back to version 0.20 or earlier.

Actually, the documentation for the compression parameter says:

compression : string, optional

A string representing the compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, ‘zip’, ‘xz’. This input is only used when the first argument is a filename.

So compressing into a file handle is not supported, but perhaps it could be a new use case?
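Until handles are supported, one workaround is to do the compression yourself: open the target in binary mode ("wb", since gzip output is bytes, unlike the "w" handle in the original report), wrap it in gzip.GzipFile and io.TextIOWrapper, and hand the text wrapper to the writer. A minimal sketch; write_gzip_text is a hypothetical helper, and the demo uses csv.writer in place of df.to_csv. Passing lambda fh: df.to_csv(fh) as write_fn is an assumption that is not exercised here:

```python
import csv
import gzip
import io


def write_gzip_text(binary_handle, write_fn, encoding="utf-8"):
    """Gzip-compress whatever `write_fn` writes into an already-open binary handle.

    `write_fn` receives a text handle. With pandas, something like
    `lambda fh: df.to_csv(fh)` is assumed to work here (not tested).
    """
    # GzipFile compresses into the caller's handle but does not close it.
    with gzip.GzipFile(fileobj=binary_handle, mode="wb") as gz:
        # newline="" leaves the writer's line endings untranslated.
        text = io.TextIOWrapper(gz, encoding=encoding, newline="")
        write_fn(text)
        # detach() flushes buffered text into `gz` and releases it, so the
        # `with` block can close `gz` and write the gzip trailer.
        text.detach()


# Demo: compress two CSV rows into an in-memory binary handle.
buffer = io.BytesIO()
write_gzip_text(buffer, lambda fh: csv.writer(fh).writerows([["a", "b"], [1, 2]]))
print(gzip.decompress(buffer.getvalue()).decode("utf-8"))
```

Detaching instead of closing the wrapper keeps the caller's handle open, so it can still be rewound, uploaded, or closed by whoever opened it.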


toninlg commented Jul 1, 2020

Hi,

Should the following code work after this merge, or is it related to #22555?

import os
import sys
import pandas as pd

print(sys.version)
print(pd.__version__)
df = pd.DataFrame(100 * [[123, 234, 435]])
with open('./test_compressed.gz', 'w', newline='') as fh:
    df.to_csv(fh)

fh_size = os.path.getsize('./test_compressed.gz')
df.to_csv('./test_compressed.gz')
f_size = os.path.getsize('./test_compressed.gz')
os.remove('./test_compressed.gz')
assert fh_size == f_size

3.6.7 (default, Dec 6 2019, 07:03:06) [MSC v.1900 64 bit (AMD64)]
0.25.1

3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]
1.0.5
I get an AssertionError in both cases, and if I add compression='gzip' to the first to_csv call, I get "RuntimeWarning: compression has no effect when passing file-like object as input".

Thank you
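The size mismatch in the code above is expected. When given a path, to_csv's default compression='infer' (as in the 0.25 and 1.0 releases tested) picks a codec from the file extension, so './test_compressed.gz' is gzipped, while the text-mode handle receives plain CSV. A rough sketch of that extension lookup, assuming only the four codecs listed in the documentation quoted earlier (newer pandas versions recognise more); infer_compression here is a hypothetical stand-in, not pandas' own code:

```python
import os

# Extension-to-codec table for the four codecs named in the quoted docs.
# This mirrors the idea behind compression="infer"; it is not pandas' code.
_EXTENSION_TO_CODEC = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def infer_compression(path_or_buf):
    """Guess a codec from a path's extension; file handles get no compression."""
    if not isinstance(path_or_buf, (str, os.PathLike)):
        return None  # a file-like object: there is no name to infer from
    _, ext = os.path.splitext(os.fspath(path_or_buf))
    return _EXTENSION_TO_CODEC.get(ext.lower())


print(infer_compression("./test_compressed.gz"))   # gzip
print(infer_compression("./test_compressed.csv"))  # None
```

Because a handle carries no extension to inspect, the inference step has nothing to act on, which matches the warning reported above.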

@AvivAvital2

@toninlg, this worked for me:

import gzip
import io

# Render the CSV into an in-memory string, then gzip the encoded bytes in one go.
with io.StringIO() as buf:
    df.to_csv(buf)
    with open('test_compressed.gz', 'wb') as remote_file:
        remote_file.write(gzip.compress(buf.getvalue().encode('utf-8')))
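The snippet above holds the whole CSV in memory twice (as a str and as encoded bytes) before compressing. For large frames, a streaming alternative is to open the output with gzip.open in text mode and write into that handle, so data is compressed as it is produced. A minimal sketch; write_csv_gz is a hypothetical helper, the demo uses csv.writer, and the claim that df.to_csv(fh) can write into the same handle is an assumption not exercised here:

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, rows):
    """Stream rows into a gzip-compressed CSV file without building it in memory.

    With pandas, the same handle would be passed to `df.to_csv(fh)` instead of
    csv.writer (assumed, not tested here).
    """
    # "wt" yields a text handle; gzip compresses as data is written.
    # newline="" lets the csv module control line endings.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        csv.writer(fh).writerows(rows)


out_path = os.path.join(tempfile.mkdtemp(), "test_compressed.gz")
write_csv_gz(out_path, [["a", "b"], [1, 2]])

# Read the raw bytes back and decompress them to check the round trip.
with open(out_path, "rb") as fh:
    raw = fh.read()
print(gzip.decompress(raw).decode("utf-8"))
```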

@miodeqqq

You can also try this:

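import csv
import gzip
from io import BytesIO, TextIOWrapper

gz_buffer = BytesIO()

with gzip.GzipFile(fileobj=gz_buffer, mode="wb") as gz_file:
    # Close the text wrapper explicitly so its buffered text reaches gz_file
    # before the gzip trailer is written; newline="" avoids line-ending translation.
    with TextIOWrapper(gz_file, encoding="utf-8", newline="") as text_file:
        df.to_csv(
            path_or_buf=text_file,
            index=False,
            sep=",",
            quoting=csv.QUOTE_NONE,
            # compression= is omitted: it has no effect for a text file handle,
            # and gz_file already does the compressing.
        )

gz_bytes = gz_buffer.getvalue()  # the gzip-compressed CSV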
