Add Zip file functionality. Fixes #11413 #12103

lababidi · 2016-01-20T21:01:30Z

This PR leverages Python's ZipFile functionality to automatically unzip files read into DataFrames using read_csv().
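For readers unfamiliar with the approach, here is a minimal sketch of what "unzip, then parse" amounts to. It is not the code from this PR: `make_zip` and `read_zipped_csv_rows` are hypothetical names, and the standard-library `csv.reader` stands in for the pandas parser so the example runs without pandas.

```python
import csv
import io
import zipfile


def make_zip(members):
    """Build an in-memory ZIP archive from a {name: text} mapping (test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


def read_zipped_csv_rows(source):
    """Decompress the first member of a ZIP archive and parse it as CSV rows."""
    with zipfile.ZipFile(source) as archive:
        first_name = archive.namelist()[0]
        # ZipFile.open returns a binary stream; wrap it so csv sees decoded text.
        with archive.open(first_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


rows = read_zipped_csv_rows(make_zip({"data.csv": "a,b\n1,2\n3,4\n"}))
print(rows)
```

The point is only that `zipfile.ZipFile.open` yields a file-like object, so a parser that already accepts file handles needs no further changes once the member has been opened.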

jreback · 2016-01-20T21:05:44Z

pandas/io/tests/test_parsers.py

+
+            result = self.read_csv(open(path, 'rb'), compression='zip')
+            tm.assert_frame_equal(result, expected)
+


This needs a test for a zip containing multiple files (i.e. assert that the ValueError is raised).

jreback · 2016-01-21T15:34:48Z

As an aside, would you be interested in solving the issues brought up in #11666 and partially addressed in #11677 (not merged)?

…le files.

lababidi · 2016-01-21T15:41:49Z

@jreback I would be happy to continue this to the other read_* functions. I like the idea of refactoring out the compression determination and the decompression steps, so that it can be used in all the read_* functions. I won't use the PR #11677 because it does not include tests or zip functionality. It also only focuses on pickles.

lababidi · 2016-01-26T18:44:50Z

@jreback Could you merge this in and I'll take care of the other requests?

jreback · 2016-01-26T19:29:20Z

pandas/io/parsers.py

@@ -61,9 +61,9 @@ class ParserWarning(Warning):
 dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
    (Unsupported with engine='python')
-compression : {'gzip', 'bz2', 'infer', None}, default 'infer'
+compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'


Can you add an explanation that a .zip archive must contain only a single file?

In the description for the parser, a warning/comment is made that a zip file may only contain one file that needs to be read in. If more than one file is compressed into the ZIP file, a ValueError is thrown.

jreback · 2016-01-26T20:13:33Z

pandas/parser.pyx

+                    source = zip_file.open(file_name)
+
+                elif len(zip_names)>1:
+                    raise ValueError('Multiple files found in compressed '


maybe just do else here (e.g. you can have 0 files in an archive?)
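A sketch of the branch structure being suggested, assuming the goal is that every archive without exactly one member raises. The function name `open_single_member`, the helper `build_archive`, and both error messages are hypothetical and are not taken from the PR.

```python
import io
import zipfile


def build_archive(members):
    """Return an in-memory ZIP archive built from a {name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


def open_single_member(zip_file):
    """Open the only member of zip_file, or raise ValueError otherwise."""
    names = zip_file.namelist()
    if len(names) == 1:
        return zip_file.open(names[0])
    elif len(names) == 0:
        # An archive can legitimately be empty, so handle it explicitly.
        raise ValueError("Zero files found in ZIP archive")
    else:
        raise ValueError(
            "Multiple files found in ZIP archive; only one is allowed: %s" % names
        )


archive = zipfile.ZipFile(build_archive({"only.csv": "x\n1\n"}))
print(open_single_member(archive).read())
```

With a bare `elif len(names) > 1` and no final branch, an empty archive would fall through silently and leave the source unset; the trailing `else` closes that gap.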

…est_gzip, test_bz2, test_zip. Add tests for python and c engines.

Conflicts: pandas/io/tests/test_parsers.py

lababidi · 2016-01-29T03:22:52Z

@jreback Could you please merge this before you make any more changes to the tests?

jreback · 2016-01-29T03:38:11Z

you need to rebase then I'll review

lababidi · 2016-01-29T14:06:28Z

@jreback Rebased. Thanks.

jreback · 2016-01-29T14:17:15Z

you need to force push. you should only have your commits. see here

lababidi · 2016-01-29T15:14:46Z

@jreback I'm redoing this Pull Request. I will close this one and open a new PR.

jreback · 2016-01-29T15:21:18Z

ok, in the future, just push to the same one

stoffprof · 2016-05-04T13:25:22Z

Can the same functionality be added to read_fwf() and other read_* methods?

jreback · 2016-05-04T13:28:20Z

#11666 is related
#12688 is where this could be done; the code is spread out a bit. You're welcome to take a crack at it @Itzybitzy

jreback · 2016-05-04T13:28:48Z

IOW, the compression interfaces need to be pulled out a bit from the parser code
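To make the idea concrete, here is a rough sketch of what a parser-independent compression layer could look like. It is an illustration under assumptions, not the refactoring that pandas eventually adopted: `infer_compression` and `open_decompressed` are hypothetical names, and the extension table is a guess at a reasonable mapping.

```python
import bz2
import gzip
import io
import zipfile

# Hypothetical mapping from file extension to compression method.
_EXTENSION_TO_METHOD = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip"}


def infer_compression(path, compression="infer"):
    """Resolve 'infer' to a concrete method (or None) from the file extension."""
    if compression != "infer":
        if compression is not None and compression not in _EXTENSION_TO_METHOD.values():
            raise ValueError("Unrecognized compression type: %s" % compression)
        return compression
    lowered = path.lower()
    for extension, method in _EXTENSION_TO_METHOD.items():
        if lowered.endswith(extension):
            return method
    return None


def open_decompressed(path, compression="infer"):
    """Return a binary file-like object yielding the decompressed bytes."""
    method = infer_compression(path, compression)
    if method == "gzip":
        return gzip.open(path, "rb")
    if method == "bz2":
        return bz2.open(path, "rb")
    if method == "zip":
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
            if len(names) != 1:
                raise ValueError("ZIP archive must contain exactly one file")
            # Reads the whole member into memory; fine for a sketch.
            return io.BytesIO(archive.read(names[0]))
    return open(path, "rb")
```

Any `read_*` function could then call `open_decompressed(path, compression)` and hand the resulting handle to its own parser, which is the separation being asked for here.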

stoffprof · 2016-05-05T22:31:40Z

Sorry @jreback, I just don't have the skills to do that (yet). Is there something good for a beginner to work on? Documentation maybe? (Not even sure where the right place to ask this is.)

jreback · 2016-05-05T22:39:21Z

http://pandas.pydata.org/pandas-docs/stable/contributing.html

Select the "Difficulty Novice" label and you will see lots of issues.

Add Zip file functionality. Fixes #11413

7f3461c

jreback reviewed Jan 20, 2016

jreback added the Enhancement and IO Data labels Jan 21, 2016

Add test to ensure ValueError is thrown when ZIP file contains multiple files.

2fc43b6

jreback reviewed Jan 26, 2016

Add parser description warning to handle ZIP files

f5a641d

In the description for the parser, a warning/comment is made that a zip file may only contain one file that needs to be read in. If more than one file is compressed into the ZIP file, a ValueError is thrown.

jreback reviewed Jan 26, 2016

Mahmoud Lababidi added 3 commits January 26, 2016 17:54

Create TestCompression Nose Test Class. Split test_compression into test_gzip, test_bz2, test_zip. Add tests for python and c engines.

cf7c347

Merge remote-tracking branch 'origin/master'

1b21456

Merge branch 'master' of github.com:pydata/pandas

b6939ea

Conflicts: pandas/io/tests/test_parsers.py

Mahmoud Lababidi and others added 5 commits January 29, 2016 08:53

Add Zip file functionality. Fixes #11413

56e55a6

Add test to ensure ValueError is thrown when ZIP file contains multiple files.

40fe268

Add parser description warning to handle ZIP files

ee336f1

In the description for the parser, a warning/comment is made that a zip file may only contain one file that needs to be read in. If more than one file is compressed into the ZIP file, a ValueError is thrown.

Create TestCompression Nose Test Class. Split test_compression into test_gzip, test_bz2, test_zip. Add tests for python and c engines.

1c3ecd7

Merge branch 'master' of github.com:lababidi/pandas

b3c21cf

lababidi closed this Jan 29, 2016

jreback added this to the 0.18.1 milestone May 4, 2016

ozak mentioned this pull request May 31, 2019

Compression keyword for Stata and others? #26599

Closed

