Add Zip file functionality. Fixes #11413 #12103
result = self.read_csv(open(path, 'rb'), compression='zip')
tm.assert_frame_equal(result, expected)
need a test for multiple files in a zip (e.g. assert that the ValueError is raised)
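A minimal sketch of the kind of test being asked for, using only the standard library. `open_single_member` and `make_zip` are hypothetical stand-ins written for this illustration, not the code from this PR, and the member names are made up.

```python
import io
import zipfile


def open_single_member(buf):
    """Hypothetical stand-in for the parser's zip branch (not the PR's code)."""
    zip_file = zipfile.ZipFile(buf)
    zip_names = zip_file.namelist()
    if len(zip_names) != 1:
        raise ValueError('Expected exactly one file in zip archive, '
                         'found {0}'.format(len(zip_names)))
    return zip_file.open(zip_names[0])


def make_zip(members):
    """Build an in-memory zip archive from a {name: text} dict."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zip_file:
        for name, text in members.items():
            zip_file.writestr(name, text)
    buf.seek(0)
    return buf


def test_zip_multiple_files_raises():
    # Two members in one archive: the reader should refuse to guess.
    buf = make_zip({'a.csv': 'x,y\n1,2\n', 'b.csv': 'x,y\n3,4\n'})
    try:
        open_single_member(buf)
    except ValueError:
        return
    raise AssertionError('ValueError was not raised')


def test_zip_single_file_reads():
    # One member: its bytes come back unchanged.
    buf = make_zip({'a.csv': 'x,y\n1,2\n'})
    assert open_single_member(buf).read() == b'x,y\n1,2\n'
```

In the pandas test suite the same check would presumably go through `read_csv(..., compression='zip')` rather than a helper like this.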
@jreback I would be happy to continue this to the other
@jreback Could you merge this in and I'll take care of the other requests?
@@ -61,9 +61,9 @@ class ParserWarning(Warning):
 dtype : Type name or dict of column -> type, default None
     Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
     (Unsupported with engine='python')
-compression : {'gzip', 'bz2', 'infer', None}, default 'infer'
+compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'
can you add an explanation that a .zip archive may contain only a single file
The parser description now notes that a ZIP archive may contain only one file to be read in. If the archive contains more than one file, a ValueError is raised.
    source = zip_file.open(file_name)

elif len(zip_names) > 1:
    raise ValueError('Multiple files found in compressed '
maybe just do else here (e.g. you can have 0 files in an archive?)
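On the question above: Python's `zipfile` module can write and read an archive with zero members, so a plain `else` would catch both the empty and the multi-file case. A small sketch, where `pick_member` and the error message are hypothetical and not taken from this PR:

```python
import io
import zipfile


def pick_member(zip_file):
    """Hypothetical helper: return the sole member name or raise."""
    zip_names = zip_file.namelist()
    if len(zip_names) == 1:
        return zip_names[0]
    else:
        # Covers both zero members and more than one member.
        raise ValueError('Expected exactly one file in zip archive, '
                         'found {0}'.format(len(zip_names)))


# An archive closed without writing any members is still a valid zip file.
empty = io.BytesIO()
zipfile.ZipFile(empty, 'w').close()
empty.seek(0)
empty_names = zipfile.ZipFile(empty).namelist()
```

Reporting the member count in the message keeps one branch informative for both failure modes.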
…est_gzip, test_bz2, test_zip. Add tests for python and c engines.
Conflicts: pandas/io/tests/test_parsers.py
@jreback could you please merge this before you make any more changes to the tests?
you need to rebase, then I'll review
@jreback Rebased. Thanks.
you need to force push. you should only have your commits. see here
@jreback I'm redoing this Pull Request. I will close this one and open a new PR.
ok, in the future, just push to the same one
Can the same functionality be added to read_fwf() and other read_* methods?
#11666 is related
IOW, the compression interfaces need to be pulled out a bit from the parser code
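One way to picture what "pulled out" could mean is a standalone opener that any read_* function could call. This is only a sketch under assumptions: `open_compressed` is a hypothetical name, it is not how pandas is organized, and it leaves out the 'infer' option.

```python
import bz2
import gzip
import zipfile


def open_compressed(path, compression=None):
    """Hypothetical helper: return a binary file object for `path`."""
    if compression is None:
        return open(path, 'rb')
    if compression == 'gzip':
        return gzip.open(path, 'rb')
    if compression == 'bz2':
        return bz2.BZ2File(path, 'rb')
    if compression == 'zip':
        zip_file = zipfile.ZipFile(path)
        zip_names = zip_file.namelist()
        if len(zip_names) == 1:
            return zip_file.open(zip_names[0])
        # Zero or several members: there is no single file to read.
        raise ValueError('Expected exactly one file in zip archive, '
                         'found {0}'.format(len(zip_names)))
    raise ValueError('Unrecognized compression type: '
                     '{0}'.format(compression))
```

With something like this, each reader would only need to accept a `compression` argument and pass it through.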
Sorry @jreback, I just don't have the skills to do that (yet). Is there something good for a beginner to work on? Documentation maybe? (Not even sure where the right place to ask this is.)
see http://pandas.pydata.org/pandas-docs/stable/contributing.html; select the novice difficulty label and you will see lots of issues
closes #11413
This PR leverages Python's ZipFile functionality to automatically unzip files read into DataFrames using read_csv().