-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support binary file handles in to_csv #35129
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok to me, can you add a whatsnew new in other enhancements section. (or bug fixes I/O if this is really a bug fix?)
thank you @jreback! The first commit is technically a bug fix as it prevents an error from occurring ( I think that this PR should also fix #23854 @eode, #19827 @colobas, and #13068 @graingert (as long as the user puts a |
awesome! can you have tests for each of these issues, edit the top of the PR to say you close, and add to the whatsnew? |
This PR looks good! Let's add the tests that @jreback requested, and I think we should be good. |
#19827: my issue #35058 is actually a duplicate of #19827 (I had a different use case but it is the same underlying technical issue). His code example is already covered in the test. I referenced his issue in the test. #23854 and #13068: this PR will fix 'half' of both issues. If the file handle is in binary mode it will fix their issue (honor the encoding) but for non-binary file handles the issues are still not addressed. I added a test-case containing sample code from these two issues to validate the encoding for binary file handles.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @twoertwein very comprehensive
just some clarifications and doc requests
doc string updates in to_csv looks good - could expand even to include a full example in the main docs (io.rst) and link in the doc-string
Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-07 00:24:21 UTC |
I overhauled a part of |
This test has some unclosed file warnings, but other PRs have the same warnings (so hopefully not related to this PR).
|
@twoertwein : It looks like there are test failures unrelated to your PR. You may need to rebase or merge |
@gfyoung rebased. Master seems to have the same issue, all other tests pass :) |
@twoertwein : The changes look okay to me right now. Unfortunately, we will need to wait until the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twoertwein very nice. minor test comments and a doc request. pls merge master and ping on green.
@@ -13,6 +13,25 @@ including other versions of pandas. | |||
Enhancements | |||
~~~~~~~~~~~~ | |||
|
|||
.. _whatsnew_120.binary_handle_to_csv: | |||
|
|||
Support for binary file handles in ``to_csv`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this example to the io.rst section as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new subsection in "Data Handling". Is that an appropriate place?
@gfyoung if any other comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, pls merge master and ping on green.
@jreback ping |
thanks @twoertwein very nice! |
@@ -3080,6 +3086,10 @@ def to_csv( | |||
supported for compression modes 'gzip' and 'bz2' | |||
as well as 'zip'. | |||
|
|||
.. versionchanged:: 1.2.0 | |||
|
|||
Compression is supported for non-binary file objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have been "Compression is supported for binary file objects."! @jreback Should I create a one-line PR for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a PR: #35615
to_csv
is ignored, when passing a file object instead of a path #19827, fixes BUG: to_csv doesn't support file handles from ZipFiles #35058, fixes df.to_csv() ignores encoding when given a file object or any other filelike object. #23854 *, fixes Python 3 writing to_csv file ignores encoding argument. #13068 *, and fixes In-memory to_csv compression #22555black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
The first commit addresses #35058 (comment): python's
open
cannot take anencoding
argument whenmode
contains a'b'
(opened in binary mode). This avoids an error when executingdf.to_csv("output.csv", mode="w+b")
.The second commit fixes #35058, #19827, #23854 *, and #13068 *:
to_csv
supports file handles in binary mode ifmode
contains ab
and it honorsencoding
.to_csv
re-invented a lot that was already done inget_handle
. Letget_handle
do the heavy lifting and remove all special cases fromto_csv
.The third commit fixes #22555: some compression algorithms did not set the mode to be writeable for file handles. Together with the re-factoring in the second commit, it is now possible to write to binary file handles with compression!
*requesting an encoding for a non-binary file handles through
to_csv
still doesn't work but imho also doesn't make sense: specify the encoding yourself when opening the file or use a binary file handle