-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
support binary file handles in to_csv #35129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok to me, can you add a whatsnew new in other enhancements section. (or bug fixes I/O if this is really a bug fix?)
|
thank you @jreback! The first commit is technically a bug fix as it prevents an error from occurring ( I think that this PR should also fix #23854 @eode, #19827 @colobas, and #13068 @graingert (as long as the user puts a |
awesome! can you have tests for each of these issues, edit the top of the PR to say you close, and add to the whatsnew? |
|
This PR looks good! Let's add the tests that @jreback requested, and I think we should be good. |
|
#19827: my issue #35058 is actually a duplicate of #19827 (I had a different use case but it is the same underlying technical issue). His code example is already covered in the test. I referenced his issue in the test. #23854 and #13068: this PR will fix 'half' of both issues. If the file handle is in binary mode it will fix their issue (honor the encoding) but for non-binary file handles the issues are still not addressed. I added a test-case containing sample code from these two issues to validate the encoding for binary file handles.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @twoertwein very comprehensive
just some clarifications and doc requests
doc string updates in to_csv looks good - could expand even to include a full example in the main docs (io.rst) and link in the doc-string
|
Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-07 00:24:21 UTC |
|
I overhauled a part of |
|
This test has some unclosed file warnings, but other PRs have the same warnings (so hopefully not related to this PR).
|
|
@twoertwein : It looks like there are test failures unrelated to your PR. You may need to rebase or merge |
|
@gfyoung rebased. Master seems to have the same issue, all other tests pass :) |
|
@twoertwein : The changes look okay to me right now. Unfortunately, we will need to wait until the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twoertwein very nice. minor test comments and a doc request. pls merge master and ping on green.
|
|
||
| .. _whatsnew_120.binary_handle_to_csv: | ||
|
|
||
| Support for binary file handles in ``to_csv`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this example to the io.rst section as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new subsection in "Data Handling". Is that an appropriate place?
|
@gfyoung if any other comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, pls merge master and ping on green.
|
@jreback ping |
|
thanks @twoertwein very nice! |
| .. versionchanged:: 1.2.0 | ||
| Compression is supported for non-binary file objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have been "Compression is supported for binary file objects."! @jreback Should I create a one-line PR for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a PR: #35615
to_csvis ignored, when passing a file object instead of a path #19827, fixes BUG: to_csv doesn't support file handles from ZipFiles #35058, fixes df.to_csv() ignores encoding when given a file object or any other filelike object. #23854 *, fixes Python 3 writing to_csv file ignores encoding argument. #13068 *, and fixes In-memory to_csv compression #22555black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffThe first commit addresses #35058 (comment): python's
opencannot take anencodingargument whenmodecontains a'b'(opened in binary mode). This avoids an error when executingdf.to_csv("output.csv", mode="w+b").The second commit fixes #35058, #19827, #23854 *, and #13068 *:
to_csvsupports file handles in binary mode ifmodecontains aband it honorsencoding.to_csvre-invented a lot that was already done inget_handle. Letget_handledo the heavy lifting and remove all special cases fromto_csv.The third commit fixes #22555: some compression algorithms did not set the mode to be writeable for file handles. Together with the re-factoring in the second commit, it is now possible to write to binary file handles with compression!
*requesting an encoding for a non-binary file handles through
to_csvstill doesn't work but imho also doesn't make sense: specify the encoding yourself when opening the file or use a binary file handle