-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: inconsistent and undocumented option "converters" to read_excel #8548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: inconsistent and undocumented option "converters" to read_excel #8548
Conversation
usually start off a PR with tests first. |
I added a test for this functionality. |
ok, this looks fine
|
I added some docs there, not sure it's enough, open for feedback. |
@@ -1992,6 +1992,31 @@ indices to be parsed. | |||
|
|||
read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3]) | |||
|
|||
.. versionadded:: 0.15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take this line out (it has existed previously, just not documented)
Hi, what's the current status on this? I think I implemented all you requested, do you think it's ready and can be merged? |
It is possible to manipulate the contents of single Excel cells while reading | ||
via the `converters` option. `converters` is a dictionary of functions: the keys | ||
are the names or indices of columns to be transformed, the values are functions | ||
that take one input argument, the Excel cell content, and return the transformed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take out the sentence starting with 'Lambda function'; its uncessary. Can you make this a bit more concise?
ok, pls edit the docs a bit, rebase and squash. |
087fff5
to
d0597b8
Compare
Ok, edited docs, squashed, rebased, pushed (d0597b8). |
|
||
It is possible to transform the contents of Excel cells via the `converters` | ||
option. It accepts a dictionary of functions: the keys are the names or | ||
indices of columns to be transformed, the values are functions that take one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, I think just expand the doc-string (just a bit, maybe 1 line) for TextReader
and Excel
(to make them consistent). This is also true more generally for csv-type reading, so don't want it specifically here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, could you please reformulate? I can't understand.
- What do you want to delete from the docs? I thought you wanted to have the ..note, or shall I delete the whole thing again?
- Which line do you want to add to those docstrings? Could you please write here the line you want?
- TextReader? or TextParser?
- What is true more generally?
Thanks.
edit: I tried to implement what I understood of your suggestion, please check and let me know.
Dict of functions for converting values in certain columns. Keys can | ||
either be integers or column labels, values are functions that take one | ||
input argument, the cell (not column) content, and return the | ||
transformed content. | ||
encoding : string, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is good
looks ok to me |
BUG: inconsistent and undocumented option "converters" to read_excel
Thanks! |
Thanks! I'm happy since it's my first PR and I made a bit of a mess. I'll try to work on the full dtype integration, using the C parser. |
Issue #8212 (first part): pandas.read_excel accepts an optional argument "converters" (which is passed down to PythonParser) to convert single cells in columns with a conversion function. I documented this feature and added a try/except block to make it work in case some cells contain NaNs.
What's still missing is the full "dtype" argument, a la read_csv. That patch is somewhat orthogonal because it only works with the C parser, so I plan to implement it in a second step.