BUG: inconsistent and undocumented option "converters" to read_excel #8548

Merged: 2 commits, Nov 15, 2014

iosonofabio

Issue #8212 (first part): pandas.read_excel accepts an optional argument "converters" (which is passed down to PythonParser) to convert single cells in columns with a conversion function. I documented this feature and added a try/except block to make it work in case some cells contain NaNs.

What's still missing is the full "dtype" argument, a la read_csv. That patch is somewhat orthogonal because it only works with the C parser, so I plan to implement it in a second step.
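
As a rough illustration of the behaviour described above (not the code from this PR), the sketch below applies a dict of per-cell functions to selected columns and guards each call with try/except. The helper name `apply_converters` is hypothetical, and leaving a cell unchanged when its converter raises is an assumption about how NaN cells are handled, not something taken from the patch.

```python
def apply_converters(header, rows, converters):
    """Apply per-cell converter functions to selected columns.

    Hypothetical helper for illustration only; it is not pandas code.
    Keys of `converters` are column labels or integer positions.
    """
    # Resolve every key (label or position) to a column position.
    resolved = {}
    for key, func in converters.items():
        position = key if isinstance(key, int) else header.index(key)
        resolved[position] = func

    converted = []
    for row in rows:
        new_row = list(row)
        for position, func in resolved.items():
            try:
                # The function sees one cell at a time, never a whole column.
                new_row[position] = func(row[position])
            except (TypeError, ValueError):
                # Assumed fallback: keep the original cell (for example NaN).
                pass
        converted.append(new_row)
    return converted


header = ["id", "count"]
rows = [["a", "3"], ["b", float("nan")], ["c", "7"]]

# int() raises ValueError on NaN, so the middle row keeps its NaN.
result = apply_converters(header, rows, {"count": int})
```

The same call with `{1: int}` instead of `{"count": int}` selects the column by position.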

@jreback
Contributor

jreback commented Oct 13, 2014

usually start off a PR with tests first.

@iosonofabio
Author

I added a test for this functionality.

@jreback
Contributor

jreback commented Oct 25, 2014

ok, this looks fine

  • need a release note in v0.15.0 (refer to the issue you highlighted above), as you are now allowing the converters argument to work with missing values. maybe adjust the docs a bit in doc/source/io.rst (excel section).

@jreback added the Docs, IO Excel, and Missing-data labels Oct 25, 2014
@jreback added this to the 0.15.1 milestone Oct 25, 2014
@iosonofabio
Author

I added some docs there, not sure it's enough, open for feedback.

@@ -1992,6 +1992,31 @@ indices to be parsed.

read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3])

.. versionadded:: 0.15
Contributor

take this line out (it has existed previously, just not documented)

@iosonofabio
Author

Hi, what's the current status on this? I think I implemented all you requested, do you think it's ready and can be merged?
No pressure, I am just trying to join loose ends...

It is possible to manipulate the contents of single Excel cells while reading
via the `converters` option. `converters` is a dictionary of functions: the keys
are the names or indices of columns to be transformed, the values are functions
that take one input argument, the Excel cell content, and return the transformed
Contributor

take out the sentence starting with 'Lambda function'; it's unnecessary. Can you make this a bit more concise?

@jreback
Contributor

jreback commented Nov 13, 2014

ok, pls edit the docs a bit, rebase and squash.

@iosonofabio
Author

Ok, edited docs, squashed, rebased, pushed (d0597b8).


It is possible to transform the contents of Excel cells via the `converters`
option. It accepts a dictionary of functions: the keys are the names or
indices of columns to be transformed, the values are functions that take one
Contributor

instead of this, I think just expand the doc-string (just a bit, maybe 1 line) for TextReader and Excel (to make them consistent). This is also true more generally for csv-type reading, so don't want it specifically here.

Author

I am sorry, could you please reformulate? I can't understand.

  1. What do you want to delete from the docs? I thought you wanted to have the ..note, or shall I delete the whole thing again?
  2. Which line do you want to add to those docstrings? Could you please write here the line you want?
  3. TextReader? or TextParser?
  4. What is true more generally?
    Thanks.

edit: I tried to implement what I understood of your suggestion, please check and let me know.

Dict of functions for converting values in certain columns. Keys can
either be integers or column labels, values are functions that take one
input argument, the cell (not column) content, and return the
transformed content.
encoding : string, default None
Contributor

this is good
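
The docstring wording above stresses that each function receives a single cell, not a column, and that keys may be labels or integers. Since the reviewer notes that the option applies to csv-type reading as well, here is a minimal sketch using `read_csv` on an in-memory string, which avoids needing an Excel file. The sample data and the `strip_dollar` function are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data; any small CSV with a text-prefixed number would do.
csv_text = "name,price\napple,$1.50\npear,$2.25\n"


def strip_dollar(cell):
    # Called once per cell with the raw cell content (a string for read_csv).
    return float(cell.lstrip("$"))


# Key by column label ...
by_label = pd.read_csv(io.StringIO(csv_text), converters={"price": strip_dollar})

# ... or by integer column position.
by_index = pd.read_csv(io.StringIO(csv_text), converters={1: strip_dollar})

print(by_label)
```

Passing the same `converters` dict to `read_excel` should behave analogously, with the difference that Excel cells may already arrive as numbers or dates, not strings.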

@jreback
Contributor

jreback commented Nov 15, 2014

looks ok to me

@jorisvandenbossche ?

jorisvandenbossche added a commit that referenced this pull request Nov 15, 2014
BUG: inconsistent and undocumented option "converters" to read_excel
@jorisvandenbossche merged commit 072e40b into pandas-dev:master Nov 15, 2014
@jorisvandenbossche
Member

Thanks!

@iosonofabio
Author

Thanks! I'm happy since it's my first PR and I made a bit of a mess. I'll try to work on the full dtype integration, using the C parser.
