BUG: inconsistent and undocumented option "converters" to read_excel #8548

iosonofabio · 2014-10-13T12:31:11Z

Issue #8212 (first part): pandas.read_excel accepts an optional argument "converters" (which is passed down to PythonParser) to convert single cells in columns with a conversion function. I documented this feature and added a try/except block to make it work in case some cells contain NaNs.

What's still missing is the full "dtype" argument, a la read_csv. That patch is somewhat orthogonal because it only works with the C parser, so I plan to implement it in a second step.
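To illustrate the idea of per-cell converters that tolerate missing values, here is a minimal pure-Python sketch. It is not the pandas implementation: `apply_converters`, `header`, and `rows` are hypothetical names made up for this example, and the fallback shown (leave the cell untouched when conversion fails) is only one possible policy.

```python
import math


def apply_converters(rows, header, converters):
    """Apply one-argument converter functions cell by cell.

    Hypothetical helper for illustration only. Keys of `converters`
    are either integer column positions or column labels.
    """
    # Resolve every key to a column position.
    resolved = {}
    for key, func in converters.items():
        index = key if isinstance(key, int) else header.index(key)
        resolved[index] = func

    converted = []
    for row in rows:
        new_row = list(row)
        for index, func in resolved.items():
            try:
                new_row[index] = func(row[index])
            except (TypeError, ValueError):
                # Missing or malformed cell (e.g. NaN): keep it as is
                # instead of aborting the whole read.
                pass
        converted.append(new_row)
    return converted


header = ["id", "amount"]
rows = [[1, "10"], [2, float("nan")], [3, "30"]]

by_label = apply_converters(rows, header, {"amount": int})
by_index = apply_converters(rows, header, {1: int})
```

Without the `try`/`except`, `int(float("nan"))` raises `ValueError` and the second row would make the whole conversion fail.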

jreback · 2014-10-13T12:33:01Z

usually start off a PR with tests first.

iosonofabio · 2014-10-13T13:44:49Z

I added a test for this functionality.

jreback · 2014-10-25T00:15:15Z

ok, this looks fine

need a release note in v0.15.0 (refer to the issue you highlighted above), as you are now allowing the converters argument to work with missing values. maybe adjust the docs a bit in doc/source/io.rst (excel section).

iosonofabio · 2014-10-30T09:19:38Z

I added some docs there, not sure it's enough, open for feedback.

jreback · 2014-10-30T13:28:16Z

doc/source/io.rst

@@ -1992,6 +1992,31 @@ indices to be parsed.

   read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3])

+.. versionadded:: 0.15


take this line out (it has existed previously, just not documented)

iosonofabio · 2014-11-13T21:47:31Z

Hi, what's the current status on this? I think I implemented everything you requested. Do you think it's ready to be merged?
No pressure, I am just trying to tie up loose ends...

jreback · 2014-11-13T22:46:10Z

doc/source/io.rst

+   It is possible to manipulate the contents of single Excel cells while reading
+   via the `converters` option. `converters` is a dictionary of functions: the keys
+   are the names or indices of columns to be transformed, the values are functions
+   that take one input argument, the Excel cell content, and return the transformed


take out the sentence starting with 'Lambda function'; it's unnecessary. Can you make this a bit more concise?

jreback · 2014-11-13T22:46:56Z

ok, pls edit the docs a bit, rebase and squash.

iosonofabio · 2014-11-14T07:51:02Z

Ok, edited docs, squashed, rebased, pushed (d0597b8).

jreback · 2014-11-14T13:17:58Z

doc/source/io.rst

+
+   It is possible to transform the contents of Excel cells via the `converters`
+   option. It accepts a dictionary of functions: the keys are the names or
+   indices of columns to be transformed, the values are functions that take one


instead of this, I think just expand the doc-string (just a bit, maybe 1 line) for TextReader and Excel (to make them consistent). This is also true more generally for csv-type reading, so don't want it specifically here.

iosonofabio · 2014-11-14

I am sorry, could you please rephrase? I don't understand.

What do you want to delete from the docs? I thought you wanted to have the ..note, or shall I delete the whole thing again?

Which line do you want to add to those docstrings? Could you please write here the line you want?

TextReader? or TextParser?

What is true more generally?
Thanks.

edit: I tried to implement what I understood of your suggestion, please check and let me know.

jreback · 2014-11-15T17:11:34Z

pandas/io/parsers.py

+        Dict of functions for converting values in certain columns. Keys can
+        either be integers or column labels, values are functions that take one
+        input argument, the cell (not column) content, and return the
+        transformed content.
    encoding : string, default None


this is good
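As a small illustration of the docstring above (the function receives one cell at a time, not the whole column), here is a sketch using `read_csv`, which accepts the same `converters` argument. The CSV content and the `to_cents` function are made up for this example.

```python
import io

import pandas as pd

# Made-up sample data for illustration.
csv_text = "name,price\napple,1.5\npear,2.25\n"


def to_cents(cell):
    # Called once per cell; read_csv hands each cell over as a string.
    return int(round(float(cell) * 100))


# Key by column label ...
df_by_label = pd.read_csv(io.StringIO(csv_text), converters={"price": to_cents})

# ... or by integer column position.
df_by_index = pd.read_csv(io.StringIO(csv_text), converters={1: to_cents})
```

Both calls convert only the `price` column and leave `name` untouched.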

jreback · 2014-11-15T17:12:01Z

looks ok to me

@jorisvandenbossche ?

jorisvandenbossche · 2014-11-15T19:02:03Z

Thanks!

iosonofabio · 2014-11-15T20:38:38Z

Thanks! I'm happy since it's my first PR and I made a bit of a mess. I'll try to work on the full dtype integration, using the C parser.

jreback added the Docs, IO Excel, and Missing-data labels Oct 25, 2014

jreback added this to the 0.15.1 milestone Oct 25, 2014

jreback reviewed Oct 30, 2014

jreback reviewed Nov 13, 2014

BUG: "converters" in read_excel with missing data

d0597b8

iosonofabio force-pushed the excel_dtype branch from 087fff5 to d0597b8 Compare November 14, 2014 07:49

jreback reviewed Nov 14, 2014

docs fix (?)

89d4871

jreback reviewed Nov 15, 2014

jorisvandenbossche added a commit that referenced this pull request Nov 15, 2014

Merge pull request #8548 from iosonofabio/excel_dtype

072e40b

BUG: inconsistent and undocumented option "converters" to read_excel

jorisvandenbossche merged commit 072e40b into pandas-dev:master Nov 15, 2014

jorisvandenbossche mentioned this pull request Dec 4, 2014

Add converters= argument to ExcelFile.parse #2868

Closed
