BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

tomneep · 2019-03-20T12:47:42Z

closes DataFrame.to_latex() should honor na_rep after formatter. #9046
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

A very old issue so low priority!

I'd be tempted to remove the shortcut in _format_strings() since I'd guess in most cases it doesn't really save much time.

jreback · 2019-03-20T13:06:41Z

pandas/io/formats/format.py

@@ -1047,7 +1047,8 @@ def get_result_as_array(self):
        """

        if self.formatter is not None:
-            return np.array([self.formatter(x) for x in self.values])
+            return np.array([self.formatter(x) if not isna(x) else self.na_rep


construct the array (as object), the set the nan's via mask

I've changed that list comp. to a np.array then mask. Is that what you mean?

codecov · 2019-03-20T13:31:28Z

Codecov Report

Merging #25799 into master will not change coverage.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master   #25799   +/-   ##
=======================================
  Coverage   41.77%   41.77%           
=======================================
  Files         173      173           
  Lines       53002    53002           
=======================================
  Hits        22141    22141           
  Misses      30861    30861

Flag	Coverage Δ
#single	`41.77% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/formats/format.py	`50.82% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37d04a3...bb2549e. Read the comment docs.

codecov · 2019-03-20T13:31:30Z

Codecov Report

Merging #25799 into master will decrease coverage by 0.01%.
The diff coverage is 20%.

@@            Coverage Diff             @@
##           master   #25799      +/-   ##
==========================================
- Coverage   91.82%    91.8%   -0.02%     
==========================================
  Files         175      175              
  Lines       52580    52583       +3     
==========================================
- Hits        48279    48275       -4     
- Misses       4301     4308       +7

Flag	Coverage Δ
#multiple	`90.36% <20%> (-0.01%)`	⬇️
#single	`41.91% <0%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/formats/format.py	`97.54% <20%> (-0.35%)`	⬇️
pandas/io/gbq.py	`75% <0%> (-12.5%)`	⬇️
pandas/core/frame.py	`96.79% <0%> (-0.12%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b727c6b...a47b305. Read the comment docs.

WillAyd

@tomneep looks like a conflict can you merge master?

WillAyd · 2019-03-31T15:39:01Z

doc/source/whatsnew/v0.25.0.rst

@@ -280,6 +280,7 @@ I/O
 - :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AsseertionError`` (:issue:`25608`)
 - Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
 - Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
+- Bug in :meth:`DataFrame.to_latex` that would ignore `na_rep` if the `formatters` argument was used (:issue:`9046`)


Want double backticks around na_rep here

…#9046)

WillAyd

lgtm @jreback

simonjayhawkins · 2019-04-03T22:37:55Z

i'm -1 on this PR since i thought the idea of the custom formatters was to override the behavior with a user defined function, hence the shortcut.

if you want to define a custom formatter for a particular column, but have the standard na_rep for the remaining columns, doesn't this change prevent that?

tomneep · 2019-04-04T08:14:52Z

@simonjayhawkins this is a good point that I hadn't considered.

I'm happy for this to be closed.
It would be good to know what needs to be done to close #9046?
Are doc updates needed?

simonjayhawkins · 2019-04-04T14:33:39Z

Are doc updates needed?

the formatters parameter is in the common docstring and common across to_html, to_latex and to_string

        formatters : list or dict of one-param. functions, optional
            Formatter functions to apply to columns' elements by position or
            name.
            The result of each function must be a unicode string.
            List must be of length equal to the number of columns.

which is out-of-sync with the only reference to formatters in the IO user guide https://pandas-docs.github.io/pandas-docs-travis/user_guide/io.html#writing-a-formatted-string

formatters default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string

so any improvements here would be welcome, but I don't think it is strictly necessary in order to close the issue.

simonjayhawkins · 2019-04-06T11:20:48Z

Thinking about it, this could work if we change the docstring. At the moment the custom function must return a string. we could allow nan, None whatever to be returned from the custom function. Then apply the na_rep to only those. This would allow the user defined function to choose whether to handle the missing values.

simonjayhawkins · 2019-05-27T13:18:39Z

There's been a issue raised, #26278, regarding using string formatters. This make sense to be compatible with the .style api. I think that it would also make sense, that if string formatters were accepted in to_latex, to_html etc then the na_rep should be respected.

so although I was -1 on this PR originally, I think that, in combination with adding string formatters, and keeping the behavior unchanged for callables, then this PR would close the original issue.

tomneep · 2019-05-28T09:33:37Z

Ok- I think that it is probably still best to close this for now, as I think we'd still want to keep the current behaviour if a list of callables is given if I'm reading your suggestion correctly, e.g.

df.to_latex(formatters=[i.format for i in ('{:d}', '{:1.1f}', '{:1.3f}')])

Then the na_rep behaviour can be dealt with if changes are made so you can do

df.to_latex(formatters=['%1i', '%1.1f', '%1.3f'])

It might be worth considering how this change would (or maybe wouldn't) affect the float_format argument too, as one might naively assume that

df.to_latex(float_format=['%1i', '%1.1f', '%1.3f'])

could be a viable choice for this feature too.

Anyway, If there's no objection I'll close this PR tomorrow.

simonjayhawkins · 2019-05-29T10:08:31Z

Anyway, If there's no objection I'll close this PR tomorrow.

No problem. Thanks @tomneep for looking into this.

jreback requested changes Mar 20, 2019

View reviewed changes

gfyoung added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate IO LaTeX to_latex Bug labels Mar 25, 2019

WillAyd requested changes Mar 31, 2019

View reviewed changes

tomneep added 2 commits April 1, 2019 10:00

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (…

c1ae5f6

…#9046)

Remove duplicated whatsnew entry

a47b305

WillAyd approved these changes Apr 1, 2019

View reviewed changes

tomneep closed this May 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

tomneep commented Mar 20, 2019

jreback Mar 20, 2019

tomneep Mar 20, 2019

codecov bot commented Mar 20, 2019

codecov bot commented Mar 20, 2019 •

edited

Loading

WillAyd left a comment

WillAyd Mar 31, 2019

WillAyd left a comment

simonjayhawkins commented Apr 3, 2019

tomneep commented Apr 4, 2019

simonjayhawkins commented Apr 4, 2019

simonjayhawkins commented Apr 6, 2019

simonjayhawkins commented May 27, 2019

tomneep commented May 28, 2019

simonjayhawkins commented May 29, 2019

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

Conversation

tomneep commented Mar 20, 2019

jreback Mar 20, 2019

Choose a reason for hiding this comment

tomneep Mar 20, 2019

Choose a reason for hiding this comment

codecov bot commented Mar 20, 2019

Codecov Report

codecov bot commented Mar 20, 2019 • edited Loading

Codecov Report

WillAyd left a comment

Choose a reason for hiding this comment

WillAyd Mar 31, 2019

Choose a reason for hiding this comment

WillAyd left a comment

Choose a reason for hiding this comment

simonjayhawkins commented Apr 3, 2019

tomneep commented Apr 4, 2019

simonjayhawkins commented Apr 4, 2019

simonjayhawkins commented Apr 6, 2019

simonjayhawkins commented May 27, 2019

tomneep commented May 28, 2019

simonjayhawkins commented May 29, 2019

codecov bot commented Mar 20, 2019 •

edited

Loading