Skip to content

BUG: Respect na_rep in DataFrame.to_latex() when used with formatters (#9046) #25799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

tomneep
Copy link
Contributor

@tomneep tomneep commented Mar 20, 2019

A very old issue so low priority!

I'd be tempted to remove the shortcut in _format_strings() since I'd guess in most cases it doesn't really save much time.

@@ -1047,7 +1047,8 @@ def get_result_as_array(self):
"""

if self.formatter is not None:
return np.array([self.formatter(x) for x in self.values])
return np.array([self.formatter(x) if not isna(x) else self.na_rep
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct the array (as object), the set the nan's via mask

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed that list comp. to a np.array then mask. Is that what you mean?

@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #25799 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25799   +/-   ##
=======================================
  Coverage   41.77%   41.77%           
=======================================
  Files         173      173           
  Lines       53002    53002           
=======================================
  Hits        22141    22141           
  Misses      30861    30861
Flag Coverage Δ
#single 41.77% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/formats/format.py 50.82% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37d04a3...bb2549e. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #25799 into master will decrease coverage by 0.01%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25799      +/-   ##
==========================================
- Coverage   91.82%    91.8%   -0.02%     
==========================================
  Files         175      175              
  Lines       52580    52583       +3     
==========================================
- Hits        48279    48275       -4     
- Misses       4301     4308       +7
Flag Coverage Δ
#multiple 90.36% <20%> (-0.01%) ⬇️
#single 41.91% <0%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 97.54% <20%> (-0.35%) ⬇️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/core/frame.py 96.79% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b727c6b...a47b305. Read the comment docs.

@gfyoung gfyoung added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate IO LaTeX to_latex Bug labels Mar 25, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomneep looks like a conflict can you merge master?

@@ -280,6 +280,7 @@ I/O
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AsseertionError`` (:issue:`25608`)
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
- Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
- Bug in :meth:`DataFrame.to_latex` that would ignore `na_rep` if the `formatters` argument was used (:issue:`9046`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want double backticks around na_rep here

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@simonjayhawkins
Copy link
Member

i'm -1 on this PR since i thought the idea of the custom formatters was to override the behavior with a user defined function, hence the shortcut.

if you want to define a custom formatter for a particular column, but have the standard na_rep for the remaining columns, doesn't this change prevent that?

@tomneep
Copy link
Contributor Author

tomneep commented Apr 4, 2019

@simonjayhawkins this is a good point that I hadn't considered.

I'm happy for this to be closed.
It would be good to know what needs to be done to close #9046?
Are doc updates needed?

@simonjayhawkins
Copy link
Member

Are doc updates needed?

the formatters parameter is in the common docstring and common across to_html, to_latex and to_string

        formatters : list or dict of one-param. functions, optional
            Formatter functions to apply to columns' elements by position or
            name.
            The result of each function must be a unicode string.
            List must be of length equal to the number of columns.

which is out-of-sync with the only reference to formatters in the IO user guide https://pandas-docs.github.io/pandas-docs-travis/user_guide/io.html#writing-a-formatted-string

formatters default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string

so any improvements here would be welcome, but I don't think it is strictly necessary in order to close the issue.

@simonjayhawkins
Copy link
Member

Thinking about it, this could work if we change the docstring. At the moment the custom function must return a string. we could allow nan, None whatever to be returned from the custom function. Then apply the na_rep to only those. This would allow the user defined function to choose whether to handle the missing values.

@simonjayhawkins
Copy link
Member

There's been a issue raised, #26278, regarding using string formatters. This make sense to be compatible with the .style api. I think that it would also make sense, that if string formatters were accepted in to_latex, to_html etc then the na_rep should be respected.

so although I was -1 on this PR originally, I think that, in combination with adding string formatters, and keeping the behavior unchanged for callables, then this PR would close the original issue.

@tomneep
Copy link
Contributor Author

tomneep commented May 28, 2019

Ok- I think that it is probably still best to close this for now, as I think we'd still want to keep the current behaviour if a list of callables is given if I'm reading your suggestion correctly, e.g.

df.to_latex(formatters=[i.format for i in ('{:d}', '{:1.1f}', '{:1.3f}')])

Then the na_rep behaviour can be dealt with if changes are made so you can do

df.to_latex(formatters=['%1i', '%1.1f', '%1.3f'])

It might be worth considering how this change would (or maybe wouldn't) affect the float_format argument too, as one might naively assume that

df.to_latex(float_format=['%1i', '%1.1f', '%1.3f'])

could be a viable choice for this feature too.

Anyway, If there's no objection I'll close this PR tomorrow.

@tomneep tomneep closed this May 29, 2019
@simonjayhawkins
Copy link
Member

Anyway, If there's no objection I'll close this PR tomorrow.

No problem. Thanks @tomneep for looking into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO LaTeX to_latex Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.to_latex() should honor na_rep after formatter.
5 participants