Add option encoding to to_html? #30483

Closed
janpipek opened this issue Dec 26, 2019 · 3 comments
Labels: Duplicate Report (Duplicate issue or pull request)

Comments

@janpipek
Contributor

janpipek commented Dec 26, 2019

Many thanks for the vast range of I/O possibilities that pandas allows! I just met one edge (or maybe not so edge) case where an extra parameter could save some work, especially for newbie users.

Code Sample, a copy-pastable example if possible

import pandas as pd

# Raises UnicodeEncodeError when the platform default encoding is cp1252
pd.DataFrame({"a": ['☿']}).to_html("a.html")

# Would be nice to have:
# pd.DataFrame({"a": ['☿']}).to_html("a.html", encoding="utf-8")

Problem description

With the current signature of DataFrame.to_html, it is not possible to easily write non-ASCII / non-Latin-1 characters to an HTML file directly or, more generally, to specify the output encoding. The workaround is to pass an already opened file:

with open("a.html", "w", encoding="utf-8") as out:
    pd.DataFrame({"a": ['☿']}).to_html(out)
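
Another way around the platform default, sketched here under the assumption that `to_html` returns the markup as a string when no buffer is passed, is to do the file write separately. The helper name `write_html_utf8` is made up for illustration and is not part of pandas:

```python
from pathlib import Path

import pandas as pd


def write_html_utf8(df: pd.DataFrame, path: str) -> str:
    """Render `df` to an HTML table and save it as UTF-8 (hypothetical helper)."""
    # With no buffer argument, to_html returns the table markup as a str.
    html = df.to_html()
    # The encoding is chosen here, not by the platform default.
    Path(path).write_text(html, encoding="utf-8")
    return html


if __name__ == "__main__":
    write_html_utf8(pd.DataFrame({"a": ["☿"]}), "a.html")
```

This keeps the encoding decision in one visible place, at the cost of one more line than the requested parameter would need.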

It would be nice to have a parameter (admittedly, a 24th one) to allow this, consistent with the encoding parameter of to_csv. I see that there is some discussion on parameter consistency in #15008 and #28377 (hopefully, I did my searching well and this is not a duplicate issue), so it might be against the design principles. Do you think this would be a viable idea? If yes, I am ready to implement it.

Note: It is then an open question whether an explicit encoding should also result in a correct <meta charset...> tag being added to the file.
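
To make the question concrete: `to_html` produces only a table fragment, so a charset declaration would need a surrounding document. A minimal sketch of what that could look like when done by hand, using only the standard library; the helper `write_html_document` and the document skeleton are assumptions for illustration, not pandas behaviour:

```python
from pathlib import Path


def write_html_document(fragment: str, path: str, encoding: str = "utf-8") -> None:
    """Wrap an HTML fragment in a full document whose <meta charset>
    matches the encoding used for the file (hypothetical helper)."""
    document = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'<meta charset="{encoding}">\n'
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )
    # The same encoding is used for the declaration and for the bytes on disk.
    Path(path).write_text(document, encoding=encoding)


if __name__ == "__main__":
    # The fragment stands in for the string returned by DataFrame.to_html().
    write_html_document("<table><tr><td>☿</td></tr></table>", "a.html")
```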

My motivation: I am currently writing lesson materials for an EDA course and wanted to show how easy it is to export data frames (by chance containing planet symbols but can be any non-Western character) to any format ;-)

Thanks,
Jan

Expected Output

None

Unexpected output

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-67-2972ac7a12d7> in <module>
----> 1 pd.DataFrame({"a": ['☿']}).to_html("a.html")

~\Miniconda3\lib\site-packages\pandas\core\frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, bold_rows, classes, escape, notebook, border, table_id, render_links)
   2315         )
   2316         # TODO: a generic formatter wld b in DataFrameFormatter
-> 2317         formatter.to_html(classes=classes, notebook=notebook, border=border)
   2318 
   2319         if buf is None:

~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in to_html(self, classes, notebook, border)
    843         elif isinstance(self.buf, str):
    844             with open(self.buf, "w") as f:
--> 845                 buffer_put_lines(f, html)
    846         else:
    847             raise TypeError("buf is not a file name and it has no write " " method")

~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in buffer_put_lines(buf, lines)
   1808     if any(isinstance(x, str) for x in lines):
   1809         lines = [str(x) for x in lines]
-> 1810     buf.write("\n".join(lines))

~\Miniconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u263f' in position 193: character maps to <undefined>
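
The last frame shows the write going through the cp1252 codec. In the Python 3.7 used here, `open` in text mode without an `encoding` argument falls back to the locale's preferred encoding, which is cp1252 on this Windows machine. A minimal sketch that reproduces the failure without pandas (the helper name `can_encode` is made up for illustration):

```python
import locale


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The encoding that open(path, "w") uses when none is given.
    print("default text encoding:", locale.getpreferredencoding(False))
    print(can_encode("☿", "cp1252"))  # False: U+263F has no cp1252 mapping
    print(can_encode("☿", "utf-8"))   # True
```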

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.7
tables : None
xarray : 0.14.0
xlrd : None
xlwt : 1.3.0
xlsxwriter : None

@alimcmaster1 added the "IO HTML" label (read_html, to_html, Styler.apply, Styler.applymap) on Dec 27, 2019
@simonjayhawkins
Member

@janpipek Thanks for the report. This has been implemented on master. See #28692
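
For readers on a pandas release that includes this change (assumed here to be 1.0 or later, where `to_html` accepts an `encoding` argument), the call requested in the issue can be sketched as follows. The wrapper `export_html` is a hypothetical name used only to keep the example testable:

```python
import pandas as pd


def export_html(df: pd.DataFrame, path: str, encoding: str = "utf-8") -> None:
    """Write `df` as an HTML table to `path` using the given encoding.

    Assumes a pandas version whose DataFrame.to_html has an `encoding`
    parameter; older versions raise TypeError on the unknown keyword.
    """
    df.to_html(path, encoding=encoding)


if __name__ == "__main__":
    export_html(pd.DataFrame({"a": ["☿"]}), "a.html")
```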

@simonjayhawkins
Member

duplicate of #28663

@simonjayhawkins added the "Duplicate Report" label (Duplicate issue or pull request) and removed the "IO HTML" label (read_html, to_html, Styler.apply, Styler.applymap) on Dec 28, 2019
@janpipek
Contributor Author

Oops, sorry, thank you. I should have looked closer. Nice job :-)
