Add option `encoding` to `to_html`? #30483

janpipek · 2019-12-26T10:03:51Z

Many thanks for the vast range of I/O possibilities that pandas offers! I just ran into an edge (or maybe not so edge) case where an extra parameter could save some work, especially for newbie users.

Code Sample, a copy-pastable example if possible

import pandas as pd

pd.DataFrame({"a": ['☿']}).to_html("a.html")

# Would be nice to have:
# pd.DataFrame({"a": ['☿']}).to_html("a.html", encoding="utf-8")

Problem description

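With the current signature of DataFrame.to_html, it is not possible to easily write non-ASCII / non-Latin-1 characters to an HTML file directly or, more generally, to specify the output encoding. When given a file name, to_html opens the file with the platform default encoding (cp1252 in the traceback below), so the workaround is to pass an already open file: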

with open("a.html", "w", encoding="utf-8") as out:
    pd.DataFrame({"a": ['☿']}).to_html(out)

It would be nice to have a parameter (admittedly, a 24th one) to allow this, consistent with the encoding parameter of to_csv. I see that there is some discussion on parameter consistency in #15008 and #28377 (hopefully I searched well enough and this is not a duplicate issue), so it might be against the design principles. Do you think this would be a viable idea? If yes, I am ready to implement it.

Note: It is then an open question whether an explicit encoding should also result in a matching <meta charset...> tag being added to the file.
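
To make the <meta charset...> question concrete, here is a minimal standard-library sketch of what emitting such a tag could look like. It is not how pandas behaves: the helper names `wrap_html_fragment` and `write_html_document` and the `title` argument are hypothetical, and the fragment is assumed to be the string that to_html returns when no buffer is passed.

```python
import html
from pathlib import Path


def wrap_html_fragment(fragment, encoding="utf-8", title="Table"):
    """Wrap an HTML fragment in a minimal document that declares its encoding.

    Hypothetical helper for illustration; not part of pandas.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'<meta charset="{html.escape(encoding, quote=True)}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


def write_html_document(fragment, path, encoding="utf-8"):
    """Write the wrapped fragment so the bytes on disk match the declared charset."""
    document = wrap_html_fragment(fragment, encoding)
    # Encode explicitly instead of relying on the platform default encoding.
    Path(path).write_bytes(document.encode(encoding))
```

With pandas installed, a call could look like `write_html_document(df.to_html(), "a.html")`. Declaring the charset in the file keeps a browser from guessing the encoding when the page is opened from disk.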

My motivation: I am currently writing lesson materials for an EDA course and wanted to show how easy it is to export data frames (by chance containing planet symbols but can be any non-Western character) to any format ;-)

Thanks,
Jan

Expected Output

None

Unexpected output

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-67-2972ac7a12d7> in <module>
----> 1 pd.DataFrame({"a": ['☿']}).to_html("a.html")

~\Miniconda3\lib\site-packages\pandas\core\frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, bold_rows, classes, escape, notebook, border, table_id, render_links)
   2315         )
   2316         # TODO: a generic formatter wld b in DataFrameFormatter
-> 2317         formatter.to_html(classes=classes, notebook=notebook, border=border)
   2318 
   2319         if buf is None:

~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in to_html(self, classes, notebook, border)
    843         elif isinstance(self.buf, str):
    844             with open(self.buf, "w") as f:
--> 845                 buffer_put_lines(f, html)
    846         else:
    847             raise TypeError("buf is not a file name and it has no write " " method")

~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in buffer_put_lines(buf, lines)
   1808     if any(isinstance(x, str) for x in lines):
   1809         lines = [str(x) for x in lines]
-> 1810     buf.write("\n".join(lines))

~\Miniconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u263f' in position 193: character maps to <undefined>
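
The traceback shows the file being opened with open(self.buf, "w") and the write failing inside the cp1252 codec. The following standard-library sketch reproduces that root cause without pandas; `can_encode` is a hypothetical helper name, and the default encoding printed depends on the machine it runs on.

```python
import locale

# Encoding that open() falls back to in text mode when none is given.
# The traceback above shows cp1252; on many other systems it is UTF-8.
default_encoding = locale.getpreferredencoding(False)
print(f"Default text encoding on this machine: {default_encoding}")

symbol = "\u263f"  # the Mercury symbol used in the report


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(symbol, "cp1252"))  # False: cp1252 has no mapping for U+263F
print(can_encode(symbol, "utf-8"))   # True: UTF-8 covers all of Unicode
```

This is why the same to_html call can succeed on a system whose default encoding is UTF-8 and fail on a Windows installation that defaults to cp1252.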

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.7
tables : None
xarray : 0.14.0
xlrd : None
xlwt : 1.3.0
xlsxwriter : None


simonjayhawkins · 2019-12-28T12:48:39Z

@janpipek Thanks for the report. This has been implemented on master; see #28692.

simonjayhawkins · 2019-12-28T12:49:02Z

duplicate of #28663

janpipek · 2019-12-30T09:32:26Z

Oops, sorry, thank you. I should have looked closer. Nice job :-)

alimcmaster1 added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Dec 27, 2019

simonjayhawkins closed this as completed Dec 28, 2019

simonjayhawkins added Duplicate Report Duplicate issue or pull request and removed IO HTML read_html, to_html, Styler.apply, Styler.applymap labels Dec 28, 2019
