Many thanks for the vast range of I/O possibilities that pandas allows! I just ran into an edge case (or maybe not such an edge case) where an extra parameter could save some work, especially for new users.
Code Sample, a copy-pastable example if possible
pd.DataFrame({"a": ['☿']}).to_html("a.html")
# Would be nice to have:
# pd.DataFrame({"a": ['☿']}).to_html("a.html", encoding="utf-8")
Problem description
With the current signature of DataFrame.to_html, it is not possible to easily write non-ASCII / non-Latin-1 characters to an HTML file directly or, more generally, to specify the output encoding. When given a file name, to_html opens the file with the platform default encoding, so the workaround is to pass an already opened file:
with open("a.html", "w", encoding="utf-8") as out:
    pd.DataFrame({"a": ['☿']}).to_html(out)
It would be nice to have a parameter (admittedly, a 24th one) to allow this, consistent with the to_csv one. I see that there is some discussion on parameter consistency in #15008 and #28377 (hopefully, I did my searching well and this is not a duplicate issue), so it might be against the design principles. Do you think this would be a viable idea? If yes, I am ready to implement it.
Note: It is then an open question whether an explicit encoding should also result in a matching <meta charset...> tag being added to the file.
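To make the <meta charset...> question concrete, here is a minimal sketch of what a user can do today without any change to pandas. The helper names `wrap_html_fragment` and `write_html` are hypothetical and not part of pandas; the sketch only assumes that the table markup is already available as a string.

```python
import html
from pathlib import Path


def wrap_html_fragment(fragment: str, encoding: str = "utf-8", title: str = "") -> str:
    """Wrap an HTML fragment in a minimal document that declares its encoding.

    Hypothetical helper for illustration; not a pandas function.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'<meta charset="{encoding}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{fragment}\n"
        "</body>\n</html>\n"
    )


def write_html(fragment: str, path: str, encoding: str = "utf-8") -> None:
    """Write the wrapped document using the same encoding the meta tag declares."""
    document = wrap_html_fragment(fragment, encoding)
    # An explicit encoding avoids falling back to the platform default (e.g. cp1252).
    Path(path).write_text(document, encoding=encoding)
```

With pandas, the fragment would come from calling to_html() without a buf argument, which returns the markup as a string, for example write_html(df.to_html(), "a.html"). Whether to_html itself should emit such a tag is exactly the design question raised in the note above.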
My motivation: I am currently writing lesson materials for an EDA course and wanted to show how easy it is to export data frames (which happen to contain planet symbols, but it could be any non-Western character) to any format ;-)
Thanks,
Jan
Expected Output
None
Unexpected output
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-67-2972ac7a12d7> in <module>
----> 1 pd.DataFrame({"a": ['☿']}).to_html("a.html")
~\Miniconda3\lib\site-packages\pandas\core\frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, bold_rows, classes, escape, notebook, border, table_id, render_links)
2315 )
2316 # TODO: a generic formatter wld b in DataFrameFormatter
-> 2317 formatter.to_html(classes=classes, notebook=notebook, border=border)
2318
2319 if buf is None:
~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in to_html(self, classes, notebook, border)
843 elif isinstance(self.buf, str):
844 with open(self.buf, "w") as f:
--> 845 buffer_put_lines(f, html)
846 else:
847 raise TypeError("buf is not a file name and it has no write " " method")
~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in buffer_put_lines(buf, lines)
1808 if any(isinstance(x, str) for x in lines):
1809 lines = [str(x) for x in lines]
-> 1810 buf.write("\n".join(lines))
~\Miniconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\u263f' in position 193: character maps to <undefined>
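The traceback ends in cp1252.py because open(self.buf, "w") is called without an encoding, so Python falls back to the locale's preferred encoding, which is cp1252 on this Windows machine. The following sketch reproduces the failure without pandas; `can_encode` is a hypothetical helper used only for illustration.

```python
import locale

SYMBOL = "\u263f"  # the Mercury symbol from the example data frame


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The encoding that open() uses when none is given (platform dependent).
print("default:", locale.getpreferredencoding(False))

for enc in ("cp1252", "latin-1", "utf-8"):
    print(enc, can_encode(SYMBOL, enc))
```

On any platform the loop reports that cp1252 and latin-1 cannot represent the symbol while utf-8 can; only the reported default encoding varies from machine to machine.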
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.7
tables : None
xarray : 0.14.0
xlrd : None
xlwt : 1.3.0
xlsxwriter : None