read_hdf() file name encoding with accented or special characters on Windows #29832

wj-c · 2019-11-25T06:16:51Z

Problem description

pd.read_hdf() has the same issue as #15086. If the file path contains special characters (like Chinese) on Windows, it fails to read the file.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.14
pytest : 5.2.4
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6

TomAugspurger · 2019-11-25T12:12:40Z

Can you provide a reproducible example, with a stacktrace?

http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

wj-c · 2019-12-02T08:54:58Z

Sorry for the late reply. Sure I can provide an example.

Here is the code:

import pandas as pd
store = pd.HDFStore('测试.h5')
pd.DataFrame().to_hdf(store, '/test')

Here are the results of running the above code:

Traceback (most recent call last):
File "C:\Users\cwj\Desktop\Untitled-1.py", line 3, in <module>
pd.DataFrame().to_hdf(store, '/test')
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\core\generic.py", line 2530, in to_hdf
pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 276, in to_hdf
path_or_buf, mode=mode, complevel=complevel, complib=complib
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 505, in __init__
self.open(mode=mode, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 627, in open
self._handle = tables.open_file(self._path, self._mode, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\tables\file.py", line 315, in open_file
return File(filename, mode, title, root_uep, filters, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\tables\file.py", line 778, in __init__
self._g_new(filename, mode, **params)
File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

File "C:\ci\hdf5_1545244154871\work\src\H5F.c", line 444, in H5Fcreate
unable to create file
File "C:\ci\hdf5_1545244154871\work\src\H5Fint.c", line 1364, in H5F__create
unable to open file
File "C:\ci\hdf5_1545244154871\work\src\H5Fint.c", line 1579, in H5F_open
unable to truncate a file which is already open

End of HDF5 error back trace

Unable to open/create file '测试.h5'
Closing remaining open files:测试.h5...done

I am using Windows 10 Chinese version. If I change the Windows system encoding to utf-8 (by running chcp 65001 in cmd.exe), the above code works fine.
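As a rough illustration of why the code page could matter (not a diagnosis of what HDF5 does internally): the same file name turns into different byte sequences under UTF-8 and under a legacy code page. The sketch below assumes GBK (code page 936) stands in for the default ANSI code page of a Chinese Windows install, and `encoded_forms` is a made-up helper name for this example.

```python
import sys


def encoded_forms(name, encodings=("utf-8", "gbk")):
    """Return {encoding: bytes} for a file name under each encoding."""
    return {enc: name.encode(enc) for enc in encodings}


# Assumption: "gbk" approximates the ANSI code page on a Chinese Windows system.
forms = encoded_forms("测试.h5")
for enc, raw in forms.items():
    print(enc, raw)

# The encoding Python itself uses when converting str paths to bytes here.
print("filesystem encoding:", sys.getfilesystemencoding())
```

If a C library receives one of these byte strings but interprets it with the other encoding, it ends up looking for a differently named file.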

I suppose it resembles the already solved issue #15086, because I also could not use pd.read_csv before unless I changed the Windows system encoding to utf-8.

Now #15086 is solved and pd.read_csv works fine under my Windows default encoding. But the HDF-related APIs are still affected.

wj-c · 2019-12-02T09:03:22Z

By the way, changing the system encoding to utf-8 is of course a workaround for this issue. However, it breaks some other legacy software (for example, Chinese-language programs) that does not support utf-8 encoding. So I hope this issue could be investigated. Thanks!
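A possible workaround that leaves the system encoding alone, sketched under assumptions: write the file to an ASCII-only temporary path, then move it to the non-ASCII name with Python's own file functions. The helper `write_via_ascii_temp` and its `writer` callback are hypothetical names for this example, and the sketch assumes the temporary directory itself has an ASCII-only path, which is not guaranteed if the Windows user name contains non-ASCII characters.

```python
import os
import shutil
import tempfile


def write_via_ascii_temp(target_path, writer, suffix=".h5"):
    """Call writer(tmp_path) on an ASCII-named temp file, then move it to target_path."""
    # Assumption: the system temp directory has an ASCII-only path.
    tmp_dir = tempfile.mkdtemp(prefix="hdf_tmp_")
    tmp_path = os.path.join(tmp_dir, "data" + suffix)
    try:
        writer(tmp_path)                    # e.g. a function that writes the HDF5 file
        shutil.move(tmp_path, target_path)  # the move is done by Python, not by HDF5
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return target_path
```

With pandas this might be called as `write_via_ascii_temp('测试.h5', lambda p: df.to_hdf(p, key='test'))`, where `df` is an existing DataFrame. This has not been verified on the affected Windows setup.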

jreback · 2019-12-02T12:54:05Z

@wj-c happy to take the same patch as for #15086

TomAugspurger added the Needs Info (Clarification about behavior needed to assess issue) label Nov 25, 2019

jreback added the Unicode (Unicode strings) and IO HDF5 (read_hdf, HDFStore) labels and removed the Needs Info (Clarification about behavior needed to assess issue) label Dec 2, 2019

mroeschke added the Bug and Windows (Windows OS) labels Apr 14, 2020

wj-c closed this as completed Jun 29, 2024
