
read_hdf() file name encoding with accented or special characters on Windows #29832


Closed
wj-c opened this issue Nov 25, 2019 · 4 comments
Labels
Bug IO HDF5 read_hdf, HDFStore Unicode Unicode strings Windows Windows OS

Comments

@wj-c

wj-c commented Nov 25, 2019

Problem description

pd.read_hdf() has the same issue as #15086. If the file path contains non-ASCII characters (such as Chinese characters) on Windows, it fails to read the file.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.14
pytest : 5.2.4
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6

@TomAugspurger
Contributor

Can you provide a reproducible example, with a stacktrace?

http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Nov 25, 2019
@wj-c
Author

wj-c commented Dec 2, 2019

Sorry for the late reply. Sure, I can provide an example.


Here is the code:

import pandas as pd
store = pd.HDFStore('测试.h5')
pd.DataFrame().to_hdf(store, '/test')


Here are the results of running the above code:

Traceback (most recent call last):
File "C:\Users\cwj\Desktop\Untitled-1.py", line 3, in <module>
pd.DataFrame().to_hdf(store, '/test')
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\core\generic.py", line 2530, in to_hdf
pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 276, in to_hdf
path_or_buf, mode=mode, complevel=complevel, complib=complib
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 505, in __init__
self.open(mode=mode, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\pandas\io\pytables.py", line 627, in open
self._handle = tables.open_file(self._path, self._mode, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\tables\file.py", line 315, in open_file
return File(filename, mode, title, root_uep, filters, **kwargs)
File "C:\Users\cwj\Anaconda3\envs\main\lib\site-packages\tables\file.py", line 778, in __init__
self._g_new(filename, mode, **params)
File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

File "C:\ci\hdf5_1545244154871\work\src\H5F.c", line 444, in H5Fcreate
unable to create file
File "C:\ci\hdf5_1545244154871\work\src\H5Fint.c", line 1364, in H5F__create
unable to open file
File "C:\ci\hdf5_1545244154871\work\src\H5Fint.c", line 1579, in H5F_open
unable to truncate a file which is already open

End of HDF5 error back trace

Unable to open/create file '测试.h5'
Closing remaining open files:测试.h5...done


I am using the Chinese version of Windows 10. If I change the Windows system encoding to UTF-8 (by running chcp 65001 in cmd.exe), the above code works fine.

I suspect this resembles the already-solved issue #15086, because I also could not use pd.read_csv unless I changed the Windows system encoding to UTF-8.

Now that #15086 is solved, pd.read_csv works fine under my default Windows encoding, but the HDF-related APIs remain problematic.
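
One possible explanation, which is an assumption and not something confirmed in this thread, is that the file name is encoded to bytes with one encoding on the Python side and then interpreted by the HDF5 C library using a different one (the Windows ANSI code page). The sketch below uses only the standard library to show which encodings Python reports and what a name looks like when bytes written as UTF-8 are read back as GBK (code page 936, assumed here to be the default on Chinese Windows). The helper names describe_encodings and reinterpret are hypothetical.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python reports for file names and for text."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def reinterpret(name, written_as, read_as):
    """Encode `name` with one codec and decode the bytes with another.

    Undecodable bytes are replaced instead of raising, so the result
    shows roughly what a reader using the wrong codec would see.
    """
    return name.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print(describe_encodings())
    # Matching codecs round-trip; mismatched codecs give a different name.
    print(reinterpret("测试.h5", "utf-8", "utf-8"))
    print(reinterpret("测试.h5", "utf-8", "gbk"))
```

If the two values printed by describe_encodings() differ on the affected machine, that would be consistent with the mismatch described above, though it would not prove where the failure occurs.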

@wj-c
Author

wj-c commented Dec 2, 2019

By the way, changing the system encoding to UTF-8 is of course a workaround for this issue. However, it breaks some other legacy software (for example, Chinese-language programs) that does not support UTF-8. So I hope this issue can be investigated. Thanks!
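
A workaround that leaves the system encoding alone might be to let the HDF5 library write to an ASCII-only temporary path and then move the finished file to the non-ASCII name with Python's own file APIs. The following is a minimal sketch under that assumption, not a fix in pandas or PyTables. The helper write_via_ascii_temp and the callback write_func are hypothetical names, and the sketch assumes the temporary directory itself has an ASCII-only path, which may not hold if the Windows user name contains non-ASCII characters.

```python
import os
import shutil
import tempfile


def write_via_ascii_temp(final_path, write_func):
    """Write through an ASCII-named temp file, then move it into place.

    write_func receives the temporary path and is expected to create
    the file there (for example by calling DataFrame.to_hdf on it).
    """
    with tempfile.TemporaryDirectory(prefix="hdf_tmp_") as tmp_dir:
        # The file name is ASCII-only; the directory comes from the OS.
        tmp_path = os.path.join(tmp_dir, "data.h5")
        write_func(tmp_path)
        # The move is done by Python, which handles Unicode paths itself.
        shutil.move(tmp_path, final_path)
    return final_path
```

With pandas this might be called as write_via_ascii_temp('测试.h5', lambda p: df.to_hdf(p, key='test')), where df is an existing DataFrame. Reading would need the reverse step: copy the file to an ASCII-only temporary path first, then pass that path to pd.read_hdf.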

@jreback jreback added Unicode Unicode strings IO HDF5 read_hdf, HDFStore and removed Needs Info Clarification about behavior needed to assess issue labels Dec 2, 2019
@jreback
Contributor

jreback commented Dec 2, 2019

@wj-c happy to take the same patch as for #15086

@mroeschke mroeschke added Bug Windows Windows OS labels Apr 14, 2020
@wj-c wj-c closed this as completed Jun 29, 2024