Chinese character in the path using netcdf4 dataset #997

Closed
LiNinghui-AI opened this issue Feb 29, 2020 · 17 comments

Comments

@LiNinghui-AI

I need to open a .nc file with netCDF4, but the file's path contains Chinese characters and netCDF4.Dataset fails with the error "No such file or directory". os.path.isfile does find the file. I've tried encoding and decoding the path as UTF-8, without success.

Is there an option I can pass to the Dataset call?

Thanks.
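
One possible answer to the question above, not confirmed anywhere in this thread: netCDF4.Dataset accepts a `memory` keyword that opens a dataset from a bytes buffer. Python's own file I/O (which does find the file) can read the bytes, so the C library never has to resolve the path. The sketch below assumes a netcdf-c build with in-memory support; the helper names `read_file_bytes` and `open_dataset_from_memory` are hypothetical, and the whole file is loaded into RAM, so it may not suit very large files.

```python
import os


def read_file_bytes(path):
    """Read a whole file using Python's own I/O, which accepts Unicode paths."""
    with open(path, "rb") as f:
        return f.read()


def open_dataset_from_memory(path):
    """Open a netCDF file read-only from an in-memory buffer.

    With memory=..., the first argument is only a label for the dataset,
    so an ASCII-safe basename or any placeholder name can be used.
    """
    import netCDF4  # imported lazily so read_file_bytes works without it

    data = read_file_bytes(path)
    return netCDF4.Dataset(os.path.basename(path), mode="r", memory=data)
```

Usage would look like `nc = open_dataset_from_memory(path)` followed by `nc.close()` when done.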

@jswhit2
Contributor

jswhit2 commented Mar 2, 2020

This may be related to #941, which in turn is related to HDF5 unicode filename support (https://forum.hdfgroup.org/t/non-english-characters-in-hdf5-file-name/4627/8)

@jswhit2
Contributor

jswhit2 commented Mar 2, 2020

h5py/h5py#839 suggests that this may have been fixed in HDF5 1.10.6 (at least for Windows). What version of HDF5 are you using?

@jswhit2
Contributor

jswhit2 commented Mar 2, 2020

The following works for me

import netCDF4
filename="delta_\u0394.nc"
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(nc)

delta_Δ.nc
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    filename: delta_Δ.nc
    dimensions(sizes):
    variables(dimensions):
    groups:

Can you modify this example to use the utf-8 filename causing you problems?

@LiNinghui-AI
Author

Thank you very much! I am going to try.

@LiNinghui-AI
Author

h5py/h5py#839 suggests that this may have been fixed in HDF5 1.10.6 (at least for Windows). What version of HDF5 are you using?

I use netCDF4 1.5.3.
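
Note that 1.5.3 is the version of the netCDF4 Python package, not of the HDF5 library it was built against. As a minimal sketch, the module exposes the versions of the underlying C libraries as attributes; the helper name `library_versions` is hypothetical, and it returns an empty dict if netCDF4 is not installed.

```python
def library_versions():
    """Return the versions of netCDF4-python and the C libraries it links to."""
    try:
        import netCDF4
    except ImportError:
        # netCDF4 is not installed in this environment.
        return {}
    return {
        "netCDF4-python": netCDF4.__version__,
        "netcdf-c": netCDF4.__netcdf4libversion__,
        "HDF5": netCDF4.__hdf5libversion__,
    }


for name, version in library_versions().items():
    print(name, version)
```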

@LiNinghui-AI
Author

from netCDF4 import Dataset

# Raw string, so the backslashes in the Windows path are kept literally.
path = r"H:\LNHCourse\海洋探测技术专题实验\专题实验八\V2019151040600.L2_SNPP_OC.nc"
fh = Dataset(path)

Above is my code. I think it is simple, but I get this error:
[Errno 2] No such file or directory: b'H:\LNHCourse\\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab\V2019151040600.L2_SNPP_OC.nc'

@jswhit
Collaborator

jswhit commented Mar 3, 2020

What encoding are you using?

@LiNinghui-AI
Author

What encoding are you using?

UTF8
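
As a small check, offered as a sketch rather than a diagnosis: the bytes shown in the error message above are the UTF-8 encoding of the directory name, which suggests the Python side encoded the path as UTF-8 before handing it to the C library. The snippet also prints the filesystem encoding Python reports for the current system, which is the value worth comparing across machines.

```python
import sys

# First directory component of the failing path.
directory = "海洋探测技术专题实验"

# Encode it the same way the error message shows it.
utf8_bytes = directory.encode("utf-8")

# Byte string copied from the error message in this thread.
bytes_from_error = (
    b"\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80"
    b"\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c"
)

print("filesystem encoding:", sys.getfilesystemencoding())
print("error bytes are UTF-8 of the path:", utf8_bytes == bytes_from_error)
```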

@jswhit
Collaborator

jswhit commented Mar 9, 2020

I think this is a Windows filename encoding issue: I can create a file with this filename and read it back in on macOS and Linux. Unfortunately, I don't have access to Windows.

@jswhit
Collaborator

jswhit commented Mar 10, 2020

Disregard my last message. I can now reproduce this on macOS and Linux. Here's the script:

import netCDF4, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
filename=os.path.join(dirpath,'V2019151040600.L2_SNPP_OC.nc')
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(nc.filename)

which produces

[mac28:~/python] jwhitaker% python3.7 unicode_filename.py
海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
Traceback (most recent call last):
  File "unicode_filename.py", line 10, in <module>
    nc = netCDF4.Dataset(filename,'w')
  File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c/\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab/V2019151040600.L2_SNPP_OC.nc'

Curiously, the file is created but can't be read. ncdump produces the same error when given the full path to the file, so it isn't an issue in the Python interface.

[mac28:~/python] jwhitaker% ls -l 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
-rw-r--r-- 1 jwhitaker PSD\climate 6144 Mar 10 11:47 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
[mac28:~/python] jwhitaker% ncdump 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
ncdump: 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc: 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc: No such file or directory

h5dump does work though, so it seems like it's an issue with nc_open.

@jswhit
Collaborator

jswhit commented Mar 10, 2020

The following h5py script works, so I don't think this is an HDF5 issue:

import h5py, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
filename=os.path.join(dirpath,'V2019151040600.L2_SNPP_OC.h5')
print(filename)
f = h5py.File(filename,'w')
dset = f.create_dataset("mydataset", (100,), dtype='i')
f.close()
f = h5py.File(filename,'r')
print(f)

[mac28:~/python] jwhitaker% python3.7 unicode_filename_h5py.py
海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.h5
<HDF5 file "V2019151040600.L2_SNPP_OC.h5" (mode r)>

@jswhit
Collaborator

jswhit commented Mar 10, 2020

From https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html:

Since file access is a system issue, filenames do not fall within the scope of HDF5’s UTF-8 capabilities; filenames are encoded at the system level. Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly while Windows systems generally do not.

A simple workaround in this case (since the Chinese characters are in the directory names, not the filename itself) is to create the directory in Python, change the working directory, and then create the file in that directory:

import netCDF4, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
os.chdir(dirpath)
filename='V2019151040600.L2_SNPP_OC.nc'
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(os.getcwd())
print(nc.filename)

Still curious as to how h5py manages to make it work, digging further...
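
The change-of-directory workaround above can be wrapped so that the original working directory is restored afterwards. This is only a sketch: `working_directory` and `open_by_basename` are hypothetical helper names, the approach helps only when the filename itself is ASCII, and os.chdir affects the whole process, so it is not safe to use from several threads at once.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def open_by_basename(path, opener):
    """Call opener(basename) from inside the file's own directory.

    opener is any callable taking a filename, e.g. netCDF4.Dataset.
    The library then only sees the bare filename, not the directory names.
    """
    directory, name = os.path.split(os.path.abspath(path))
    with working_directory(directory):
        return opener(name)
```

With netCDF4 installed, usage would be `nc = open_by_basename(path, netCDF4.Dataset)`.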

@LiNinghui-AI
Author

Thank you very much for your help. I followed your suggestion and it solved the problem.

@jswhit2
Contributor

jswhit2 commented Mar 11, 2020

A potential solution (bug fix in the C lib) is being discussed at Unidata/netcdf-c#1666

@zouyaoji

@LiNinghui-AI Hi, did you get this solved? It still doesn't work for me when the path contains Chinese characters.

@zgcao

zgcao commented Jan 4, 2022

I tried an indirect method:
(1) rename the file to a name that uses only English (ASCII) characters with os.rename(src, dst);
(2) read the renamed file with netCDF4.
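
The two steps above could be sketched as a context manager that renames the file to a temporary ASCII name and renames it back afterwards. The helper name `temporary_ascii_name` and the `tmp_` naming scheme are hypothetical. The alias stays in the same directory, so this only covers non-ASCII characters in the filename itself, not in the directory names.

```python
import contextlib
import os
import uuid


@contextlib.contextmanager
def temporary_ascii_name(path):
    """Rename a file to a random ASCII name, yield that name, then rename back."""
    directory, name = os.path.split(os.path.abspath(path))
    # Keep the extension only if it is plain ASCII.
    ext = os.path.splitext(name)[1]
    if not ext.isascii():
        ext = ""
    alias = os.path.join(directory, "tmp_" + uuid.uuid4().hex + ext)
    os.rename(path, alias)
    try:
        yield alias
    finally:
        # Restore the original name even if reading failed.
        os.rename(alias, path)
```

Inside `with temporary_ascii_name(path) as alias:` the file can be opened with `netCDF4.Dataset(alias)`; it should be closed before the block ends so the rename back can succeed on Windows.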
