Chinese character in the path using netcdf4 dataset #997

LiNinghui-AI · 2020-02-29T11:43:40Z

I need to open an nc file with netCDF4, but the file's path contains Chinese characters and netCDF4.Dataset fails with the error "No such file or directory". If I check the same path with os.path.isfile, the file is found. I've tried encoding and decoding the path (as UTF-8), without success.

Is there an option I can pass to the Dataset call?

Thanks.

jswhit2 · 2020-03-02T21:36:25Z

This may be related to #941, which in turn is related to HDF5 Unicode filename support (https://forum.hdfgroup.org/t/non-english-characters-in-hdf5-file-name/4627/8).

jswhit2 · 2020-03-02T21:38:18Z

h5py/h5py#839 suggests that this may have been fixed in HDF5 1.10.6 (at least for Windows). What version of HDF5 are you using?

jswhit2 · 2020-03-02T22:18:15Z

The following works for me

import netCDF4
filename="delta_\u0394.nc"
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(nc)

delta_Δ.nc
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    filename: delta_Δ.nc
    dimensions(sizes):
    variables(dimensions):
    groups:

Can you modify this example to use the utf-8 filename causing you problems?

LiNinghui-AI · 2020-03-03T03:28:17Z

Thank you very much! I am going to try.

LiNinghui-AI · 2020-03-03T03:36:03Z

Quoting jswhit2: "h5py/h5py#839 suggests that this may have been fixed in HDF5 1.10.6 (at least for Windows). What version of HDF5 are you using?"

I use netCDF4 1.5.3.
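
The question above is about the HDF5 library version, which is separate from the version of the netCDF4 Python package. A minimal sketch for reporting both, along with the netCDF C library version, assuming the module-level attributes `__version__`, `__hdf5libversion__` and `__netcdf4libversion__` are available in the installed netCDF4 package (the helper name `library_versions` is hypothetical):

```python
def library_versions():
    """Return the netCDF4-python version and the C library versions it reports.

    Returns an empty dict if netCDF4 cannot be imported.
    """
    try:
        import netCDF4
    except ImportError:
        return {}
    # getattr with a default guards against builds lacking an attribute.
    return {
        "netCDF4-python": getattr(netCDF4, "__version__", None),
        "HDF5": getattr(netCDF4, "__hdf5libversion__", None),
        "netCDF-C": getattr(netCDF4, "__netcdf4libversion__", None),
    }


if __name__ == "__main__":
    for name, version in library_versions().items():
        print(name, version)
```

The "HDF5" entry is the value the question was asking for.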

LiNinghui-AI · 2020-03-03T03:42:25Z

from netCDF4 import Dataset
path = r"H:\LNHCourse\海洋探测技术专题实验\专题实验八\V2019151040600.L2_SNPP_OC.nc"
fh = Dataset(path)

That is my code. It is simple, yet I get this error:
[Errno 2] No such file or directory: b'H:\\LNHCourse\\\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\\\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab\\V2019151040600.L2_SNPP_OC.nc'
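
The byte string in the error message is the UTF-8 encoding of the path. The sketch below shows, for the first directory name only, that the same name has a different byte representation in a legacy code page. GBK is used purely as an assumed example of such a code page; the thread does not say which one the reporter's system uses.

```python
# First directory component of the path in the error message above.
name = "海洋探测技术专题实验"

utf8_bytes = name.encode("utf-8")  # the form that appears in the error message
gbk_bytes = name.encode("gbk")     # assumed legacy code page, for comparison only

print(name.isascii())  # non-ASCII names are the ones affected
print(utf8_bytes)
print(gbk_bytes)

# Reading the UTF-8 bytes as if they were GBK does not give the name back,
# so a routine that assumes the wrong encoding looks for a different file.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == name)
```

If the two byte strings differ, code that hands the UTF-8 bytes to a routine expecting the other encoding will not find the file, even though Python itself (for example os.path.isfile) does.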

jswhit · 2020-03-03T15:37:07Z

What encoding are you using?

LiNinghui-AI · 2020-03-06T07:45:12Z

UTF8

jswhit · 2020-03-09T18:56:15Z

I think this is a Windows filename encoding issue: I can create a file with this filename and read it back in on macOS and Linux. Unfortunately, I don't have access to Windows.

jswhit · 2020-03-10T16:28:09Z

Disregard my last message. I can now reproduce this on macOS and Linux. Here's the script:

import netCDF4, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
filename=os.path.join(dirpath,'V2019151040600.L2_SNPP_OC.nc')
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(nc.filename)

which produces

[mac28:~/python] jwhitaker% python3.7 unicode_filename.py
海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
Traceback (most recent call last):
  File "unicode_filename.py", line 10, in <module>
    nc = netCDF4.Dataset(filename,'w')
  File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c/\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab/V2019151040600.L2_SNPP_OC.nc'

Curiously, the file is created but can't be read. ncdump produces the same error when given the full path to the file, so this isn't an issue in the Python interface.

[mac28:/python] jwhitaker% ls -l 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
-rw-r--r-- 1 jwhitaker PSD\climate 6144 Mar 10 11:47 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
[mac28:/python] jwhitaker% ncdump 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc
ncdump: 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc: 海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.nc: No such file or directory

h5dump does work, though, so this looks like an issue with nc_open.

jswhit · 2020-03-10T16:36:02Z

The following h5py script works, so I don't think this is an HDF5 issue:

import h5py, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
filename=os.path.join(dirpath,'V2019151040600.L2_SNPP_OC.h5')
print(filename)
f = h5py.File(filename,'w')
dset = f.create_dataset("mydataset", (100,), dtype='i')
f.close()
f = h5py.File(filename,'r')
print(f)

[mac28:~/python] jwhitaker% python3.7 unicode_filename_h5py.py
海洋探测技术专题实验/专题实验八/V2019151040600.L2_SNPP_OC.h5
<HDF5 file "V2019151040600.L2_SNPP_OC.h5" (mode r)>

jswhit · 2020-03-10T16:52:31Z

From https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html:

Since file access is a system issue, filenames do not fall within the scope of HDF5’s UTF-8 capabilities; filenames are encoded at the system level. Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly while Windows systems generally do not.

A simple workaround in this case (since the Chinese characters are in the directory names, not the filename itself) is to create the directory in Python, change the working directory to it, and then create the file there:

import netCDF4, os
dirpath1 =\
b'\xe6\xb5\xb7\xe6\xb4\x8b\xe6\x8e\xa2\xe6\xb5\x8b\xe6\x8a\x80\xe6\x9c\xaf\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c'.decode('utf-8')
dirpath2 =\
b'\xe4\xb8\x93\xe9\xa2\x98\xe5\xae\x9e\xe9\xaa\x8c\xe5\x85\xab'.decode('utf-8')
dirpath = os.path.join(dirpath1,dirpath2)
os.makedirs(dirpath,exist_ok=True)
os.chdir(dirpath)
filename='V2019151040600.L2_SNPP_OC.nc'
print(filename)
nc = netCDF4.Dataset(filename,'w')
nc.filename = filename
nc.close()
nc = netCDF4.Dataset(filename)
print(os.getcwd())
print(nc.filename)

Still curious as to how h5py manages to make it work, digging further...
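
The workaround above leaves the process in the new working directory. A minimal sketch of the same idea that restores the previous directory afterwards; the helper names `working_directory` and `open_by_basename` are hypothetical, and `opener` stands for any callable that takes a filename (netCDF4.Dataset in this thread):

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def open_by_basename(path, opener):
    """Call opener(basename) from inside the directory that contains `path`.

    Only the bare filename is passed to `opener`, so the non-ASCII
    directory names never reach it.
    """
    directory, basename = os.path.split(os.path.abspath(path))
    with working_directory(directory):
        return opener(basename)
```

It would be used as `nc = open_by_basename(path, netCDF4.Dataset)`. As with the original workaround, this only helps when the filename itself is ASCII. Note that os.chdir affects the whole process, so this is not safe to use from several threads at once.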

LiNinghui-AI · 2020-03-11T03:48:17Z

Thank you very much for your help. I followed your suggestion and it solved the problem.

jswhit2 · 2020-03-11T13:06:35Z

A potential solution (bug fix in the C lib) is being discussed at Unidata/netcdf-c#1666

zouyaoji · 2021-07-22T03:35:27Z

@LiNinghui-AI Hi, was this ever resolved? It still doesn't work for me when the path contains Chinese characters.

zgcao · 2022-01-04T08:55:42Z

I tried an indirect method:
(1) rename the file so that its path contains only English characters, using os.rename(src, dst);
(2) read it with netCDF4.
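
A minimal sketch of this rename approach, written so that the file is moved back afterwards. The helper name `ascii_alias` and the temporary name `data` are hypothetical. shutil.move is used in place of os.rename so that the move also works across drives, and the sketch assumes the temporary directory itself has an ASCII-only path.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def ascii_alias(src, ascii_dir=None):
    """Temporarily move `src` to an ASCII-only path and move it back on exit.

    `ascii_dir` is an optional directory known to have an ASCII-only path;
    by default the system temporary directory is used.
    """
    work_dir = tempfile.mkdtemp(dir=ascii_dir)
    dst = os.path.join(work_dir, "data" + os.path.splitext(src)[1])
    if not dst.isascii():
        os.rmdir(work_dir)
        raise ValueError("temporary path is not ASCII-only: %r" % dst)
    shutil.move(src, dst)
    try:
        yield dst
    finally:
        # The file must be closed by the caller before this point.
        shutil.move(dst, src)
        os.rmdir(work_dir)


# Intended use (netCDF4 assumed to be installed):
#     with ascii_alias(path) as tmp:
#         nc = netCDF4.Dataset(tmp)
#         ...
#         nc.close()
```

The original path does not exist while the block is running, so other programs that expect the file at its usual location will not find it during that time.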

LiNinghui-AI closed this as completed Mar 3, 2020

LiNinghui-AI reopened this Mar 3, 2020

jswhit mentioned this issue Mar 10, 2020

problem opening file with Chinese chars in path Unidata/netcdf-c#1666

Closed

LiNinghui-AI closed this as completed Mar 14, 2020

jswhit mentioned this issue Dec 2, 2022

PermissionError: [Errno 13] Permission denied: b'F:\\\xe5\x88\x9b\xe5\xbb\xba\xe6\x96\x87\xe4\xbb\xb6\\\\new.nc' #1220

Closed

kmuehlbauer mentioned this issue Dec 8, 2023

Cannot read a nc file that contains the diacritic mark tilde (á) in the path pydata/xarray#8531

Closed
