Cannot open file with non-ASCII characters in the path, such as Japanese. #1786
Comments
What version of netcdf are you using? |
I'm using 4.7.4. |
I think you are being misled by the fact that the type of the path |
Oops just recalled you said Windows. Let me test that. I could well believe |
|
Ok, this is going to take a while to fix. For the record, I need to do this:
|
So now I am confused. I just did a build and test of netcdf using visual |
I do not think most Unix programmers and English-locale Windows programmers will deal with this, because it works correctly in their environment without doing anything. So, how about the following? |
It seems that those tests are for variable or dimension names, not for filenames. test_unicode_directory.sh is for UTF-8 filenames.
Assume your code page is Windows-1252: |
Bash and the Linux API in general handle UTF-8 fine, as near as I can tell. |
I do not understand what you mean. The netcdf library |
I know that The cause is that the behavior of HDF5 has changed. So the simplest solution is, as I first showed, to convert the filename from ANSI to UTF-8 before calling HDF5 functions if HDF5 >= 1.10.6. |
On Windows, with HDF5 1.10.6 or later, nc_open() does not work properly for filenames containing non-ASCII characters. Unfortunately, #1668 does not resolve this issue.
This is because netCDF treats the filename encoding as ANSI, whereas HDF5 now treats it as UTF-8. Here "ANSI" means a locale-specific 8-bit character set.
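To make the mismatch concrete, here is a minimal sketch of a UTF-8 well-formedness check. The function name is_valid_utf8 is hypothetical and is not part of netCDF or HDF5. It assumes a Japanese Windows system whose ANSI code page is 932 (Shift-JIS), where the two characters 日本 are stored as the bytes 93 FA 96 7B. Those bytes are not well-formed UTF-8, so a library that assumes UTF-8 cannot decode them correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of netCDF or HDF5.
 * Returns true if the NUL-terminated byte string is well-formed UTF-8. */
bool is_valid_utf8(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const unsigned int min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != 0) {
        unsigned int cp;
        int extra;
        int i;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;  /* stray continuation byte or invalid lead byte */
        p++;

        for (i = 0; i < extra; i++, p++) {
            /* Each following byte must look like 10xxxxxx.
             * A terminating NUL fails this test too, so we never overrun. */
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3Fu);
        }

        if (cp < min_cp[extra]) return false;              /* overlong form */
        if (cp > 0x10FFFF) return false;                   /* out of range  */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    /* surrogate     */
    }
    return true;
}
```

With this check, the UTF-8 spelling "\xE6\x97\xA5\xE6\x9C\xAC.nc" is accepted, while the code page 932 spelling "\x93\xFA\x96\x7B.nc" is rejected at its first byte, because 0x93 can only appear as a continuation byte in UTF-8.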
If we pass a filename encoded as ANSI to nc_open(), check_file_type() is called and the file is opened by fopen(). At this point the filename is treated as ANSI, so the file can be opened.
If the file format is NC4, H5Fopen() is then called. Since HDF5 treats the filename as UTF-8, it converts it from UTF-8 to UTF-16 and opens the file with _wopen(). However, the bytes are actually ANSI, so this conversion is incorrect and _wopen() fails.
On the other hand, if we pass a filename encoded as UTF-8 to nc_open(), opening the file fails at check_file_type(), since the UTF-8 filename contains characters that are illegal in ANSI.
To solve this problem, convert the filename from ANSI to UTF-8 before calling HDF5 functions if HDF5 is 1.10.6 or later.
Additionally, it would be nice if a new API that accepts UTF-8 or UTF-16, such as nc_openW() or nc_open_utf8(), were added to access Unicode filenames.