recognize _Encoding attribute for char and string arrays #665
Use 'utf-8' and 'replace' for everything except NC_STRING variable data. For NC_STRING variable data, look for _Encoding variable attribute, otherwise use 'utf-8'.
character variables.
force 'U' dtype in chartostring.
to a char variable with _Encoding set.
character array (type='S1') is given
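The encoding rule in the description can be sketched roughly as follows. This is only an illustration of the stated rule, not the library's code: `resolve_encoding`, its arguments, and the plain dict of attributes are hypothetical stand-ins.

```python
def resolve_encoding(attrs, is_nc_string):
    """Pick the text encoding for a value (hypothetical helper).

    attrs        -- dict of the variable's attributes (assumed representation)
    is_nc_string -- True when the value is NC_STRING variable data
    """
    if is_nc_string:
        # NC_STRING variable data: honor _Encoding if present, else utf-8.
        return attrs.get("_Encoding", "utf-8")
    # Everything else is always treated as utf-8.
    return "utf-8"


raw = b"caf\xe9"  # 'café' encoded as latin-1, which is not valid utf-8

# NC_STRING data with an _Encoding attribute decodes with that encoding.
as_latin1 = raw.decode(resolve_encoding({"_Encoding": "latin-1"}, True))

# Everything else uses utf-8 with the 'replace' error handler, so the
# undecodable byte becomes U+FFFD instead of raising an exception.
as_utf8 = raw.decode(resolve_encoding({"_Encoding": "latin-1"}, False), "replace")
```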
@shoyer, I'm wondering how this change would impact xarray - especially the auto-conversion of char arrays to string arrays with the last dimension collapsed. This would only happen if the
last dim of char variable.
@jswhit thanks for the heads up. Yes, I think this implementation as-is would break xarray, where we do our own char -> string array conversion. There are two ways to fix this:
I like this second option better.
The second option would be nice, but quite difficult since How about adding a
That would fit the existing API of the library, where any interpretation of attributes is configurable...
I went ahead and added a
I'm okay with methods for this. But going forward, this is probably a case for separate low level and high level interfaces, even if only the high level interface is exposed publicly. h5py uses this approach and it works quite well.
OK, merging now. @shoyer, good idea about the low level interface. I'll create a separate ticket for that.
Add check for _Encoding attribute for NC_STRING variables, otherwise use 'utf-8'. 'utf-8' is used everywhere else; the 'default_encoding' global module variable is no longer used.
The getncattr method now takes an optional kwarg 'encoding' (default 'utf-8') so the encoding of attributes can be specified if desired.
If _Encoding is specified for an NC_CHAR ('S1') variable, the chartostring utility function is used to convert the array of characters to an array of strings with one less dimension (the last dimension is interpreted as the length of each string) when reading the data. When writing the data, stringtochar is used to convert a numpy array of fixed length strings to an array of characters with one more dimension. chartostring and stringtochar now also have an 'encoding' kwarg.
The _Encoding attribute convention is being discussed in Unidata/netcdf-c#402.
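To make the dimension change concrete, here is a minimal numpy-only sketch of the two conversions described above. It does not call the library: `strings_to_chars` and `chars_to_strings` are hypothetical stand-ins for stringtochar and chartostring, and the real functions may handle padding and edge cases differently.

```python
import numpy as np


def strings_to_chars(strings, encoding="utf-8"):
    """Fixed-length strings -> 'S1' char array with one extra trailing dimension."""
    encoded = np.ascontiguousarray(np.char.encode(np.asarray(strings), encoding))
    width = encoded.dtype.itemsize  # bytes per string after encoding
    # Reinterpret each fixed-width byte string as `width` single characters.
    return encoded.view("S1").reshape(encoded.shape + (width,))


def chars_to_strings(chars, encoding="utf-8"):
    """'S1' char array -> string array; the last dimension is the string length."""
    chars = np.ascontiguousarray(np.asarray(chars, dtype="S1"))
    strlen = chars.shape[-1]
    # Collapse the last axis into one fixed-width byte string per row.
    joined = chars.view(f"S{strlen}").reshape(chars.shape[:-1])
    # Decode to a unicode ('U') array; trailing null padding is dropped.
    return np.char.decode(joined, encoding)


names = np.array(["foo", "héllo"])
chars = strings_to_chars(names)      # one more dimension than `names`
roundtrip = chars_to_strings(chars)  # one less dimension than `chars`
```

Note that the trailing dimension is measured in encoded bytes, not characters, so a non-ASCII string can be wider on disk than its character count suggests.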