Want an option to ignore variables of unsupported data type when opening zarr files #2465
Comments
Out of curiosity, why store these URLs (which I assume have no dimensionality) as global attributes instead of variables?
Any chance you could send me one of those files in either zip format
These URLs do have dimensionality. They correspond to each time chunk and contain source information from where the individual file was downloaded. And yes, @DennisHeimbigner, we would be open to being a test case for string support. Here is the file that has the issue. It has a bogus URL for now, but it has the same dimensions as the time variable. In case you are interested in replicating the issue, the error that I get when opening this file is "Assertion failed: (type && type->format_type_info != NULL), function zclose_type, file zclose.c, line 228."
@amberjungminlee Ah, that makes sense then.
re: Issue Unidata#2465
re: Issue Unidata#2259

[Note: This PR also tangentially affects PR Unidata#2466, since that PR must be merged before this one and is in fact included here.]

The primary issue addressed is to provide a way for the user to specify the size of fixed-length strings. This is handled by the following new special attributes:

1. **_nczarr_default_maxstrlen**: An attribute of the root group. It specifies the default maximum string length for string types. If not specified, it has the value of 64 characters.
2. **_nczarr_maxstrlen**: A per-variable attribute. It specifies the maximum string length for the string type associated with the variable. If not specified, it is assigned the value of **_nczarr_default_maxstrlen**.

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR type, which does not exist in zarr. The goal was to choose numpy types for both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that if a pure zarr implementation read them, it would still work and an NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:

* "|S1" for NC_CHAR.
* ">S1" for NC_STRING && MAXSTRLEN==1
* ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be ok since, for string/char, the endianness has no meaning.

For reading attributes with pure zarr (i.e. with no nczarr attribute types defined), they will always be interpreted as of type NC_CHAR.

## Misc. Other Changes

1. Convert the nczarr special attributes and keys to all lower case, so "_NCZARR_ATTR" is now "_nczarr_attr". Backward compatibility with the upper-case names is supported.
2. Clean up my too-clever-by-half handling of scalars in libnczarr.
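The fallback rule for the maximum string length and the write-side type mapping described in this PR can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the libnczarr implementation: the function names `resolve_maxstrlen` and `zarr_dtype_for` are hypothetical, and attributes are modeled as plain dictionaries.

```python
# Illustrative sketch only; function names are hypothetical and
# attributes are modeled as plain dicts rather than real NCZarr metadata.

DEFAULT_MAXSTRLEN = 64  # default stated in the PR description


def resolve_maxstrlen(var_attrs, root_attrs):
    """Pick the string length for a variable.

    Order: per-variable _nczarr_maxstrlen, then the root group's
    _nczarr_default_maxstrlen, then the built-in default of 64.
    """
    if "_nczarr_maxstrlen" in var_attrs:
        return int(var_attrs["_nczarr_maxstrlen"])
    return int(root_attrs.get("_nczarr_default_maxstrlen", DEFAULT_MAXSTRLEN))


def zarr_dtype_for(nc_type, maxstrlen=DEFAULT_MAXSTRLEN):
    """Map a netcdf-c type name to the numpy dtype string used on write."""
    if nc_type == "NC_CHAR":
        return "|S1"
    if nc_type == "NC_STRING":
        if maxstrlen < 1:
            raise ValueError("maxstrlen must be at least 1")
        # ">" is what distinguishes NC_STRING from NC_CHAR, even when n == 1.
        return f">S{maxstrlen}"
    raise ValueError(f"not a char/string type: {nc_type}")


root = {"_nczarr_default_maxstrlen": 128}
print(zarr_dtype_for("NC_STRING", resolve_maxstrlen({}, root)))
print(zarr_dtype_for("NC_STRING", resolve_maxstrlen({"_nczarr_maxstrlen": 1}, root)))
print(zarr_dtype_for("NC_CHAR"))
```

The byte-order character carries the NC_CHAR versus NC_STRING distinction, which is why a length-1 string (">S1") and a char ("|S1") stay distinguishable to NCZarr while a pure zarr reader treats both as one-byte strings.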
This PR (#2467) is an experimental
Fixed by #2492
Our team is working with earth science zarr data in which some variable metadata is stored as a string object type. These variables contain strings, such as source URLs, that correspond to each chunk. We are aware that, according to the documentation, string types are not supported as variables. This is fine: the field is just additional metadata for internal purposes and is not necessary for our use of netCDF4. We would like the netCDF4 code to either ignore the string variables, issue a warning that string variables are not supported, or offer limited functionality for string variables. We just don't want the code to break when we open the zarr file.
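To make the requested "ignore or warn" behavior concrete, here is a minimal sketch in Python of what skipping unsupported variables could look like. It is not netCDF4 or netcdf-c code. It assumes a Zarr v2 directory store in which each array directory holds a `.zarray` JSON file, and it assumes the string variables show up with the object dtype `"|O"`; the function name `split_variables` is hypothetical.

```python
import json
import warnings
from pathlib import Path

# Assumption: object-typed string arrays are recorded with dtype "|O"
# in their Zarr v2 .zarray metadata.
UNSUPPORTED_DTYPES = {"|O"}


def split_variables(store_path):
    """Return (supported, skipped) variable names for a Zarr v2 directory store.

    Variables with an unsupported dtype are skipped with a warning
    instead of aborting the whole open.
    """
    supported, skipped = [], []
    for meta in sorted(Path(store_path).glob("*/.zarray")):
        name = meta.parent.name
        dtype = json.loads(meta.read_text())["dtype"]
        if dtype in UNSUPPORTED_DTYPES:
            warnings.warn(f"skipping variable {name!r}: unsupported dtype {dtype!r}")
            skipped.append(name)
        else:
            supported.append(name)
    return supported, skipped
```

A caller could run this before opening the store and then read only the names in the first list, leaving the string metadata untouched.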