a netcdf file has NC_STRING variables with fill value turned off (nc_inq_var_fill() function returns no_fill as 1) #2612
Comments
We got a bit more info about how the file was created. The following versions of libraries and packages were used: libnetcdf 4.8.1. Interestingly, the newer pipeline now creates a file that does NOT have the fill value turned off. The newer pipeline uses the same versions, the only difference being that HDF5 is now built with SZip support using libaec. There doesn't seem to be a difference in the Python code generating the file that would explain why the fill value is NOT turned off with the newer pipeline. I am not sure if we will get to the bottom of this, but it would at least be good to have some official confirmation of the expected behavior when turning off the fill value for NC_STRING variables. Thanks! |
Turning off fill values for netCDF-4 files is not really helpful, and is only supported for backward compatibility with classic formats. The idea of turning off fill values is that, when creating a classic file, the library can just assign some disk space for the variable and leave it untouched instead of setting each value to something. The variable then contains whatever happened to be on disk, which will look like random values if read, but we count on the user to later actually write all these data. We are just skipping the step of writing a fill value everywhere and then having the fill values overwritten by real data. But HDF5 does not work this way. In HDF5, disk space is not allocated for chunks that are not written, so leaving the fill value on for such variables does not add any disk activity. HDF5 does not write chunks for data until it needs to. So if you define a big variable, and then don't write any data to it, there will actually be no disk space allocated, and no fill values will be written in any case. If you then try to read the data, the HDF5 library will pretend that it is there, and that it is full of fill values. So turning off fill values for HDF5 data does not actually accomplish anything. |
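To make the lazy-allocation point above concrete, here is a toy model in C++. It is only a sketch of the idea described in the comment, not HDF5's implementation or API: the class `LazyChunkedVar` and all of its members are hypothetical names invented for illustration. It also assumes that a chunk is padded with the fill value when it is first touched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Toy model only (hypothetical names, NOT the HDF5 API): a 1-D variable
// whose chunks are allocated on first write, never at definition time.
class LazyChunkedVar {
public:
    LazyChunkedVar(std::size_t n_values, std::size_t chunk_len, double fill_value)
        : n_values_(n_values), chunk_len_(chunk_len), fill_value_(fill_value) {
        assert(chunk_len > 0);
    }

    // Writing allocates the enclosing chunk if it does not exist yet.
    // Assumption: the rest of a new chunk is initialised with the fill value.
    void write(std::size_t index, double value) {
        check(index);
        const std::size_t chunk_id = index / chunk_len_;
        auto it = chunks_.find(chunk_id);
        if (it == chunks_.end()) {
            it = chunks_.emplace(chunk_id,
                                 std::vector<double>(chunk_len_, fill_value_)).first;
        }
        it->second[index % chunk_len_] = value;
    }

    // Reading an unwritten chunk allocates nothing; the reader simply
    // gets the fill value back, as if the data were there.
    double read(std::size_t index) const {
        check(index);
        const auto it = chunks_.find(index / chunk_len_);
        if (it == chunks_.end()) {
            return fill_value_;
        }
        return it->second[index % chunk_len_];
    }

    // Number of chunks that actually occupy storage.
    std::size_t allocated_chunks() const { return chunks_.size(); }

private:
    void check(std::size_t index) const {
        if (index >= n_values_) {
            throw std::out_of_range("index outside variable");
        }
    }

    std::size_t n_values_;
    std::size_t chunk_len_;
    double fill_value_;
    std::map<std::size_t, std::vector<double>> chunks_;
};
```

In this model, defining a variable with a million values and never writing to it leaves `allocated_chunks()` at 0, while `read()` still returns the fill value everywhere. That is why disabling the fill value would save nothing here.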
Thank you for the detailed explanation, @edwardhartnett! I guess I was asking more about what is "allowed" with the netCDF APIs (both C and Python), i.e. what the official netCDF position is on fill values for NC_STRING variables. From #727 I assumed that turning fill values off was not supported (or just not possible) for NC_STRING variables (although I don't think it is documented). So it was a surprise to come across a file which had the fill value turned off for NC_STRING variables. Because of this wrong assumption I ended up running code like this:
void* p;
char* stringFillValue;
p = &stringFillValue; // p is char**
int no_fill; // 1 if fill value is turned OFF, 0 if fill value is turned ON
status = nc_inq_var_fill(ncid, varid, &no_fill, p);
status = nc_free_string(1, static_cast<char**>(p)); // crash when no_fill=1, i.e. when there is no fill value
|
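One defensive pattern for the crash above is to start from a null pointer and to free only a string the library actually handed back. The sketch below assumes the crash comes from `nc_inq_var_fill` leaving the output pointer unset when `no_fill` is 1, which the thread suggests but does not confirm. The helper `read_string_fill` is hypothetical, and the netCDF calls are passed in as callables so the block compiles without libnetcdf.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of netCDF). Returns true and fills `out`
// only when a string fill value is actually present.
//   inq_fill(int* no_fill, char** fill)   -> status, 0 on success
//   free_string(std::size_t n, char** p)  -> status, 0 on success
template <typename InqFill, typename FreeString>
bool read_string_fill(InqFill inq_fill, FreeString free_string, std::string& out)
{
    char* fill = nullptr;  // stays null if the library never writes to it
    int no_fill = 0;       // 1 if fill value is turned OFF, 0 if turned ON

    if (inq_fill(&no_fill, &fill) != 0) {
        throw std::runtime_error("fill value inquiry failed");
    }
    if (fill == nullptr) {
        return false;      // nothing was allocated, so nothing to free
    }

    const bool has_fill = (no_fill == 0);
    if (has_fill) {
        out = fill;        // copy before releasing the library's string
    }
    free_string(1, &fill); // release only a pointer the library returned
    return has_fill;
}

// With the real library, the wiring might look like this (untested sketch):
//   std::string fv;
//   bool ok = read_string_fill(
//       [&](int* nf, char** f) { return nc_inq_var_fill(ncid, varid, nf, f); },
//       [](std::size_t n, char** p) { return nc_free_string(n, p); },
//       fv);
```

Checking both `no_fill` and the pointer is deliberate: either one alone would avoid this particular crash, but together they also cover a variable whose fill value is on yet comes back as a null string.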
Hi @edwardhartnett, Sorry to raise this question again, but we have noticed an increased number of such files in recent months. I think their source is the European Centre for Medium-Range Weather Forecasts Climate Data Store (ECMWF CDS): https://cds.climate.copernicus.eu/ (the new, just recently released system). Specifically, when downloading data there is an (experimental) option to convert GRIB files to netCDF, and it seems to produce files where NC_STRING variables have disabled fill values (nc_inq_var_fill returns no_fill=1). Could you comment on this? Would you consider such files well-formed (if there is no way to turn off fill values for NC_STRING variables using the netCDF library)? |
Hello!
I was hoping to get clarification about whether it is supported to "turn off" fill value for NC_STRING variables.
We recently came across a file that had two NC_STRING variables with the fill value turned off (the nc_inq_var_fill() function used on those variables returns no_fill as 1). I was under the assumption that this should not be possible (e.g. see #727 and Unidata/netcdf4-python#331), but I am not sure it is officially documented...
We tried to get more info about how this file was created, and apparently it was created with netcdf4-python and this code here was used:
From my (limited) understanding of netcdf4-python, setting fill_value=None does not turn off the fill value, but just sets it to a default fill value. When we use netcdf4-python and create "U1" or "double" variables with fill_value=None, we still see that the nc_inq_var_fill() function returns no_fill as 0 (the fill value is still turned on). From the netcdf4-python doc, it seems like the way to turn off the fill value is to use fill_value=False. When we use netcdf4-python and create "U1" or "double" variables with fill_value=False, we see that for the double variable nc_inq_var_fill() returns no_fill as 1 (so the fill value is successfully turned off). But for the "U1"/string variable, nc_inq_var_fill() still returns no_fill as 0 (the fill value is still on). This agrees with our understanding that turning off the fill value for NC_STRING variables is unsupported or at least broken (and results in a no-op in netcdf4-python, as described in Unidata/netcdf4-python#331 (comment)).
When I try to create an NC_STRING variable with the fill value "turned off" using netcdf-c, I get a -36 error code (NC_EINVAL - "NetCDF: Invalid argument"), which is also consistent with #727 (comment).
We were not able to get any more details about how this file was created and how the fill value for NC_STRING variables was "turned off".
We also saw that ncdump from v.4.7.0 could read this file, but ncdump from v.4.9.0 could not (Unknown file format)...
It would be great to know the following: