cannot set _FillValue attribute for VLEN or compound variable #730

Closed
shoyer opened this issue Oct 23, 2017 · 11 comments

Comments

@shoyer
Contributor

shoyer commented Oct 23, 2017

Is this an intrinsic limitation of netCDF-C, or simply something that hasn't been wrapped yet?

I don't need the aspect of filling in default values in a Variable, but I would like to be able to set _FillValue to indicate values that should be treated as missing when read from disk.

My use case is supporting arrays of (unicode) strings with missing values in xarray.

@shoyer
Contributor Author

shoyer commented Oct 23, 2017

For example, a good choice for _FillValue could be one of the non-character unicode symbols (e.g., U+FFFF), which are guaranteed not to correspond to valid characters.
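To make the sentinel idea concrete, here is a minimal pure-Python sketch, assuming U+FFFF is the chosen fill value. The helper names `encode_missing` and `decode_missing` are hypothetical and are not part of netCDF4 or xarray; they only illustrate swapping a missing marker (`None` here) for the sentinel before writing and back again after reading.

```python
import unicodedata

# Assumed sentinel: U+FFFF, a Unicode noncharacter (illustrative choice only).
FILL = "\uffff"


def encode_missing(values, fill=FILL):
    """Replace None with the fill sentinel before writing (hypothetical helper)."""
    return [fill if v is None else v for v in values]


def decode_missing(values, fill=FILL):
    """Replace the fill sentinel with None after reading (hypothetical helper)."""
    return [None if v == fill else v for v in values]


# U+FFFF has general category "Cn" (not assigned to any character).
is_unassigned = unicodedata.category(FILL) == "Cn"

# The sentinel still survives a UTF-8 round trip, so it can be stored as a string.
survives_utf8 = FILL.encode("utf-8").decode("utf-8") == FILL

stored = encode_missing(["first", None])
restored = decode_missing(stored)
```

Because the sentinel cannot collide with real text, the round trip is lossless for any list of valid strings.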

@jswhit
Collaborator

jswhit commented Oct 23, 2017

The C-lib doesn't support fill values for non-string vlens and compound variables. It might work for vlen strings (NC_STRING), let me have a look....

@shoyer
Contributor Author

shoyer commented Oct 23, 2017

What about setting the _FillValue attribute, rather than the fill_value interpreted by netCDF-C/HDF5 (i.e., by nc_set_fill)?

Currently, overriding _FillValue by setting the attribute is not allowed, but I don't think the attribute is actually directly understood by netCDF-C (though I could be wrong here):

    if name == '_FillValue':
        msg='_FillValue attribute must be set when variable is '+\
        'created (using fill_value keyword to createVariable)'
        raise AttributeError(msg)
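The effect of that guard can be sketched with a toy class. `ToyVariable`, `set_attr`, and `get_attr` are hypothetical names used only for illustration; this is not netCDF4's `Variable` and does not touch any file. It only mirrors the rule that the fill value is accepted at creation time and rejected afterwards.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable; not netCDF4's Variable."""

    def __init__(self, fill_value=None):
        self._attrs = {}
        # The fill value may only be supplied here, at creation time.
        if fill_value is not None:
            self._attrs['_FillValue'] = fill_value

    def set_attr(self, name, value):
        # Same check as the snippet above: refuse later changes to _FillValue.
        if name == '_FillValue':
            msg = ('_FillValue attribute must be set when variable is '
                   'created (using fill_value keyword to createVariable)')
            raise AttributeError(msg)
        self._attrs[name] = value

    def get_attr(self, name):
        return self._attrs[name]


var = ToyVariable(fill_value='<missing>')
var.set_attr('units', 'none')  # ordinary attributes are accepted

try:
    var.set_attr('_FillValue', 'other')
    rejected = False
except AttributeError:
    rejected = True
```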

@dopplershift
Member

Well, there's this:

Attribute names commencing with underscore ('_') are reserved for use by the netCDF library.

Given the wording in Appendix A I don't think the python wrapper should be playing games here. @WardF anything to add?

@WardF
Member

WardF commented Oct 23, 2017

You can set _FillValue by hand and it is interpreted by the C library; you don't have to use nc_set_fill. This is in the documentation somewhere, I will see if I can track it down.

@jswhit
Collaborator

jswhit commented Oct 23, 2017

Pull request #732 allows _FillValue to be set for vlen string variables using the fill_value createVariable kwarg. Still cannot be set for non-string vlens and compound variables though.

@shoyer
Contributor Author

shoyer commented Oct 23, 2017

@jswhit Awesome, the possibility of setting fill-value for vlen strings would solve the major issue here.

@jswhit
Collaborator

jswhit commented Oct 24, 2017

Can you give it a try and let me know if it does what you need?

@shoyer
Contributor Author

shoyer commented Oct 24, 2017

@jswhit I can give it a try, but I would trust a unit test more than my anecdotal experience :)

@shoyer
Contributor Author

shoyer commented Oct 24, 2017

Yes, it works -- even with automatic decoding of array values matching the fill-value into the appropriate missing value!

In [4]: ds = netCDF4.Dataset('varlen.nc', 'w')

In [5]: ds.createDimension('x', 2)
Out[5]: <class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 2

In [6]: var = ds.createVariable('y', str, ('x',), fill_value='<missing>')

In [7]: var[0] = 'first'

In [8]: ds.close()

In [11]: import xarray

In [12]: ds = xarray.open_dataset('varlen.nc')

In [13]: ds.load()
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    y        (x) object 'first' nan

@jswhit
Collaborator

jswhit commented Oct 25, 2017

Pull request merged.
