When reading a netCDF dataset with `decode_cf` and `mask_and_scale` set to `True`, Xarray uses the `scale_factor` and `_FillValue`/`missing_value` attributes of each variable in the dataset to apply the proper masking and scaling. However, from what I can tell, it does not handle certain other common attributes when masking, in particular: `valid_max`, `valid_min`, and `valid_range`. I can't find any direct statement of this behavior in the Xarray documentation or by searching this repository, but I encountered the behavior myself and found a mention in the documentation for the xcube package (this relates to zarr rather than netCDF but is the only mention I could find).
It is nontrivial to handle this as a user, because you (rightfully) lose the `scale_factor` attribute on read when `mask_and_scale` is `True`. Since `valid_min`/`_max`/`_range` are stored in the same domain as the packed data if conventions are followed (i.e. unscaled if there is a `scale_factor`), it becomes complicated to use them for masking after the fact.
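To illustrate why this is awkward, here is a minimal NumPy sketch of the post-hoc workaround. It stands in for a decoded variable; in xarray, `scale_factor` and `add_offset` would be recovered from the variable's `encoding` dict while `valid_range` remains in `attrs` (all values here are illustrative):

```python
import numpy as np

# Packed (stored) values as they would appear on disk, plus CF packing attributes.
packed = np.array([-32768, 0, 5000, 20000, 32000], dtype=np.int16)
scale_factor = 0.01
add_offset = 20.0
valid_range = np.array([0, 25000], dtype=np.int16)  # packed-domain bounds

# What mask_and_scale=True hands back today: unpacked floats, unmasked
# with respect to valid_range.
unpacked = packed * scale_factor + add_offset

# Workaround: transform valid_range into the unpacked domain, then mask.
lo, hi = valid_range * scale_factor + add_offset
masked = np.where((unpacked < lo) | (unpacked > hi), np.nan, unpacked)
```

The transformation itself is one line, but the user has to know the convention that `valid_*` lives in the packed domain and remember to dig the packing parameters back out of `encoding`.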
I can only find one discussion (#822) on whether these attributes should or should not be handled by Xarray. In that thread, it was brought up that 1) netCDF4-python doesn't handle this on their end, 2) this doesn't really matter from a technical standpoint anyway because Xarray uses its own logic for scaling, and 3) apparently, they are not directly part of the CF conventions, but rather the NUG convention.
However, netCDF4-python does mask values outside `valid_min`/`_max`/`_range` when opening a dataset (Unidata/netcdf4-python#670), so I feel it would be natural to do the same in Xarray, at least when `decode_cf` and `mask_and_scale` are both `True`. Additionally, according to the netCDF attribute conventions, "generic applications should treat values outside the valid range as missing". I'm not sure any of this was the case back in 2016 when this was last discussed.
I propose that `mask_and_scale` should (optionally?) mask values which are invalid according to these attributes. If there are reasons not to, then perhaps, at least, `valid_min`/`_max`/`_range` could be transformed by `scale_factor` and `add_offset` when scaling is applied to the rest of the dataset, so that users can easily create the relevant masks themselves.
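A sketch of the proposed decode step, in plain NumPy rather than xarray internals: mask in the packed domain first (per the convention that `valid_*` applies to stored values), then scale. The function name and signature are hypothetical, not part of any existing API:

```python
import numpy as np

def decode_with_valid_range(packed, scale_factor=1.0, add_offset=0.0,
                            valid_range=None):
    """Hypothetical decode step: treat values outside valid_range (stored
    in the packed domain per the netCDF conventions) as missing, then scale."""
    data = packed.astype(np.float64)
    if valid_range is not None:
        lo, hi = valid_range
        data[(packed < lo) | (packed > hi)] = np.nan
    return data * scale_factor + add_offset

packed = np.array([-1, 0, 12000, 30000], dtype=np.int16)
out = decode_with_valid_range(packed, scale_factor=0.01, add_offset=15.0,
                              valid_range=(0, 25000))
# -1 and 30000 fall outside valid_range and come back as NaN.
```

Masking before scaling avoids floating-point comparisons against transformed bounds and matches how `_FillValue` is already handled.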
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!
@claytharrison Sorry for the massive delay here. This somehow slipped through the cracks.
Thanks for the detailed problem description.
For xarray I currently see only three solutions/workarounds for handling these types of packed data:
1. `scale_factor` and/or `add_offset` are saved within the variable's `encoding` dict. Users could transform `valid_min`/`valid_max`/`valid_range` with those and create appropriate masks.
2. Do 1. as part of decoding and scale/offset the `valid_*` attributes. Reverse the process on encoding.
3. Add an `actual_range` attribute (if not available) when decoding, derived from the `valid_*` attributes. `actual_range` should have the type intended for the unpacked data.
The simplest solution, but the least user-friendly, is 1. Solution 2 is too involved and error-prone. Solution 3 would be less invasive and the most user-friendly. There might be other solutions which I don't have on the list right now.
I'd favour solution 3, which conforms to the standard, is user-friendly, and is relatively easy to handle in the encoding step.
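Solution 3 could be sketched roughly like this. The helper name and the attribute-dict interface are purely illustrative, not an existing xarray API:

```python
import numpy as np

def derive_actual_range(attrs, unpacked_dtype=np.float64):
    """Hypothetical helper for solution 3: derive an actual_range (in the
    unpacked domain, with the unpacked dtype) from the packed-domain
    valid_* attributes, unless actual_range is already present."""
    if "actual_range" in attrs:
        return attrs["actual_range"]
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    if "valid_range" in attrs:
        lo, hi = attrs["valid_range"]
    elif "valid_min" in attrs or "valid_max" in attrs:
        lo = attrs.get("valid_min", -np.inf)
        hi = attrs.get("valid_max", np.inf)
    else:
        return None
    return np.array([lo, hi], dtype=unpacked_dtype) * scale + offset

attrs = {"scale_factor": 0.01, "add_offset": 20.0, "valid_range": (0, 25000)}
rng = derive_actual_range(attrs)  # array([ 20., 270.])
```

On encoding, the derived attribute could simply be dropped again (or inverted back through the packing parameters), which keeps the round trip lossless.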