
NetCDF valid_min/_max/_range do not mask datasets and do not get scaled #8359

Open
claytharrison opened this issue Oct 23, 2023 · 2 comments


claytharrison commented Oct 23, 2023

What is your issue?

When reading a netCDF dataset with decode_cf and mask_and_scale set to True, Xarray uses the scale_factor/add_offset and _FillValue/missing_value attributes of each variable to apply the proper masking and scaling. However, from what I can tell, it does not handle certain other common attributes when masking, in particular valid_max, valid_min, and valid_range. I can't find any direct statement of this behavior in the Xarray documentation or by searching this repository, but I encountered it myself and found a mention in the documentation for the xcube package (which relates to zarr rather than netCDF, but is the only mention I could find).

It is nontrivial to handle this as a user, because the scale_factor attribute is (rightfully) removed from the variable's attributes on read when mask_and_scale is True. Since valid_min/_max/_range are stored in the same domain as the packed data when the conventions are followed (i.e. unscaled if there is a scale_factor), it becomes complicated to use them for masking after the fact.
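
To illustrate, here is a minimal sketch of the mismatch; the variable name and all attribute values below are made up:

```python
import numpy as np
import xarray as xr

# Packed variable with NUG/CF-style attributes (hypothetical values).
packed = xr.Dataset(
    {
        "t": (
            "x",
            np.array([-32768, 0, 10000, 30000], dtype="int16"),
            {
                "scale_factor": 0.01,
                "add_offset": 273.15,
                "_FillValue": np.int16(-32768),
                "valid_min": np.int16(-20000),
                "valid_max": np.int16(20000),
            },
        )
    }
)

decoded = xr.decode_cf(packed, mask_and_scale=True)
print(decoded["t"].values)  # unpacked floats, _FillValue masked to NaN
print(decoded["t"].attrs)   # valid_min/valid_max are still packed int16 values

# The packed value 30000 lies above valid_max but survives decoding, and
# comparing the unpacked floats against the still-packed threshold is
# meaningless (here it masks nothing at all):
decoded["t"].where(decoded["t"] <= decoded["t"].attrs["valid_max"])
```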

I can only find one discussion (#822) on whether these attributes should or should not be handled by Xarray. In that thread, it was brought up that 1) netCDF4-python doesn't handle this on their end, 2) this doesn't really matter from a technical standpoint anyway because Xarray uses its own logic for scaling, and 3) apparently, they are not directly part of the CF conventions, but rather the NUG convention.

However, netCDF4-python does mask values outside valid_min/_max/_range when opening a dataset (Unidata/netcdf4-python#670), so I feel it would be natural to do the same in Xarray, at least when decode_cf and mask_and_scale are both True. Additionally, according to the netCDF attribute conventions, "generic applications should treat values outside the valid range as missing". I'm not sure any of this was the case back in 2016 when this was last discussed.

I propose that mask_and_scale should (optionally?) mask values which are invalid according to these attributes. If there are reasons not to, then perhaps, at least, valid_min/_max/_range could be transformed by scale_factor and add_offset when scaling is applied to the rest of the dataset, so that users can easily create the relevant masks themselves.
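
For illustration, a minimal sketch of the kind of manual masking a user has to do today; the file name, variable name, and the presence of the relevant attributes are all hypothetical, and it relies on the packing parameters being available in .encoding:

```python
import xarray as xr

ds = xr.open_dataset("example.nc", mask_and_scale=True)  # hypothetical file
var = ds["t"]  # hypothetical variable

# After decoding, the packing parameters live in .encoding, not .attrs.
scale = var.encoding.get("scale_factor", 1.0)
offset = var.encoding.get("add_offset", 0.0)

# valid_min/valid_max are still in the packed domain, so unpack them first.
vmin = var.attrs["valid_min"] * scale + offset
vmax = var.attrs["valid_max"] * scale + offset

masked = var.where((var >= vmin) & (var <= vmax))
```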

claytharrison added the needs triage label on Oct 23, 2023

welcome bot commented Oct 23, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

dcherian added the topic-backends and topic-CF conventions labels and removed the needs triage label on Oct 23, 2023

kmuehlbauer commented Feb 6, 2024

@claytharrison Sorry for the massive delay here. This somehow slipped through the cracks.

Thanks for the detailed problem description.

For xarray I currently see only three solutions/workarounds for handling this type of packed data:

  1. scale_factor and/or add_offset are saved within the variable's encoding dict. Users can transform valid_min/valid_max/valid_range with those and create the appropriate masks themselves.
  2. Do 1. as part of decoding, i.e. scale/offset the valid_* attributes, and reverse the process on encoding.
  3. Add an actual_range attribute (if not already present) when decoding, derived from the valid_* attributes. actual_range should have the type intended for the unpacked data.

See this section of the CF Conventions for details: Missing data, valid and actual range of data.

The simplest solution, but the least user friendly, is 1. Solution 2 is too involved and error prone. Solution 3 would be less invasive and the most user friendly. There might be other solutions that I do not have on the list right now.

I'd favour solution 3, which conforms to the standard, is user friendly, and is relatively easy to handle in the encoding step.
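
A rough sketch of how 3. could look, written here as a plain helper operating on an already decoded variable rather than as actual decoding machinery (attribute handling is simplified):

```python
import numpy as np
import xarray as xr


def add_actual_range(var: xr.DataArray) -> xr.DataArray:
    """Derive an unpacked actual_range from packed valid_* attributes.

    Assumes scale_factor/add_offset were moved to var.encoding during
    decoding and that the valid_* attributes are in the packed domain.
    """
    if "actual_range" in var.attrs:
        return var

    scale = var.encoding.get("scale_factor", 1.0)
    offset = var.encoding.get("add_offset", 0.0)

    if "valid_range" in var.attrs:
        vmin, vmax = var.attrs["valid_range"]
    elif "valid_min" in var.attrs and "valid_max" in var.attrs:
        vmin, vmax = var.attrs["valid_min"], var.attrs["valid_max"]
    else:
        return var

    # actual_range should have the type intended for the unpacked data.
    var.attrs["actual_range"] = np.asarray([vmin, vmax]) * scale + offset
    return var
```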
