Best practice when the _Unsigned attribute is present in NetCDF files #1444

Some (large) data providers are writing NetCDF-4-extended files but using an `_Unsigned` attribute to indicate that a signed integer data type should be interpreted as unsigned.
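As a minimal illustration of what the attribute asks for (assuming NumPy, with made-up values): the bits stored in a signed `int16` variable marked `_Unsigned = "true"` are reinterpreted as `uint16`, not numerically converted, so negative stored values land in the upper half of the unsigned range.

```python
import numpy as np

# Raw values as a reader sees them when it ignores _Unsigned (signed int16 on disk)
stored = np.array([0, 1, 32767, -32768, -1], dtype=np.int16)

# Reinterpret the same bits as unsigned: -32768 -> 32768, -1 -> 65535
as_unsigned = stored.view(np.uint16)
print(as_unsigned.tolist())  # [0, 1, 32767, 32768, 65535]
```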

Background: https://github.com/Unidata/netcdf4-python/issues/656

From the background discussion above, it is my understanding that xarray does not honor the attribute because it's not part of the CF spec, is only mentioned as a proposed attribute in the [NetCDF Best Practices](http://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html), and because "xarray wants the `Variable` dtype to be the same as the dtype of the data returned."

Taking the above as a given, xarray users who encounter such variables need to do the following after reading the data:

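```python
import numpy as np

# On-disk signed dtype (e.g. int16) and its unsigned counterpart (e.g. uint16)
signed = data.encoding['dtype']
unsigned = signed.str.replace('i', 'u')
scale_factor = data.encoding['scale_factor']
add_offset = data.encoding['add_offset']
# Undo the scaling, round back to the stored integers, and reinterpret the bits as unsigned
packed = ((data - add_offset) / scale_factor).round().data.astype(signed).view(unsigned)
fixed = packed.astype('float64') * scale_factor + add_offset
```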

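The un-scaling step can be skipped by turning off automatic masking and scaling (e.g. `mask_and_scale=False` in `xarray.open_dataset`) and applying the unsigned reinterpretation and the scaling by hand.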
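A minimal sketch of that manual path, assuming NumPy and a raw array still in its signed on-disk dtype (for example the values of a variable opened with `mask_and_scale=False`, whose `scale_factor`, `add_offset` and `_FillValue` then remain in `.attrs`). The helper name `decode_unsigned` is hypothetical, and it assumes the fill value is expressed in the signed on-disk type.

```python
import numpy as np

def decode_unsigned(raw, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Hypothetical helper: decode packed signed integers flagged with _Unsigned.

    `raw` is the undecoded array, still in its signed dtype such as int16.
    """
    raw = np.asarray(raw)
    # Unsigned counterpart of the signed dtype, e.g. '<i2' -> '<u2'
    unsigned = raw.dtype.str.replace('i', 'u')
    # Reinterpret the bits, then apply the CF unpacking formula
    values = raw.view(unsigned).astype('float64') * scale_factor + add_offset
    if fill_value is not None:
        # Assumption: _FillValue is given in the signed on-disk type
        values[raw == fill_value] = np.nan
    return values


raw = np.array([[0, 100], [-1, -32768]], dtype=np.int16)
print(decode_unsigned(raw, scale_factor=0.5, add_offset=10.0, fill_value=-1))
```

Working from the undecoded integers avoids the float round trip of the previous snippet, so no rounding step is needed.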

To automate the above process while still being able to use the functionality of `Dataset`, one approach might be to perform the above steps automatically on a known list of variables and then reassign those variables to the `Dataset`. The downside is that all of those variables must be read up front, which could be expensive when processing large datasets where not all variables are needed.

Is there another approach that would preserve lazy data loading, for instance by providing pre/post hooks for transformation functions at the `__getitem__` stage? Is there something I could do to help document that as a best practice?
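One shape such a post-hook could take, sketched in plain Python/NumPy rather than against xarray internals: wrap the backing array so the decode runs inside `__getitem__`, on only the region actually requested. `LazyDecodedArray`, `unsigned_hook` and `CountingSource` are hypothetical names for illustration, not xarray API; `CountingSource` merely stands in for an on-disk variable so the laziness can be observed.

```python
import numpy as np

class LazyDecodedArray:
    """Hypothetical wrapper: applies a decode hook only to the slice being read."""

    def __init__(self, source, decode):
        self.source = source  # anything indexable, e.g. a NumPy array
        self.decode = decode  # post-hook run on each indexed chunk

    @property
    def shape(self):
        return self.source.shape

    def __getitem__(self, key):
        # Read only the requested region, then post-process it
        return self.decode(np.asarray(self.source[key]))


def unsigned_hook(scale_factor=1.0, add_offset=0.0):
    # Post-hook: reinterpret signed bits as unsigned, then apply CF scaling
    def decode(chunk):
        unsigned = chunk.dtype.str.replace('i', 'u')
        return chunk.view(unsigned).astype('float64') * scale_factor + add_offset
    return decode


class CountingSource:
    # Stand-in for an on-disk variable that records how many elements were read
    def __init__(self, data):
        self.data = data
        self.shape = data.shape
        self.elements_read = 0

    def __getitem__(self, key):
        chunk = self.data[key]
        self.elements_read += np.size(chunk)
        return chunk


src = CountingSource(np.array([0, -1, -32768, 5], dtype=np.int16))
arr = LazyDecodedArray(src, unsigned_hook(scale_factor=2.0))
print(arr[1:3], src.elements_read)  # [131070.  65536.] 2
```

Nothing is read when the wrapper is built; only the two indexed elements are fetched and decoded.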
