
Allows numpy.bool_ attrs when writing with h5netcdf engine #4981

Closed
@caenrigen

What happened:

Round-tripping a dataset with h5netcdf and invalid_netcdf=True fails for a bool attribute, because xarray does not accept numpy.bool_ attribute values (note that this is not the same as numpy.bool).

What you expected to happen:

Attributes of type numpy.bool_ should be supported, because h5netcdf has no issue with them.

Minimal Complete Verifiable Example:

import h5netcdf
import numpy as np
import xarray as xr

dset = xr.Dataset(
    coords={"x0": np.array([1, 2, 3])},
    data_vars={"y0": ("x0", np.array([5, 6, 7]))}
)
dset.attrs["my_bool_attr"] = True
print(type(dset.attrs["my_bool_attr"]))

print("\n", dset)
<class 'bool'>

 <xarray.Dataset>
Dimensions:  (x0: 3)
Coordinates:
  * x0       (x0) int64 1 2 3
Data variables:
    y0       (x0) int64 5 6 7
Attributes:
    my_bool_attr:  True
file_name = "my_dset.nc"
dset.to_netcdf(file_name, engine="h5netcdf", invalid_netcdf=True)

loaded_dset = xr.load_dataset(file_name, engine="h5netcdf")
print(type(loaded_dset.attrs["my_bool_attr"]))

print("\n", loaded_dset)

# This should be working
file_name2 = "file_numpy_bool.nc"
loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")

print("\n", loaded_dset2)

Raises:

<class 'numpy.bool_'>

 <xarray.Dataset>
Dimensions:  (x0: 3)
Coordinates:
  * x0       (x0) int64 1 2 3
Data variables:
    y0       (x0) int64 5 6 7
Attributes:
    my_bool_attr:  True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-f74d2909a4a1> in <module>
      6 # This should be working
      7 file_name2 = "file_numpy_bool.nc"
----> 8 loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
      9 loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")
     10 

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1697             unlimited_dims=unlimited_dims,
   1698             compute=compute,
-> 1699             invalid_netcdf=invalid_netcdf,
   1700         )
   1701 

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1057     # validate Dataset keys, DataArray names, and attr keys/values
   1058     _validate_dataset_names(dataset)
-> 1059     _validate_attrs(dataset)
   1060 
   1061     try:

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in _validate_attrs(dataset)
    229     # Check attrs on the dataset itself
    230     for k, v in dataset.attrs.items():
--> 231         check_attr(k, v)
    232 
    233     # Check attrs on each variable within the dataset

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in check_attr(name, value)
    221         if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
    222             raise TypeError(
--> 223                 f"Invalid value for attr {name!r}: {value!r} must be a number, "
    224                 "a string, an ndarray or a list/tuple of "
    225                 "numbers/strings for serialization to netCDF "

TypeError: Invalid value for attr 'my_bool_attr': True must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files
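The rejection can be reproduced without writing any file. The following is a minimal sketch that reuses the tuple of types visible in the traceback above; the variable name `allowed` is only an illustrative assumption and does not appear in xarray:

```python
from numbers import Number

import numpy as np

# Same tuple of types that check_attr tests against in the traceback.
# The name `allowed` is hypothetical, chosen for this illustration.
allowed = (str, Number, np.ndarray, np.number, list, tuple)

# Python's bool subclasses int, so it counts as a Number.
print(isinstance(True, allowed))            # True

# np.bool_ derives from np.generic rather than np.number,
# and it is not registered as a numbers.Number either.
print(isinstance(np.bool_(True), allowed))  # False
```

This is why the first write (with a plain `bool`) passes validation, while the second write (with the `numpy.bool_` read back from the file) does not.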

Anything else we need to know?:

Xarray is awesome!! Thank you, guys! ❤

We are adopting it fully for a quantum computing experimental framework (Quantify). This issue is very critical for us at the moment 😭

For reference, the following works in h5netcdf:

with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v.attrs['foo'] = True
    print(v.attrs)
    print(type(v.attrs["foo"]))
    
with h5netcdf.File('mydata.nc', 'r') as file:
    v = file['hello']
    print(v.attrs)
    print(type(v.attrs["foo"]))
    
with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v.attrs['foo'] = np.bool_(True)
    print(v.attrs)
    print(type(v.attrs["foo"]))
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>

How to fix:

In /xarray/backends/api.py:221:

        if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
            raise TypeError(
                f"Invalid value for attr {name!r}: {value!r} must be a number, "
                "a string, an ndarray or a list/tuple of "
                "numbers/strings for serialization to netCDF "
                "files"
            )

add np.bool_:

        if not isinstance(value, (str, Number, np.ndarray, np.number, np.bool_, list, tuple)):
            raise TypeError(
                f"Invalid value for attr {name!r}: {value!r} must be a number, "
                "a string, an ndarray or a list/tuple of "
                "numbers/strings for serialization to netCDF "
                "files"
            )
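To try the proposed change in isolation, the value check can be sketched as a standalone function. This is only an illustration under stated assumptions: `check_attr_value` and `VALID_ATTR_TYPES` are hypothetical names, and the real `check_attr` in xarray also validates the attribute name, which is omitted here.

```python
from numbers import Number

import numpy as np

# Accepted attribute value types, with np.bool_ added as proposed above.
# VALID_ATTR_TYPES is a hypothetical name used only in this sketch.
VALID_ATTR_TYPES = (str, Number, np.ndarray, np.number, np.bool_, list, tuple)


def check_attr_value(name, value):
    """Hypothetical standalone version of the value check, for illustration."""
    if not isinstance(value, VALID_ATTR_TYPES):
        raise TypeError(
            f"Invalid value for attr {name!r}: {value!r} must be a number, "
            "a string, an ndarray or a list/tuple of "
            "numbers/strings for serialization to netCDF "
            "files"
        )


# With np.bool_ in the tuple, both kinds of boolean pass the check.
check_attr_value("my_bool_attr", np.bool_(True))
check_attr_value("my_bool_attr", True)

# Unsupported types, such as dicts, are still rejected.
try:
    check_attr_value("bad_attr", {"a": 1})
except TypeError as err:
    print(err)
```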

I did a quick test (xarray-0.17.1.dev3+g48378c4b) and the change seems to work. However, the test suite fails on my local clone of the repo even without any changes to the code, so it would be difficult for me to go through a PR for this.

Considering that it is a single line of code, could this be deployed relatively easily into the master branch by some maintainer? That would be highly appreciated! 🥺
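Until such a change is available, one possible workaround is to convert `numpy.bool_` attribute values back to plain `bool` before writing, since the first write in the example above shows that a plain `bool` passes validation. The helper below is a hypothetical sketch, not part of xarray, and it only handles a single flat attribute mapping:

```python
import numpy as np


def plain_bool_attrs(attrs):
    """Return a copy of attrs with np.bool_ values converted to Python bool.

    Hypothetical helper for illustration; not part of xarray.
    """
    return {
        key: bool(value) if isinstance(value, np.bool_) else value
        for key, value in attrs.items()
    }


attrs = {"my_bool_attr": np.bool_(True), "title": "demo"}
cleaned = plain_bool_attrs(attrs)
print(type(cleaned["my_bool_attr"]))  # <class 'bool'>
```

In the example above this would be applied as `loaded_dset.attrs = plain_bool_attrs(loaded_dset.attrs)` before calling `to_netcdf`. Attributes on individual variables would need the same treatment, and the values come back as `numpy.bool_` after each load.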

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.17.0
pandas: 1.0.1
numpy: 1.19.3
scipy: 1.5.4
netCDF4: None
pydap: None
h5netcdf: 0.10.0
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 3.2.1
