Description
What happened:
Round trip dataset using h5netcdf
+ invalid_netcdf=True
fails for bool attribute due to lack of support in xarray for numpy.bool_
(note that is not the same as numpy.bool
)
What you expected to happen:
Attributes of numpy.bool_
type should be supported because the h5netcdf
has no issue with it.
Minimal Complete Verifiable Example:
import h5netcdf
import numpy as np
import xarray as xr
dset = xr.Dataset(
coords={"x0": np.array([1, 2, 3])},
data_vars={"y0": ("x0", np.array([5, 6, 7]))}
)
dset.attrs["my_bool_attr"] = True
print(type(dset.attrs["my_bool_attr"]))
print("\n", dset)
<class 'bool'>
<xarray.Dataset>
Dimensions: (x0: 3)
Coordinates:
* x0 (x0) int64 1 2 3
Data variables:
y0 (x0) int64 5 6 7
Attributes:
my_bool_attr: True
file_name = "my_dset.nc"
dset.to_netcdf(file_name, engine="h5netcdf", invalid_netcdf=True)
loaded_dset = xr.load_dataset(file_name, engine="h5netcdf")
print(type(loaded_dset.attrs["my_bool_attr"]))
print("\n", loaded_dset)
# This should be working
file_name2 = "file_numpy_bool.nc"
loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")
print("\n", loaded_dset)
Raises:
<class 'numpy.bool_'>
<xarray.Dataset>
Dimensions: (x0: 3)
Coordinates:
* x0 (x0) int64 1 2 3
Data variables:
y0 (x0) int64 5 6 7
Attributes:
my_bool_attr: True
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-f74d2909a4a1> in <module>
6 # This should be working
7 file_name2 = "file_numpy_bool.nc"
----> 8 loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
9 loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")
10
/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1697 unlimited_dims=unlimited_dims,
1698 compute=compute,
-> 1699 invalid_netcdf=invalid_netcdf,
1700 )
1701
/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1057 # validate Dataset keys, DataArray names, and attr keys/values
1058 _validate_dataset_names(dataset)
-> 1059 _validate_attrs(dataset)
1060
1061 try:
/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in _validate_attrs(dataset)
229 # Check attrs on the dataset itself
230 for k, v in dataset.attrs.items():
--> 231 check_attr(k, v)
232
233 # Check attrs on each variable within the dataset
/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in check_attr(name, value)
221 if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
222 raise TypeError(
--> 223 f"Invalid value for attr {name!r}: {value!r} must be a number, "
224 "a string, an ndarray or a list/tuple of "
225 "numbers/strings for serialization to netCDF "
TypeError: Invalid value for attr 'my_bool_attr': True must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files
Anything else we need to know?:
Xarray is awesome!! Thank you, guys! β€
We are adopting it fully for a quantum computing experimental framework (Quantify). This issue is very critical for us at the moment π
For reference, the following works in h5netcdf
:
with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
f.dimensions = {'x': 5}
v = f.create_variable('hello', ('x',), float)
v.attrs['foo'] = True
print(v.attrs)
print(type(v.attrs["foo"]))
with h5netcdf.File('mydata.nc', 'r') as file:
v = file['hello']
print(v.attrs)
print(type(v.attrs["foo"]))
with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
f.dimensions = {'x': 5}
v = f.create_variable('hello', ('x',), float)
v.attrs['foo'] = np.bool_(True)
print(v.attrs)
print(type(v.attrs["foo"]))
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
How to fix:
In /xarray/backends/api.py:221
:
if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
raise TypeError(
f"Invalid value for attr {name!r}: {value!r} must be a number, "
"a string, an ndarray or a list/tuple of "
"numbers/strings for serialization to netCDF "
"files"
)
add np.bool_
:
if not isinstance(value, (str, Number, np.ndarray, np.number, np.bool_, list, tuple)):
raise TypeError(
f"Invalid value for attr {name!r}: {value!r} must be a number, "
"a string, an ndarray or a list/tuple of "
"numbers/strings for serialization to netCDF "
"files"
)
I did a quick test (xarray-0.17.1.dev3+g48378c4b
) and it seems to work, but the tests (without any changes to the code) fail on my local clone of the repo so it would be more difficult to go through a PR for this.
Considering that it is a single line of code, could this be deployed relatively easily into the master branch by some maintainer? That would be highly appreciated! π₯Ί
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.17.0
pandas: 1.0.1
numpy: 1.19.3
scipy: 1.5.4
netCDF4: None
pydap: None
h5netcdf: 0.10.0
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 3.2.1