Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows numpy.bool_ attrs when writing with h5netcdf engine #4981

Closed
caenrigen opened this issue Mar 1, 2021 · 5 comments · Fixed by #4986
Closed

Allows numpy.bool_ attrs when writing with h5netcdf engine #4981

caenrigen opened this issue Mar 1, 2021 · 5 comments · Fixed by #4986

Comments

@caenrigen
Copy link
Contributor

What happened:

Round trip dataset using h5netcdf + invalid_netcdf=True fails for bool attribute due to lack of support in xarray for numpy.bool_ (note that is not the same as numpy.bool)

What you expected to happen:

Attributes of numpy.bool_ type should be supported because the h5netcdf has no issue with it.

Minimal Complete Verifiable Example:

import h5netcdf
import numpy as np
import xarray as xr

dset = xr.Dataset(
    coords={"x0": np.array([1, 2, 3])},
    data_vars={"y0": ("x0", np.array([5, 6, 7]))}
)
dset.attrs["my_bool_attr"] = True
print(type(dset.attrs["my_bool_attr"]))

print("\n", dset)
<class 'bool'>

 <xarray.Dataset>
Dimensions:  (x0: 3)
Coordinates:
  * x0       (x0) int64 1 2 3
Data variables:
    y0       (x0) int64 5 6 7
Attributes:
    my_bool_attr:  True
file_name = "my_dset.nc"
dset.to_netcdf(file_name, engine="h5netcdf", invalid_netcdf=True)

loaded_dset = xr.load_dataset(file_name, engine="h5netcdf")
print(type(loaded_dset.attrs["my_bool_attr"]))

print("\n", loaded_dset)

# This should be working
file_name2 = "file_numpy_bool.nc"
loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")

print("\n", loaded_dset)

Raises:

<class 'numpy.bool_'>

 <xarray.Dataset>
Dimensions:  (x0: 3)
Coordinates:
  * x0       (x0) int64 1 2 3
Data variables:
    y0       (x0) int64 5 6 7
Attributes:
    my_bool_attr:  True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-f74d2909a4a1> in <module>
      6 # This should be working
      7 file_name2 = "file_numpy_bool.nc"
----> 8 loaded_dset.to_netcdf(file_name2, engine="h5netcdf", invalid_netcdf=True)
      9 loaded_dset2 = xr.load_dataset(file_name2, engine="h5netcdf")
     10 

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1697             unlimited_dims=unlimited_dims,
   1698             compute=compute,
-> 1699             invalid_netcdf=invalid_netcdf,
   1700         )
   1701 

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1057     # validate Dataset keys, DataArray names, and attr keys/values
   1058     _validate_dataset_names(dataset)
-> 1059     _validate_attrs(dataset)
   1060 
   1061     try:

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in _validate_attrs(dataset)
    229     # Check attrs on the dataset itself
    230     for k, v in dataset.attrs.items():
--> 231         check_attr(k, v)
    232 
    233     # Check attrs on each variable within the dataset

/usr/local/anaconda3/envs/dclab/lib/python3.7/site-packages/xarray/backends/api.py in check_attr(name, value)
    221         if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
    222             raise TypeError(
--> 223                 f"Invalid value for attr {name!r}: {value!r} must be a number, "
    224                 "a string, an ndarray or a list/tuple of "
    225                 "numbers/strings for serialization to netCDF "

TypeError: Invalid value for attr 'my_bool_attr': True must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files

Anything else we need to know?:

Xarray is awesome!! Thank you, guys! ❤

We are adopting it fully for a quantum computing experimental framework (Quantify). This issue is very critical for us at the moment 😭

For reference, the following works in h5netcdf:

with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v.attrs['foo'] = True
    print(v.attrs)
    print(type(v.attrs["foo"]))
    
with h5netcdf.File('mydata.nc', 'r') as file:
    v = file['hello']
    print(v.attrs)
    print(type(v.attrs["foo"]))
    
with h5netcdf.File('mydata.nc', 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v.attrs['foo'] = np.bool_(True)
    print(v.attrs)
    print(type(v.attrs["foo"]))
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>
<class 'h5netcdf.attrs.Attributes'>
foo: True
<class 'numpy.bool_'>

How to fix:

In /xarray/backends/api.py:221:

        if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
            raise TypeError(
                f"Invalid value for attr {name!r}: {value!r} must be a number, "
                "a string, an ndarray or a list/tuple of "
                "numbers/strings for serialization to netCDF "
                "files"
            )

add np.bool_:

        if not isinstance(value, (str, Number, np.ndarray, np.number, np.bool_, list, tuple)):
            raise TypeError(
                f"Invalid value for attr {name!r}: {value!r} must be a number, "
                "a string, an ndarray or a list/tuple of "
                "numbers/strings for serialization to netCDF "
                "files"
            )

I did a quick test (xarray-0.17.1.dev3+g48378c4b) and it seems to work, but the tests (without any changes to the code) fail on my local clone of the repo so it would be more difficult to go through a PR for this.

Considering that it is a single line of code, could this be deployed relatively easily into the master branch by some maintainer? That would be highly appreciated! 🥺

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.17.0
pandas: 1.0.1
numpy: 1.19.3
scipy: 1.5.4
netCDF4: None
pydap: None
h5netcdf: 0.10.0
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 3.2.1

@dcherian
Copy link
Contributor

dcherian commented Mar 1, 2021

but the tests (without any changes to the code) fail on my local clone of the repo so it would be more difficult to go through a PR for this.

Can you post an issue about that with an error message?

You can always make your change, and open a PR, and have the CI check it :) . PRs are very welcome!

@shoyer
Copy link
Member

shoyer commented Mar 2, 2021

Very cool to hear you're using xarray in your quantum computing framework!

This sounds like a reasonable change to me. A PR would indeed be very welcome!

@caenrigen
Copy link
Contributor Author

but the tests (without any changes to the code) fail on my local clone of the repo so it would be more difficult to go through a PR for this.

Can you post an issue about that with an error message?

You can always make your change, and open a PR, and have the CI check it :) . PRs are very welcome!

Done #4985

@caenrigen
Copy link
Contributor Author

caenrigen commented Mar 2, 2021

Very cool to hear you're using xarray in your quantum computing framework!

This sounds like a reasonable change to me. A PR would indeed be very welcome!

PR created #4986

@caenrigen
Copy link
Contributor Author

Hey, guys!!

Does any of you have backends expertise? A review of PR #4986 from someone with knowledge on the topic would be appreciated so much 🥺 One collaborator already approves the hotfix but is waiting for an additional approval

Kind regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants