zarr does not like attrs added by intake-esm #324

Closed
jbusecke opened this issue Feb 18, 2021 · 2 comments · Fixed by #509
Labels
bug: Issues that present a reasonable conviction there is a reproducible bug.
good first issue: Good for newcomers

Comments

@jbusecke
Contributor

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

import xarray as xr
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None

ds_test.to_zarr('test.zarr')

gives

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-54ff2a1799bc> in <module>
      3 ds_test.attrs['test'] = None
      4 
----> 5 ds_test.to_zarr('test.zarr')

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1743             encoding = {}
   1744 
-> 1745         return to_zarr(
   1746             self,
   1747             store=store,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1453     # validate Dataset keys, DataArray names, and attr keys/values
   1454     _validate_dataset_names(dataset)
-> 1455     _validate_attrs(dataset)
   1456 
   1457     if mode == "a":

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in _validate_attrs(dataset)
    237     # Check attrs on the dataset itself
    238     for k, v in dataset.attrs.items():
--> 239         check_attr(k, v)
    240 
    241     # Check attrs on each variable within the dataset

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in check_attr(name, value)
    228 
    229         if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
--> 230             raise TypeError(
    231                 f"Invalid value for attr {name!r}: {value!r} must be a number, "
    232                 "a string, an ndarray or a list/tuple of "

TypeError: Invalid value for attr 'test': None must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files

Should the attribute be set to a string 'none' instead?
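
As a stopgap on the user side, the None values can be replaced before writing. This is only a minimal sketch: `sanitize_attrs` and the `"none"` placeholder are hypothetical names chosen for illustration, not part of xarray, zarr, or intake-esm.

```python
def sanitize_attrs(attrs, placeholder="none"):
    """Return a copy of attrs with None values replaced by a placeholder string.

    `placeholder` is an assumption; any serializable value would work.
    """
    return {
        key: (placeholder if value is None else value)
        for key, value in attrs.items()
    }


# Plain-dict stand-in for ds.attrs, mirroring the failing case above.
attrs = {"intake_esm_varname": None, "title": "demo"}
clean = sanitize_attrs(attrs)
```

With xarray this would presumably be applied as `ds.attrs = sanitize_attrs(ds.attrs)` before calling `ds.to_zarr(...)`. Attributes on individual variables are validated too, so they may need the same treatment.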

@andersy005 andersy005 added the bug Issues that present a reasonable conviction there is a reproducible bug. label Feb 18, 2021
@andersy005
Member

Good catch. We could add some validators, or just remove the custom metadata once we are done assembling the datasets of interest.

@andersy005
Member

The easiest solution is to remove the custom metadata info used by intake-esm. These can be removed right before returning the dataset in these two locations:

)
ds.attrs['intake_esm_dataset_key'] = self.key
self._ds = ds
return ds

ds.attrs['intake_esm_dataset_key'] = self.key
self._ds = ds
return ds
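
A rough sketch of what that removal could look like, not the actual fix that was merged. It assumes all of intake-esm's custom keys share the `intake_esm_` prefix, as the two keys mentioned in this thread do. The helper name `strip_intake_esm_attrs` is hypothetical.

```python
# Assumed prefix, inferred from the keys named in this thread
# (intake_esm_varname, intake_esm_dataset_key).
INTAKE_ESM_PREFIX = "intake_esm_"


def strip_intake_esm_attrs(attrs, prefix=INTAKE_ESM_PREFIX):
    """Return a copy of attrs without keys that start with the given prefix."""
    return {
        key: value
        for key, value in attrs.items()
        if not str(key).startswith(prefix)
    }


# Plain-dict stand-in for ds.attrs just before the dataset is returned.
attrs = {
    "intake_esm_varname": None,
    "intake_esm_dataset_key": "some.key",
    "institution": "demo",
}
stripped = strip_intake_esm_attrs(attrs)
```

Stripping by prefix also removes `intake_esm_dataset_key`, which is set in the snippets above. If that key should survive, the filter would need to be narrowed to the keys that can be None.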
