
'numpy.datetime64' object has no attribute 'year' writing from grib2 source #130

rsignell-usgs opened this issue Feb 28, 2022 · 27 comments


@rsignell-usgs (Collaborator)

I have a consolidated JSON with a "best time series" dataset that reads from a collection of HRRR grib2 files on S3.

Data access and visualization in the notebook look fine, but when I attempt to save a subset (to either zarr or netcdf) I get an error from xarray:

AttributeError: 'numpy.datetime64' object has no attribute 'year'

I'm raising this issue here because I run a nearly identical workflow with NetCDF files, where we also have datetime64 objects for time in xarray, and don't get this problem. Hence I'm thinking it must have something to do with reading from grib2 files?

This notebook should be reproducible (and doesn't require AWS credentials):
https://nbviewer.org/gist/rsignell-usgs/7d0a76a5f9fbf5781e9e97b66bd72882

@martindurant (Member)

I'll have a look when I have time, but my initial guess is that some attributes left over from the original cftime encoding are leading xarray to attempt a conversion it shouldn't. Can you look at the attributes of the time variable? You might be able to solve this by performing a trivial operation (+ timedelta(0)?) before saving.
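For example, a minimal sketch of what I mean (untested; ds stands for the dataset opened from the references, and the exact attribute names are guesses):

import datetime

# look for stale CF metadata on the time coordinate
print(ds.time.attrs)
print(ds.time.encoding)   # xarray stashes the on-disk encoding here

# a trivial arithmetic operation yields a fresh variable without the old encoding
ds = ds.assign(time=ds.time + datetime.timedelta(0))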

@martindurant (Member)

The time variable should definitely have been inlined in the references:

Bytes | 1.38 kiB | 8 B

Maybe time is the problematic one rather than the valid_time coordinate, so you could remove it?

@rsignell-usgs (Collaborator, Author) commented Feb 28, 2022

I tried removing time and other potential problem variables with:

ds = ds.drop(['time', 'step', 'heightAboveGround']).rename({'valid_time':'time'})

but unfortunately I still get the exact same error (you should be able to run this notebook locally without AWS credentials):

https://nbviewer.org/gist/5d10137059a1793f2606e39a5baf1ba2

@martindurant (Member)

Can you also try multiplying by 1.0?

@martindurant (Member)

(or subtracting timedelta(0))

@rsignell-usgs (Collaborator, Author) commented Feb 28, 2022

@martindurant , I'm not sure I understand.
Multiply what by 1.0?
Before the json is created?

The datetime64 values are understood perfectly well in xarray -- the error only appears when persisted.
I don't understand why. Do you?
(should be easy for you to run yourself and check it out)

@martindurant (Member)

I don't know exactly why, but I suspect that if you perform an arithmetic operation (like adding a zero timedelta), any leftover encoding attributes might be usefully forgotten.

@rsignell-usgs (Collaborator, Author) commented Mar 1, 2022

@martindurant I'm still not sure what I'm supposed to multiply by 1.0 and at what point I'm supposed to do it.
Did you get a chance to try running the notebook that displays the error?

And here is the HRRR best time series notebook that created the json

@martindurant (Member)

"forbidden" reading the JSON file. Do I need to run this in qhub?

@rsignell-usgs (Collaborator, Author) commented Mar 1, 2022

Grrr.... no, you don't need qhub. My bad. Just use this s_opts instead:

s_opts = {'anon':True, 'skip_instance_cache':True}
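For reference, those options plug into the reference filesystem roughly like this (a sketch only; the JSON path and open arguments are illustrative, not copied from the notebook):

import fsspec
import xarray as xr

s_opts = {'anon': True, 'skip_instance_cache': True}
fs = fsspec.filesystem(
    'reference',
    fo='hrrr_best.json',       # consolidated JSON (illustrative path)
    remote_protocol='s3',      # the grib2 chunks live on S3
    remote_options=s_opts,
)
ds = xr.open_dataset(fs.get_mapper(''), engine='zarr',
                     backend_kwargs={'consolidated': False})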

@martindurant (Member)

import pandas as pd

ds2 = ds.assign(time=ds.time + pd.Timedelta(0))
ds2.isel(time=slice(-3, -1)).to_zarr('foo.zarr', mode='w')

This allows you to write it, but xarray still wants to use some weird cftime encoding (even though the original data did not have that), causing an overflow when you try to reload. Do you know how to avoid cftime on write and just use actual datetime values? I.e., this is not a kerchunk problem, it's an xarray/zarr encoding thing.
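One untested possibility would be to clear the leftover time encoding before writing, e.g.:

# sketch only: discard whatever encoding survived from the grib source
ds2['time'].encoding.clear()
ds2.isel(time=slice(-3, -1)).to_zarr('foo.zarr', mode='w')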

@rsignell-usgs (Collaborator, Author) commented Mar 1, 2022

@martindurant the CF conventions don't allow datetime64 values or even date strings: you have to supply float or int time values plus a units attribute like "hours since 1990-01-01".

The xr.open_dataset() function has a use_cftime argument that we could set to False.
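For illustration, a toy round trip of that encoding (a made-up example, not the HRRR data):

import numpy as np
import xarray as xr

# CF-style time: numeric offsets plus a units attribute
raw = xr.Dataset(
    coords={'time': ('time', np.arange(3), {'units': 'hours since 1990-01-01'})}
)
# xarray decodes the numbers to datetime64 when told not to use cftime
decoded = xr.decode_cf(raw, use_cftime=False)
print(decoded.time.dtype)   # datetime64[ns]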

So you don't think this has anything to do with the grib2 codec or reading grib2 files?

@keewis (Contributor) commented Mar 2, 2022

this completes successfully:

import xarray as xr

# cast time to exact datetime64[ns] so xarray doesn't reach for cftime
subset = ds.assign_coords(time=ds.time.astype("datetime64[ns]")).isel(time=slice(-3, -1))
# drop the grib filter from each variable's on-disk encoding
for var in subset.variables.values():
    var.encoding.pop("filters", None)
subset.to_zarr('foo.zarr', mode='w')

ds = xr.open_dataset("foo.zarr", engine="zarr").load()

cftime is only involved for dtypes that are not exactly datetime64[ns], so casting time avoids that. The filters entry raised errors mentioning grib when I tried to load the resulting zarr.

@rsignell-usgs (Collaborator, Author)

@keewis, whoa, thank you! I never would have figured this out! Is this something users should handle with this approach, or should we handle it in xarray?

@martindurant (Member)

xarray ought to be aware that zarr can handle any of the numpy datetime ("M8") types, so no conversion ought to be necessary with the zarr backend. I wonder what those filters contained? Can we make an example of this with a pure xarray dataset?

So from kerchunk/grib's point of view, we should always ensure that we convert to ns resolution where we can. Where we can't (e.g., ice-age data?), we'll have to explicitly call cftime's encoding routines to get the right values and attributes.
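A rough sketch of what I mean (the dates and calendar here are invented):

import cftime

units = 'days since 0001-01-01'
calendar = 'proleptic_gregorian'
dates = [cftime.DatetimeProlepticGregorian(1, 1, 1 + i) for i in range(3)]

# encode to plain numbers; these, plus the units/calendar attributes,
# are what would go into the references
nums = cftime.date2num(dates, units=units, calendar=calendar)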

@keewis (Contributor) commented Mar 2, 2022

See pydata/xarray#6318 (comment) for the datetime64 handling

> I wonder what those filters contained? Can we make an example of this with a pure xarray dataset?

the repr of those is: [GRIBCodec(var='t2m')]. If I don't remove it, the variable's .zattrs contains a bunch of GRIB_* attributes, and the load fails with this:

Can't create file '/tmp/tmpesmcutpjgrib2.923a8.idx'
Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 261, in itervalues
    yield self.filestream.message_from_file(file, errors=errors)
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 328, in message_from_file
    return Message.from_file(file, offset, **kwargs)
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 102, in from_file
    raise EOFError("End of file: %r" % file)
EOFError: End of file: <_io.BufferedReader name='/tmp/tmpesmcutpjgrib2'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 523, in from_indexpath_or_filestream
    self = cls.from_fieldset(filestream, index_keys, computed_keys)
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 366, in from_fieldset
    return cls.from_fieldset_and_iteritems(fieldset, iteritems, index_keys, computed_keys)
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 379, in from_fieldset_and_iteritems
    for field_id, raw_field in iteritems:
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 279, in __iter__
    for message in self.itervalues():
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 265, in itervalues
    raise EOFError("No valid message found: %r" % self.filestream.path)
EOFError: No valid message found: '/tmp/tmpesmcutpjgrib2'
Can't read index file '/tmp/tmpesmcutpjgrib2.923a8.idx'
Traceback (most recent call last):
  File ".../lib/python3.9/site-packages/cfgrib/messages.py", line 532, in from_indexpath_or_filestream
    index_mtime = os.path.getmtime(indexpath)
  File ".../lib/python3.9/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpesmcutpjgrib2.923a8.idx'

---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
File .../lib/python3.9/site-packages/cfgrib/messages.py:261, in FileStreamItems.itervalues(self)
    260 try:
--> 261     yield self.filestream.message_from_file(file, errors=errors)
    262     valid_message_found = True

File .../lib/python3.9/site-packages/cfgrib/messages.py:328, in FileStream.message_from_file(self, file, offset, **kwargs)
    326 def message_from_file(self, file, offset=None, **kwargs):
    327     # type: (T.IO[bytes], T.Optional[OffsetType], T.Any) -> Message
--> 328     return Message.from_file(file, offset, **kwargs)

File .../lib/python3.9/site-packages/cfgrib/messages.py:102, in Message.from_file(cls, file, offset, **kwargs)
    101 if codes_id is None:
--> 102     raise EOFError("End of file: %r" % file)
    103 return cls(codes_id=codes_id, **kwargs)

EOFError: End of file: <_io.BufferedReader name='/tmp/tmpesmcutpjgrib2'>

During handling of the above exception, another exception occurred:

EOFError                                  Traceback (most recent call last)
Input In [49], in <module>
----> 1 xr.open_dataset("foo.zarr", engine="zarr").load()

File .../xarray/core/dataset.py:869, in Dataset.load(self, **kwargs)
    867 for k, v in self.variables.items():
    868     if k not in lazy_data:
--> 869         v.load()
    871 return self

File .../xarray/core/variable.py:446, in Variable.load(self, **kwargs)
    444     self._data = as_compatible_data(self._data.compute(**kwargs))
    445 elif not is_duck_array(self._data):
--> 446     self._data = np.asarray(self._data)
    447 return self

File .../xarray/core/indexing.py:552, in MemoryCachedArray.__array__(self, dtype)
    551 def __array__(self, dtype=None):
--> 552     self._ensure_cached()
    553     return np.asarray(self.array, dtype=dtype)

File .../xarray/core/indexing.py:549, in MemoryCachedArray._ensure_cached(self)
    547 def _ensure_cached(self):
    548     if not isinstance(self.array, NumpyIndexingAdapter):
--> 549         self.array = NumpyIndexingAdapter(np.asarray(self.array))

File .../xarray/core/indexing.py:522, in CopyOnWriteArray.__array__(self, dtype)
    521 def __array__(self, dtype=None):
--> 522     return np.asarray(self.array, dtype=dtype)

File .../xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File .../xarray/coding/variables.py:70, in _ElementwiseFunctionArray.__array__(self, dtype)
     69 def __array__(self, dtype=None):
---> 70     return self.func(self.array)

File .../xarray/coding/variables.py:137, in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype)
    133 def _apply_mask(
    134     data: np.ndarray, encoded_fill_values: list, decoded_fill_value: Any, dtype: Any
    135 ) -> np.ndarray:
    136     """Mask all matching values in a NumPy arrays."""
--> 137     data = np.asarray(data, dtype=dtype)
    138     condition = False
    139     for fv in encoded_fill_values:

File .../xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File .../xarray/backends/zarr.py:73, in ZarrArrayWrapper.__getitem__(self, key)
     71 array = self.get_array()
     72 if isinstance(key, indexing.BasicIndexer):
---> 73     return array[key.tuple]
     74 elif isinstance(key, indexing.VectorizedIndexer):
     75     return array.vindex[
     76         indexing._arrayize_vectorized_indexer(key, self.shape).tuple
     77     ]

File .../lib/python3.9/site-packages/zarr/core.py:720, in Array.__getitem__(self, selection)
    718     result = self.vindex[selection]
    719 else:
--> 720     result = self.get_basic_selection(pure_selection, fields=fields)
    721 return result

File .../lib/python3.9/site-packages/zarr/core.py:846, in Array.get_basic_selection(self, selection, out, fields)
    843     return self._get_basic_selection_zd(selection=selection, out=out,
    844                                         fields=fields)
    845 else:
--> 846     return self._get_basic_selection_nd(selection=selection, out=out,
    847                                         fields=fields)

File .../lib/python3.9/site-packages/zarr/core.py:889, in Array._get_basic_selection_nd(self, selection, out, fields)
    883 def _get_basic_selection_nd(self, selection, out=None, fields=None):
    884     # implementation of basic selection for array with at least one dimension
    885 
    886     # setup indexer
    887     indexer = BasicIndexer(selection, self)
--> 889     return self._get_selection(indexer=indexer, out=out, fields=fields)

File .../lib/python3.9/site-packages/zarr/core.py:1179, in Array._get_selection(self, indexer, out, fields)
   1173 if not hasattr(self.chunk_store, "getitems") or \
   1174    any(map(lambda x: x == 0, self.shape)):
   1175     # sequentially get one key at a time from storage
   1176     for chunk_coords, chunk_selection, out_selection in indexer:
   1177 
   1178         # load chunk selection into output array
-> 1179         self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
   1180                             drop_axes=indexer.drop_axes, fields=fields)
   1181 else:
   1182     # allow storage to get multiple items at once
   1183     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)

File .../lib/python3.9/site-packages/zarr/core.py:1883, in Array._chunk_getitem(self, chunk_coords, chunk_selection, out, out_selection, drop_axes, fields)
   1880         out[out_selection] = fill_value
   1882 else:
-> 1883     self._process_chunk(out, cdata, chunk_selection, drop_axes,
   1884                         out_is_ndarray, fields, out_selection)

File .../lib/python3.9/site-packages/zarr/core.py:1826, in Array._process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
   1824 except ArrayIndexError:
   1825     cdata = cdata.read_full()
-> 1826 chunk = self._decode_chunk(cdata)
   1828 # select data from chunk
   1829 if fields:

File .../lib/python3.9/site-packages/zarr/core.py:2083, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
   2081 if self._filters:
   2082     for f in reversed(self._filters):
-> 2083         chunk = f.decode(chunk)
   2085 # view as numpy array with correct dtype
   2086 chunk = ensure_ndarray(chunk)

File .../lib/python3.9/site-packages/kerchunk/grib2.py:182, in GRIBCodec.decode(self, buf, out)
    179 buf.tofile(fn)
    181 # do decode
--> 182 ds = cfgrib.open_file(fn)
    183 data = ds.variables[self.var].data
    184 if hasattr(data, "build_array"):

File .../lib/python3.9/site-packages/cfgrib/dataset.py:762, in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    759 stream = messages.FileStream(path, errors=grib_errors)
    761 index_keys = compute_index_keys(time_dims, extra_coords)
--> 762 index = open_fileindex(stream, indexpath, index_keys, filter_by_keys=filter_by_keys)
    764 return open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)

File .../lib/python3.9/site-packages/cfgrib/dataset.py:741, in open_fileindex(stream, indexpath, index_keys, filter_by_keys, computed_keys)
    733 def open_fileindex(
    734     stream: messages.FileStream,
    735     indexpath: str = messages.DEFAULT_INDEXPATH,
   (...)
    738     computed_keys: messages.ComputedKeysType = cfmessage.COMPUTED_KEYS,
    739 ) -> messages.FileIndex:
    740     index_keys = sorted(set(index_keys) | set(filter_by_keys))
--> 741     index = messages.FileIndex.from_indexpath_or_filestream(
    742         stream, index_keys, indexpath=indexpath, computed_keys=computed_keys
    743     )
    744     return index.subindex(filter_by_keys)

File .../lib/python3.9/site-packages/cfgrib/messages.py:549, in FileIndex.from_indexpath_or_filestream(cls, filestream, index_keys, indexpath, computed_keys, log)
    546 except Exception:
    547     log.exception("Can't read index file %r", indexpath)
--> 549 return cls.from_fieldset(filestream, index_keys, computed_keys)

File .../lib/python3.9/site-packages/cfgrib/messages.py:366, in FieldsetIndex.from_fieldset(cls, fieldset, index_keys, computed_keys)
    364 else:
    365     iteritems = enumerate(fieldset)
--> 366 return cls.from_fieldset_and_iteritems(fieldset, iteritems, index_keys, computed_keys)

File .../lib/python3.9/site-packages/cfgrib/messages.py:379, in FieldsetIndex.from_fieldset_and_iteritems(cls, fieldset, iteritems, index_keys, computed_keys)
    377 index_keys = list(index_keys)
    378 header_values_cache = {}  # type: T.Dict[T.Tuple[T.Any, type], T.Any]
--> 379 for field_id, raw_field in iteritems:
    380     field = ComputedKeysAdapter(raw_field, computed_keys)
    381     header_values = []

File .../lib/python3.9/site-packages/cfgrib/messages.py:279, in FileStreamItems.__iter__(self)
    277 old_offset = -1
    278 count = 0
--> 279 for message in self.itervalues():
    280     offset = message.message_get("offset", int)
    281     if offset == old_offset:

File .../lib/python3.9/site-packages/cfgrib/messages.py:265, in FileStreamItems.itervalues(self)
    263 except EOFError:
    264     if not valid_message_found:
--> 265         raise EOFError("No valid message found: %r" % self.filestream.path)
    266     break
    267 except Exception:

EOFError: No valid message found: '/tmp/tmpesmcutpjgrib2'

That might be an issue with my cfgrib installation, though

@martindurant (Member)

Oh right, our filter to read GRIB. You would definitely need it to get any useful data out of the variables when reading from the original data, but indeed it makes no sense at all for the newly created local zarr dataset. The GRIB_* attributes probably do carry useful information, though. I can't really tell whether popping them out as in the example affects the load or the save.

@keewis (Contributor) commented Mar 2, 2022

sorry, my bad, the attrs have nothing to do with this, and they are not affected by the pop, either.

Edit: for reference, it's .zarray that contains the filters:

    "filters": [
        {
            "id": "grib",
            "var": "t2m"
        }
    ],

@rsignell-usgs (Collaborator, Author)

@martindurant , so you intend to remove these filters as part of kerchunk, right?

@martindurant (Member)

No, the filter is essential for reading the GRIB files, but it makes no sense when writing out to zarr. xarray should have a way to write without using the original filters; as it stands, it's as if we were saying "just because these data came with some special encoding, I want to use the same encoding when I write them again". We do not, however, have a way to re-encode into GRIB-in-zarr, so xarray should allow us to skip this. Maybe there is such an option, the equivalent of the pop for the filters above?
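One candidate (untested here): xarray's to_zarr accepts per-variable encoding overrides, which might serve as exactly that:

# sketch: override the stored filter at write time instead of
# popping it from each variable's encoding beforehand
subset.to_zarr('foo.zarr', mode='w',
               encoding={'t2m': {'filters': None}})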

@keewis (Contributor) commented Mar 3, 2022

I brought this up in the meeting yesterday. In short, removing or overriding the filters explicitly, or clearing encoding entirely are the only options we have at the moment. See also pydata/xarray#6323.

@rsignell-usgs (Collaborator, Author)

@martindurant , in light of pydata/xarray#6318 (comment), do you think it would be best practice to not use datetime64 in our kerchunked zarr representation, even though it's allowed?

@martindurant (Member)

cftime encoding of timestamps cannot be the thing we rely on: the datasets we are considering go well beyond climate/met, and the zarr virtual datasets are not necessarily going to be read by xarray.

@martindurant (Member)

So in the current version of combine2: if times were read with cftime (and you need to explicitly request this), then they will be saved with cftime (the default) or datetime64 (optional).

@rsignell-usgs (Collaborator, Author) commented Mar 7, 2022

nice.

@martindurant (Member)

Should we merge combine2?

It needs a lot of docs explaining how to use it, beyond the long docstring, and the existing examples need updating to use it.

@rsignell-usgs (Collaborator, Author)

Okay, maybe I can get someone to work on that.
