Cannot plot datetime.date dimension #8866

Closed
4 of 5 tasks
saschahofmann opened this issue Mar 22, 2024 · 9 comments · Fixed by #8873

@saschahofmann
Contributor

saschahofmann commented Mar 22, 2024

What happened?

I noticed that xarray doesn't support plotting when the x-axis is a datetime.date. In my case, I would like to plot hourly data aggregated by date. I know that in this particular case I could just use .resample('1D') to achieve the same result and be able to plot it, but I am wondering whether xarray shouldn't also support plotting dates.

I am pretty sure that matplotlib supports date on the x-axis so maybe adding it to an acceptable type in plot/utils.py L675 in _ensure_plottable would already do the trick?
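The claim about matplotlib can be checked with a small sketch. It assumes matplotlib is installed and uses the non-interactive Agg backend (my choice, so it runs without a display); the variable names are illustrative only.

```python
import datetime

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import matplotlib.units as munits

# Five consecutive plain datetime.date objects (not datetime.datetime).
days = [datetime.date(2024, 1, 1) + datetime.timedelta(days=i) for i in range(5)]
values = [1.0, 3.0, 2.0, 5.0, 4.0]

# matplotlib keeps a registry of unit converters keyed by type.
date_is_registered = datetime.date in munits.registry

# Plotting the dates directly on the x-axis should not raise.
fig, ax = plt.subplots()
(line,) = ax.plot(days, values)
plt.close(fig)
```

If this runs cleanly, the limitation is on the xarray side (the type check before plotting) rather than in matplotlib.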

I am happy to look into this if this is a wanted feature.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import datetime

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]

data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))
data.groupby('time.date').mean().plot()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

TypeError: Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead.

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 (main, Aug 24 2023, 12:59:26) [Clang 15.0.0 (clang-1500.1.0.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.7
dask: 2024.1.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.1.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.21.0
sphinx: None

saschahofmann added the bug and needs triage labels Mar 22, 2024
@Illviljan
Contributor

Illviljan commented Mar 22, 2024

Why does the x-axis go from a dtype datetime to an object? Isn't that strange?

import xarray as xr
import numpy as np
import datetime

start = datetime.datetime(2024, 1, 1)
time = [start + datetime.timedelta(hours=x) for x in range(720)]

data = xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time)))
print(data.time.dtype) # datetime64[ns]

r = data.resample({"time":"1D"}).mean()
print(r.time.dtype) # datetime64[ns]

g = data.groupby('time.date').mean()
print(g.date.dtype) # object

@saschahofmann
Contributor Author

I am not sure how xarray handles lists of datetime.datetime but it seems like they are automatically transformed to numpy's datetime64 dtype (I guess there is some pandas magic for that?).

For dates, that's not happening, e.g.

start = datetime.date(2024, 1, 1)
time = [start + datetime.timedelta(days=x) for x in range(30)]
xr.DataArray(np.random.randn(len(time)), coords=dict(time=('time', time))).time.dtype

also returns object.

It would be nice if it was transformed to a datetime64[D] but that sounds like a much bigger change?

@kmuehlbauer
Contributor

It would be nice if it was transformed to a datetime64[D] but that sounds like a much bigger change?

FYI: #7493

@saschahofmann
Contributor Author

saschahofmann commented Mar 25, 2024

Alright, based on that issue, I gather that making it work through datetime64 is a huge change. I took another, closer look at plot/utils.py and think it's not necessary.

As mentioned above, I believe just adding datetime.date to other_types in _ensure_plottable could already solve this specific problem?

It calls _valid_other_type which is just return all(isinstance(el, types) for el in np.ravel(x)). It checks every element of the array so the object dtype doesn't matter.
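A minimal sketch of that element-wise check, to show why the object dtype is not an obstacle. The function `valid_other_type` is a hypothetical stand-in written from the one-line description above, not xarray's actual code, and the two type tuples are illustrative.

```python
import datetime

import numpy as np


def valid_other_type(x, types):
    """Hypothetical stand-in: True if every element of x is an instance of types."""
    return all(isinstance(el, types) for el in np.ravel(x))


# An object-dtype array of plain dates, like the result of groupby('time.date').
dates = np.array(
    [datetime.date(2024, 1, d) for d in range(1, 4)], dtype=object
)

without_date = (datetime.datetime,)             # illustrative "before" tuple
with_date = (datetime.datetime, datetime.date)  # illustrative "after" tuple

rejected = valid_other_type(dates, without_date)
accepted = valid_other_type(dates, with_date)

# A mixed object array still fails, because every element is inspected.
mixed = np.array([datetime.date(2024, 1, 1), "not a date"], dtype=object)
mixed_ok = valid_other_type(mixed, with_date)

print(rejected, accepted, mixed_ok)  # False True False
```

Note that datetime.datetime is a subclass of datetime.date, so a tuple containing datetime.date also accepts datetime.datetime elements.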

@kmuehlbauer
Contributor

kmuehlbauer commented Mar 25, 2024

From what I can tell, datetime.date objects are not handled at all by xarray. They are just wrapped as numpy 'O' type.

This happens at Variable initialization:

if isinstance(data, np.ndarray) and data.dtype.kind in "OMm":
    data = _possibly_convert_objects(data)

Here it checks for O-type and np.datetime64/np.timedelta64 and calls _possibly_convert_objects. But in that function only datetime.datetime and datetime.timedelta objects are converted to nanosecond precision. datetime.date is kept as numpy O.

def _possibly_convert_objects(values):
    """Convert arrays of datetime.datetime and datetime.timedelta objects into
    datetime64 and timedelta64, according to the pandas convention. For the time
    being, convert any non-nanosecond precision DatetimeIndex or TimedeltaIndex
    objects to nanosecond precision. While pandas is relaxing this in version
    2.0.0, in xarray we will need to make sure we are ready to handle
    non-nanosecond precision datetimes or timedeltas in our code before allowing
    such values to pass through unchanged. Converting to nanosecond precision
    through pandas.Series objects ensures that datetimes and timedeltas are
    within the valid date range for ns precision, as pandas will raise an error
    if they are not.
    """
    as_series = pd.Series(values.ravel(), copy=False)
    if as_series.dtype.kind in "mM":
        as_series = _as_nanosecond_precision(as_series)
    result = np.asarray(as_series).reshape(values.shape)
    if not result.flags.writeable:
        # GH8843, pandas copy-on-write mode creates read-only arrays by default
        try:
            result.flags.writeable = True
        except ValueError:
            result = result.copy()
    return result
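The difference between the two types can be reproduced with pandas alone. This is a small sketch of the inference step only, not of xarray's conversion function; the variable names are illustrative.

```python
import datetime

import pandas as pd

datetimes = [
    datetime.datetime(2024, 1, 1) + datetime.timedelta(hours=h) for h in range(3)
]
dates = [datetime.date(2024, 1, 1) + datetime.timedelta(days=d) for d in range(3)]

dt_series = pd.Series(datetimes)  # inferred as a datetime64 dtype
date_series = pd.Series(dates)    # stays object dtype

# Only kind "M" (or "m" for timedeltas) would reach the nanosecond conversion
# branch shown above; kind "O" falls through unchanged.
print(dt_series.dtype.kind)    # M
print(date_series.dtype.kind)  # O
```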

Q1: Should xarray convert datetime.date to np.datetime64[ns]? If the answer is yes, this should fix this issue immediately.
Q2: Should datetime.date be added to other_types as suggested by @saschahofmann?

Those questions are independent of each other. For Q2, I do not see anything blocking this.

kmuehlbauer removed the needs triage label Mar 25, 2024
@saschahofmann
Contributor Author

In #8873 I made the suggested change and checked that it works as expected. I wonder whether it's even necessary to have datetime.datetime in the other_types list since it seems to be transformed to np.datetime64, but I guess it's a fallback in case these types somehow creep through. So even if the response to Q1 is yes, it might be worth having the same fallback for date?

@dcherian
Contributor

dcherian commented Mar 25, 2024

This is the intended return type of .dt.date. We inherit this from pandas: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.date.html

If you want a datetime64 array .dt.floor("D") should work?
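Since this behavior is inherited from pandas, the contrast can be sketched with a pandas Series; this is an analogy under that assumption, not a run of the xarray accessor itself, and the variable names are illustrative.

```python
import datetime

import pandas as pd

start = datetime.datetime(2024, 1, 1)
times = pd.Series([start + datetime.timedelta(hours=h) for h in range(48)])

as_dates = times.dt.date       # Python datetime.date objects, object dtype
floored = times.dt.floor("D")  # still datetime64, truncated to midnight

print(as_dates.dtype)       # object
print(floored.dtype.kind)   # M
print(floored.nunique())    # 2
```

Both results identify the same two calendar days; only the floored version keeps a datetime64 dtype.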

@saschahofmann
Contributor Author

saschahofmann commented Mar 25, 2024

@dcherian this assumes that the original dtype is datetime.datetime/np.datetime64. If I create a DataArray using dt.date directly, I would probably need another conversion logic?

In any case, I think the proposed solution to enable plotting datetime.date values seems easy enough, or is there a good argument against adding the type to other_types in _ensure_plottable?

As I mentioned earlier, in the case of groupby there is also the option to use resample instead. But I'd say, especially for beginners with xarray, the easiest solution is not making them worry about whether they have a datetime.datetime, datetime.date, or np.datetime64 wherever possible?

@Illviljan
Contributor

If this is the intended dtype for this groupby operation, then I think we should move forward with your suggested solution.
