Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Future time decoding #5050

Merged
merged 4 commits into from
Mar 18, 2021

Conversation

znicholls
Copy link
Contributor

@znicholls znicholls commented Mar 18, 2021

  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

This identifies (and hopefully soon fixes) an issue with decoding times that are way into the future. Without re-introducing a change made in #4684, the test fails (see https://github.com/pydata/xarray/pull/5050/checks?check_run_id=2136824754). With the change put back in, the test passes (see https://github.com/pydata/xarray/pull/5050/checks?check_run_id=2136892412).

@spencerkclark can I ask for your help on this one please? It seems that the removal in #4684 caused this case (which wasn't tested) to break in v0.17 and it's unclear to me where the fix should be (I'm guessing pandas from the discussions in #4684...).

@znicholls znicholls changed the title Future time handling BUG: Future time decoding Mar 18, 2021
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@znicholls oof this was an oversight on my part. I naively took the tests passing and the fact that the upstream issue in pandas had been fixed to mean that we did not need this anymore. As you noted it turns out we still do.

Basically the issue stems from this optimization:

# Cast input ordinals to integers of nanoseconds because pd.to_timedelta
# works much faster when dealing with integers (GH 1399).
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
np.int64
)

If the encoded values are too large to be represented with a timedelta64[ns] type, this line does not catch it; instead it silently overflows:

In [1]: import xarray as xr; import pandas as pd; import numpy as np

In [2]: num_dates = np.array([164375])

In [3]: units = "days since 1850-01-01 00:00:00"

In [4]: calendar = "proleptic_gregorian"

In [5]: (num_dates * xr.coding.times._NS_PER_TIME_DELTA["D"]).astype(np.int64)
Out[5]: array([-4244744073709551616])

So this is indeed the correct place to fix this (since it has to do with our bespoke decoding logic and not to do with pandas). If you can just add a what's new entry, I think this is ready to go!

xarray/coding/times.py Show resolved Hide resolved
@znicholls znicholls marked this pull request as ready for review March 18, 2021 12:01
@znicholls
Copy link
Contributor Author

If you can just add a what's new entry, I think this is ready to go!

Done, thanks!

doc/whats-new.rst Outdated Show resolved Hide resolved
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
@spencerkclark
Copy link
Member

spencerkclark commented Mar 18, 2021

In the meantime it occurs to me that if you (or anyone else experiencing this issue) need a workaround, you should be able to set use_cftime=True in calls to open_dataset or decode_cf, and I think things should work (in that case it won't even try decoding the times with pandas).

@spencerkclark spencerkclark merged commit fbd48d4 into pydata:master Mar 18, 2021
@spencerkclark
Copy link
Member

Thanks @znicholls!

dcherian added a commit to dcherian/xarray that referenced this pull request Mar 18, 2021
…indow

* upstream/master:
  Fix regression in decoding large standard calendar times (pydata#5050)
  Fix sticky sidebar responsiveness on small screens (pydata#5039)
  Flexible indexes refactoring notes (pydata#4979)
  add a install xarray step to the upstream-dev CI (pydata#5044)
  Adds Dataset.query() method, analogous to pandas DataFrame.query() (pydata#4984)
  run tests on python 3.9 (pydata#5040)
  Add date attribute to datetime accessor (pydata#4994)
  📚 New theme & rearrangement of the docs (pydata#4835)
  upgrade ci-trigger to the most recent version (pydata#5037)
  GH5005 fix documentation on open_rasterio (pydata#5021)
  GHA for automatically canceling previous CI runs (pydata#5025)
  Implement GroupBy.__getitem__ (pydata#3691)
  conventions: decode unsigned integers to signed if _Unsigned=false (pydata#4966)
  Added support for numpy.bool_ (pydata#4986)
  Add additional str accessor methods for DataArray (pydata#4622)
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 23, 2021
…-tasks

* upstream/master:
  Fix regression in decoding large standard calendar times (pydata#5050)
  Fix sticky sidebar responsiveness on small screens (pydata#5039)
  Flexible indexes refactoring notes (pydata#4979)
  add a install xarray step to the upstream-dev CI (pydata#5044)
  Adds Dataset.query() method, analogous to pandas DataFrame.query() (pydata#4984)
  run tests on python 3.9 (pydata#5040)
  Add date attribute to datetime accessor (pydata#4994)
  📚 New theme & rearrangement of the docs (pydata#4835)
  upgrade ci-trigger to the most recent version (pydata#5037)
  GH5005 fix documentation on open_rasterio (pydata#5021)
  GHA for automatically canceling previous CI runs (pydata#5025)
  Implement GroupBy.__getitem__ (pydata#3691)
  conventions: decode unsigned integers to signed if _Unsigned=false (pydata#4966)
  Added support for numpy.bool_ (pydata#4986)
  Add additional str accessor methods for DataArray (pydata#4622)
  add polyval to polyfit see also (pydata#5020)
  mention map_blocks in the docstring of apply_ufunc (pydata#5011)
  Switch backend API to v2 (pydata#4989)
  WIP: add new backend api documentation (pydata#4810)
  pin netCDF4=1.5.3 in min-all-deps (pydata#4982)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants