Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split out CFDatetimeCoder, deprecate use_cftime as kwarg #9901

Merged
merged 19 commits into from
Jan 4, 2025

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Dec 17, 2024

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

This makes something like this possible:

import xarray as xr
from xarray.coders import CFDatetimeCoder
# available with this PR
ds = xr.open_mfdataset(..., decode_times=CFDatetimeCoder(use_cftime=True))
# available with #9618
ds = xr.open_mfdataset(..., decode_times=CFDatetimeCoder(time_unit="s"))

@kmuehlbauer kmuehlbauer marked this pull request as ready for review December 17, 2024 15:54
doc/whats-new.rst Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this but since this is adding public API, can you open an issue and see if everyone is on board please

doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
kmuehlbauer and others added 3 commits December 17, 2024 17:19
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@kmuehlbauer
Copy link
Contributor Author

@dcherian Thanks for reviewing, addressed along your comments/suggestions. I've added a comment to #4490 for more visibility, hope that's sufficient.

@mathause
Copy link
Collaborator

Can you explicitely write how to change this in the deprecation warning? As is I don't find it obvious what users have to do. ('Use decode_times=xr.coders.CFDatetimeCoder(use_cftime=True) instead').

I am also a bit surprised that the CFDateTimeCoder does not imply use_cftime=True - should that be named DateTimeCoder? (or did I miss this discusdion?)

Is the use of DeprecationWarning intentional? That's still silent in certain situations, right?

(Removing an arg is always nice but I use this a lot and its much more verbose...)

@kmuehlbauer
Copy link
Contributor Author

kmuehlbauer commented Dec 21, 2024

@mathause Thanks for sharing your concerns and suggestions.

Can you explicitely write how to change this in the deprecation warning? As is I don't find it obvious what users have to do. ('Use decode_times=xr.coders.CFDatetimeCoder(use_cftime=True) instead').

Yes, this is a bit sparse, how is it with:

import xarray as xr
time_coder = xr.coders.CFDatetimeCoder(use_cftime=True)
ds = xr.open_dataset(decode_times=time_coder)

I am also a bit surprised that the CFDateTimeCoder does not imply use_cftime=True - should that be named DateTimeCoder? (or did I miss this discusdion?)

So the current state of affairs is:

The default state of open_dataset kwarg decode_times depends on the backend, for most backends this is True. This will trying to decode datetimes into numpy datetime64 (pandas DatetimeIndex), if the on-disk datetimes are in the ns-representable time-range and a standard calendar is used. If this fails, cftime is used resulting in a CFTimeIndex. Setting use_cftime=True means to use the cftime-package for decoding anyway. This will result in a CFTimeIndex. In most cases (standard calendar/ proleptic gregorian) cftime is not needed to correctly decode according CF into numpy datetime64. Does that make sense?

Is the use of DeprecationWarning intentional? That's still silent in certain situations, right?

As the current behaviour is preserved, we might skip the DeprecationWarning for now until all coders have been edited to conform to some solution which originated from #4490. In the last dev meeting @TomNicholas added an item to the agenda moving the cf coders into a new package (xref https://github.com/xarray-contrib/cf-codecs and cc @rabernat). So this would be a first step into this direction.

I'll rework the DeprecationWarning to more explicit in what to do, but we should decide whether we want to warn from now or postpone this until we have a clear way of how to do things.

xarray/conventions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is much easier to adapt to the deprecation with the suggested message.

Does that make sense?

Yes, that makes sense - thanks!

xarray/conventions.py Outdated Show resolved Hide resolved
Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kmuehlbauer—I'm also on board with this update. Just a few docstring suggestions, but otherwise this looks good to me.

I assume we'll add something similar for decode_timedelta and CFTimedeltaCoder when the time_unit option is added in #9618?

doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
xarray/tests/test_coding_times.py Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
xarray/conventions.py Outdated Show resolved Hide resolved
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
@kmuehlbauer
Copy link
Contributor Author

I assume we'll add something similar for decode_timedelta and CFTimedeltaCoder when the time_unit option is added in #9618?

Yes, that would be a follow-up PR.

xarray/conventions.py Outdated Show resolved Hide resolved
@kmuehlbauer
Copy link
Contributor Author

Thanks @kmuehlbauer—I'm also on board with this update. Just a few docstring suggestions, but otherwise this looks good to me.

@spencerkclark Thanks for taking the time to review. I've added your suggestions.

xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
xarray/conventions.py Outdated Show resolved Hide resolved
@kmuehlbauer
Copy link
Contributor Author

One or two more approvals would be great. @shoyer would you mind taking a look here, if this goes into the right direction of your idea in #4490?

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! Thanks @kmuehlbauer

@kmuehlbauer
Copy link
Contributor Author

Great! If there is anything to be added to this, please open a new issue. I'll continue with #9618 after merging.

@kmuehlbauer kmuehlbauer merged commit 1486bea into pydata:main Jan 4, 2025
29 checks passed
@st-bender
Copy link
Contributor

Hi,
I don't think this is a good idea, it will break code, again.
Frankly I don't see the point in this deprecation, when keeping compatibility is easy.
Why deprecate it, and then give an example instead of keeping the keyword version and just use that example when the keyword is used?

Please reconsider this.

Cheers.

@kmuehlbauer
Copy link
Contributor Author

@st-bender I'm not sure I understand. This will not break code, it just issues a deprecation warning (advertising the new invocation) if used like before, only if the new invocation is used this and use_cftime is passed as kwarg, this will raise an error.

All keyword arguments which are related to CF decoding will eventually be deprecated from the function signature along this issue #4490.

Could you please a bit more specific what exactly should be reconsidered?

@st-bender
Copy link
Contributor

Hi,

@st-bender I'm not sure I understand. This will not break code, it just issues a deprecation warning (advertising the new invocation) if used like before, only if the new invocation is used this and use_cftime is passed as kwarg, this will raise an error.

It will break when the deprecation cycle has ended and the keyword version has been finally removed.

All keyword arguments which are related to CF decoding will eventually be deprecated from the function signature along this issue #4490.

I am not particularly a fan of deprecations in general, and of ones that break backwards compatibility in particular. Even less so when keeping compatibility is easy.

Could you please a bit more specific what exactly should be reconsidered?

I was referring to the deprecation of keyword arguments and their subsequent removal.

I, as a user, have developed quite some code that uses xarray extensively. And I am really grateful for all the work you put into it. However, I fear that in the future all that work was for nothing and the code will not run anymore, and that I'd have to spend a considerable time and efforts to keep these programs running with future versions of xarray. Sure, I could pin the version, but that would also mean missing future improvements wrt speed and resource use.

Cheers.

@kmuehlbauer
Copy link
Contributor Author

Thanks @st-bender for the details. I agree that downstream code has to actively be adapted to the new usage and that this might also be quite tedious sometimes.

In this particular case the deprecation isn't done for nothing but to make xarray capable of using different coders (non CF).

Just to show what #4490 has in mind:

ds = xr.open_dataset(
    fname, 
    mask_and_scale=True, 
    decode_times=True, 
    decode_timedelta=False,
    use_cftime=True,
    concat_characters=True,
    decode_coords='coordinates'
)

might be rewritten as

dt_coder = xr.coders.CFDatetimeCoder(use_cftime=True)
coder = xr.coders.CFCoders(
    mask_and_scale=True, 
    decode_times=dt_coder, 
    decode_timedelta=False,
    concat_characters=True,
    decode_coords='coordinates'
ds = xr.open_dataset(fname, coder=coder)

And at the same time it would be possible to use totally different coder for different conventions (non CF). This will not be possible if we keep these keyword arguments in the function signature.

So the deprecation of use_cftime as a keyword argument to xr.open_dataset (and other functions) is only a first step on the way outlined above. I hope this explains a bit the reason behind it.

@st-bender
Copy link
Contributor

Thanks @st-bender for the details. I agree that downstream code has to actively be adapted to the new usage and that this might also be quite tedious sometimes.

I wish developers would consider the amount of work for downstream developers a bit more.

In this particular case the deprecation isn't done for nothing but to make xarray capable of using different coders (non CF).

Why not keep both?
One could check if the argument is bool or a CFDatetimeCoder instance.

Just to show what #4490 has in mind:

ds = xr.open_dataset(
    fname, 
    mask_and_scale=True, 
    decode_times=True, 
    decode_timedelta=False,
    use_cftime=True,
    concat_characters=True,
    decode_coords='coordinates'
)

might be rewritten as

dt_coder = xr.coders.CFDatetimeCoder(use_cftime=True)
coder = xr.coders.CFCoders(
    mask_and_scale=True, 
    decode_times=dt_coder, 
    decode_timedelta=False,
    concat_characters=True,
    decode_coords='coordinates'
ds = xr.open_dataset(fname, coder=coder)

And at the same time it would be possible to use totally different coder for different conventions (non CF). This will not be possible if we keep these keyword arguments in the function signature.

So the deprecation of use_cftime as a keyword argument to xr.open_dataset (and other functions) is only a first step on the way outlined above. I hope this explains a bit the reason behind it.

One does not exclude the other, that's basically my point.

Cheers.

@kmuehlbauer
Copy link
Contributor Author

Why not keep both?

Because these keyword arguments are solely for CF Conventions (and might not apply for all backends). To allow for other Conventions to be decoded likewise these keyword arguments have to be removed from the function signature and be combined in dedicated coder classes (or the like). This will also make implementation of new backends easier. You are encouraged to contribute to the discussion in #4490 how these things should be handled.

There is not much more to add from my side. Good chance that other @pydata/xarray devs can enhance here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants