override units for datetime64/timedelta64 variables to preserve integer dtype #8201
Unfortunately this is more involved. Coming back later.
Maybe we want a test for this?
xarray/coding/times.py
Outdated
@@ -777,6 +777,11 @@ def encode(self, variable: Variable, name: T_Name = None) -> Variable:
safe_setitem(attrs, "units", units, name=name)
safe_setitem(attrs, "calendar", calendar, name=name)

# remove dtype from encoding to prevent unnecessary casts
# see GH #1064
if "dtype" in encoding:
- if "dtype" in encoding:
+ encoding.pop("dtype", None)
Ah sry, didn't see this. Good luck
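For readers unfamiliar with the idiom in the review suggestion, here is a minimal sketch of why `dict.pop` with a default can stand in for a membership test followed by a deletion. The `encoding` dict below is a hypothetical stand-in, not the object used in `xarray/coding/times.py`.

```python
# Hypothetical encoding dict, standing in for a variable's encoding.
encoding = {"units": "days since 2000-01-01", "dtype": "int64"}

# Guarded form: a membership test followed by a deletion.
guarded = dict(encoding)
if "dtype" in guarded:
    del guarded["dtype"]

# Suggested form: one call; the default makes it a no-op if the key is absent.
popped = dict(encoding)
popped.pop("dtype", None)
popped.pop("dtype", None)  # calling it again is harmless

print(guarded == popped)
```

Both forms leave the same dict behind; the `pop` form simply drops the explicit check.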
…prevent unnecessary casts
f1d2bf7 to 37db2b0
Thanks, finally, that wasn't that complicated :-)
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
Thanks a bunch @spencerkclark, I've addressed your suggestions.
Should we override here or warn the user instead?
Quoting from @NotSqrt's comment #1064 (comment): "If the resolution loss can't be fixed automatically, what would be nice in the warning is a link or a summary of what the user has to do to solve the resolution loss!" I'm totally open here. The question is, what takes precedence
To add to my above comment, with #7827 we added a warning when times/timedeltas had to be encoded to float64 because of units not fitting. In this PR for those cases where times/timedeltas had to be encoded to fit If needed we could add information to the above-mentioned warning that Which approach should we take?
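To illustrate what "units not fitting" means, here is a small stdlib-only sketch. It is not xarray's encoder: the `encode` helper, the reference date, and the sample times are all assumptions made for the example. It shows that the same timestamps need fractional values in one unit and whole numbers in a finer one.

```python
from datetime import datetime, timedelta


def encode(times, reference, unit):
    """Hypothetical helper: count of `unit` steps since `reference` (true division)."""
    return [(t - reference) / unit for t in times]


reference = datetime(2000, 1, 1)
times = [
    datetime(2000, 1, 1, 0),
    datetime(2000, 1, 1, 6),
    datetime(2000, 1, 2, 0),
]

in_days = encode(times, reference, timedelta(days=1))
in_hours = encode(times, reference, timedelta(hours=1))

# A unit "fits" when every encoded value is a whole number.
fits_days = all(v.is_integer() for v in in_days)
fits_hours = all(v.is_integer() for v in in_hours)

print(in_days, fits_days)
print(in_hours, fits_hours)
```

With days as the unit the 06:00 timestamp becomes a fraction, so an integer dtype cannot hold it; with hours every value is whole.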
Yeah, it seems like the two options are:
Exactly, I think this is the right way to frame it. I guess the answer is really up to the user. It's hard to know ahead of time what they care about more, so maybe the most conservative approach is to raise? Or do we feel having a default solution (with information on how to address it in other ways) is more helpful?
To give a little context, I initially encountered those problems in this setup:
I don't care about the exact unit, I just don't want to randomly lose some precision on my data .. |
@spencerkclark @dcherian Yes, it's probably best to raise with a meaningful error message.
Having a random exception pop up in production just because the delta between points has changed would have been very surprising.
@NotSqrt Thanks for the additional context. From your use case you would prefer to automatically change the units and keep So let's try to find a default solution if
Are there more cases? @dcherian @spencerkclark Is it possible to check if BTW, this whole mess is another very strong argument for a removal of
There is no way to decide if
Yes, I agree with this framing.
@dcherian @spencerkclark I'm a bit biased here because my own workflows have similar issues as @NotSqrt's. I'm now thinking that my first approach overriding If provided
dtype from encoding for datetime64/timedelta64 variables → units for datetime64/timedelta64 variables to preserve integer dtype
The last iteration aligns with my case 1 from above. This is the flow now:
We could raise instead of warn in 2b, but I'm not sure that will do any good for automated workflows.
Thanks @kmuehlbauer. I agree, if we are going to override by default, overriding the units is more palatable than the dtype, since it guarantees round-trip accuracy (in other words if one is staying in the xarray ecosystem, it's fairly clearly the way to go). While I can come up with hypothetical reasons for why one might want to preserve units encoding over the dtype, I don't think it would be a very common situation. So I'd say I am on board with your current approach, though we should make sure that the warning describes how one can preserve the units instead of the dtype by modifying the encoding if they'd like a different behavior.
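The idea of overriding the units to keep an integer dtype and round-trip accuracy can be sketched as follows. This is a stdlib-only illustration under stated assumptions, not the code merged in this PR: the `encode_as_int` helper, the `UNITS` table, and the warning text are all hypothetical.

```python
import warnings
from datetime import datetime, timedelta

# Candidate units from coarse to fine (an assumed, simplified list).
UNITS = [
    ("days", timedelta(days=1)),
    ("hours", timedelta(hours=1)),
    ("minutes", timedelta(minutes=1)),
    ("seconds", timedelta(seconds=1)),
]


def encode_as_int(times, reference, requested):
    """Hypothetical helper: encode as integers, moving to finer units if needed."""
    names = [name for name, _ in UNITS]
    deltas = [t - reference for t in times]
    for name, step in UNITS[names.index(requested):]:
        # timedelta % timedelta is exact, so this check has no rounding error.
        if all(d % step == timedelta(0) for d in deltas):
            if name != requested:
                warnings.warn(
                    f"units {requested!r} cannot hold these times as integers; "
                    f"using {name!r} instead"
                )
            return [d // step for d in deltas], name
    raise ValueError("times need a resolution finer than seconds")


reference = datetime(2000, 1, 1)
times = [datetime(2000, 1, 1, 6), datetime(2000, 1, 2, 12)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    values, units = encode_as_int(times, reference, "days")

# Decoding with the overridden units recovers the original times exactly.
decoded = [reference + v * dict(UNITS)[units] for v in values]
print(values, units, decoded == times)
```

The trade-off matches the one discussed above: the stored units differ from what was requested, but the values stay integers and decode back without loss.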
@spencerkclark I've added the needed instructions for achieving different behaviour, plus how to silence these warnings. This should be good to go then from my side. It would be good to get this into the next version alongside the nanosecond fix.
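As a general illustration of silencing a specific warning with the standard `warnings` module, here is a minimal sketch. The function and the message text are made up for the example and are not the wording xarray emits.

```python
import warnings


def encode_with_warning():
    # Stand-in for an encoding step that warns; the message text is invented.
    warnings.warn("units changed to preserve integer dtype", UserWarning)
    return "encoded"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only UserWarnings whose message starts with this text (a regex).
    warnings.filterwarnings(
        "ignore", message="units changed", category=UserWarning
    )
    result = encode_with_warning()

print(result, len(caught))
```

Filtering on the message keeps unrelated warnings visible, which is usually preferable to ignoring a whole category.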
Perfect, yeah, let's get this in. Thanks again @kmuehlbauer!
whats-new.rst
api.rst