apply to dataset #4863
Should this be a top-level
instead?
sure, although I would use
xarray/core/computation.py
Outdated
If a ``DataArray``, result will have the same name as ``obj`` but the single data
variable in the temporary ``Dataset`` will always have a generic name.
Is this as simple as "DataArrays will retain their name"? If so, maybe we don't need any notes? (very possible I'm missing some of the complexity, as ever)
no, I might not have explained that correctly. The temporary Dataset always has a <this-array> variable, but the original name will be restored by _from_temp_dataset.
Edit: I rewrote it, is that easier to understand?
Does the name of the variable of the temporary dataset matter to the user though? To what extent is that just an implementation detail?
the function will see the Dataset, so it might be important to keep the note. For example, this would need to change the name in the units dict from None to <this-array> to work correctly.
in core/parallel.py there's a dataarray_to_dataset function (and its inverse, dataset_to_dataarray) that preserves the name if possible. I think name preservation is a good thing for a user-facing function.
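The round trip discussed in this thread can be sketched with public xarray calls only. This is a minimal illustration under stated assumptions, not the code from this PR: `apply_via_temp_dataset` and `scale` are hypothetical names, and the sketch uses `DataArray.to_dataset(name=...)` rather than the private `_to_temp_dataset` / `_from_temp_dataset` helpers. It shows why the generic variable name matters to the user function (it has to key its `factors` dict on `"<this-array>"`) while the caller still gets the original name back.

```python
import numpy as np
import xarray as xr

# Generic variable name used inside the temporary Dataset (taken from the thread).
TEMP_NAME = "<this-array>"


def apply_via_temp_dataset(arr, func, *args, **kwargs):
    """Hypothetical helper: apply a Dataset function to a DataArray."""
    original_name = arr.name
    # The user function sees a Dataset with a single, generically named variable.
    ds = arr.to_dataset(name=TEMP_NAME)
    result = func(ds, *args, **kwargs)
    # Pull the variable back out and restore the original name (possibly None).
    out = result[TEMP_NAME]
    out.name = original_name
    return out


def scale(ds, factors):
    """Example user function: multiply each listed variable by its factor."""
    return ds.assign({name: ds[name] * factor for name, factor in factors.items()})


arr = xr.DataArray(np.arange(3.0), dims="x", name="temperature")
# The dict must use the temporary name, not "temperature" or None.
out = apply_via_temp_dataset(arr, scale, {TEMP_NAME: 10.0})

unnamed = xr.DataArray(np.arange(2.0), dims="x")
out_unnamed = apply_via_temp_dataset(unnamed, lambda ds: ds)
```

In this sketch the temporary name is only visible inside `scale`; the returned array carries whatever name the input had.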
I think this is great! I agree the name doesn't sing, but also I can't think of a better one...
04caf01 to cdb0f3d
fa42c6a to 0daf42d
I reckon this is ready to merge! One final pause on the name.
I like that!
I switched to
Looks great! (no view on sentinel vs string though)
xarray/core/computation.py
Outdated
if isinstance(obj, DataArray):
    ds = dataarray_to_dataset(obj)
    if obj.name is None:
        ds = ds.rename({_THIS_ARRAY: "<this-array>"})
another option would be to use backends.core.api.DATAARRAY_VARIABLE, which is used when writing a DataArray to netcdf (I think). I don't feel strongly about this.
yeah, I don't know which name would be better, either. THIS_ARRAY is not part of the public API so we can't use it, and None obviously doesn't work, either. Using a string seems like a good choice but the exact value will almost always be arbitrary. The advantage of "<this-array>" is that it is the string representation of THIS_ARRAY, but that's the only reason I chose that. DATAARRAY_VARIABLE or DATAARRAY_NAME have the value f"__xarray_dataarray_{type}__", but neither of them is actually part of the public API (I think?), which means they have the same issue as THIS_ARRAY (not sure if that's actually a problem, though: they simply reference a string).
I thought @keewis's idea of self was good from #5493 (comment), to the extent that could apply here.
Edit: but then @dcherian pointed out this will fail if there's a dim called self!
I used this in xarray-contrib/pint-xarray#110, and for that at least it's actually an advantage that I have to pass the name. Not sure if that's the same for every other use case though (but defining the name explicitly is not much overhead so it should be fine)
upon further consideration, I think we have the choice between using a generic name, raising for unnamed I think I would prefer option 3a (always use the passed name, even if the
IIUC people can do But no strong preference and +1 to merging this.
passing the name makes it cleaner, but we could also add a default value (
Unit Test Results 6 files 6 suites 53m 52s ⏱️ Results for commit 12400cb.
Yes, agree! No strong view on what it should be. Would we remove the name before passing back the array?
that name is temporary, which means it is only visible within the user function.
as discussed in #4837, this adds a method that applies a function to a DataArray by first converting it to a temporary dataset using _to_temp_dataset, then applying the function and converting the result back. I'm not really happy with the name but I can't find a better one.
This function is really similar to pipe, so I guess a keyword argument to pipe would work, too. The disadvantage of that is that pipe passes all kwargs to the passed function, which means we would shadow a specific kwarg.
pre-commit run --all-files
whats-new.rst
api.rst
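The kwarg-shadowing concern raised in the description can be sketched in plain Python. Everything here is hypothetical and for illustration only: `pipe` is a minimal free-function version of the pattern, and the `to_dataset` keyword is an invented name for the flag that would have to be added. Because a pipe-style function forwards every keyword argument to the user function, any keyword it consumes itself can no longer reach a user function that happens to use the same parameter name.

```python
def pipe(obj, func, *args, **kwargs):
    """Minimal pipe: forward everything to the user function."""
    return func(obj, *args, **kwargs)


def pipe_with_flag(obj, func, *args, to_dataset=False, **kwargs):
    """Hypothetical variant that consumes a `to_dataset` keyword itself."""
    if to_dataset:
        # Stand-in for the conversion to a temporary dataset.
        obj = {"<this-array>": obj}
    return func(obj, *args, **kwargs)


def user_func(obj, to_dataset=None):
    """User function that has its own, unrelated `to_dataset` parameter."""
    return to_dataset


# With the plain pipe, the keyword reaches the user function.
plain = pipe([1, 2], user_func, to_dataset="netcdf")
# With the flag variant, the keyword is swallowed and the user function sees its default.
shadowed = pipe_with_flag([1, 2], user_func, to_dataset="netcdf")
```

A separate method avoids this collision, at the cost of one more name in the API.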