Implement DataArray.to_dask_dataframe() #7635
xarray/core/dataarray.py
Outdated
Examples | ||
-------- | ||
|
||
da=xr.DataArray(np.random.rand(4,3,2), |
This example is not correctly formatted; see other functions for reference.
Also, don't use random values in examples; simply use np.ones(...).
np.ones can hide errors when dealing with tricky shapes, so something like np.arange(4 * 3 * 2).reshape(4, 3, 2) is a little better.
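A minimal sketch of the point above, assuming only NumPy. The `buggy_flatten` helper is hypothetical and not part of xarray; it stands in for any code that mixes up axis order. With constant data the mistake goes unnoticed, while distinct values expose it.

```python
import numpy as np

shape = (4, 3, 2)
ones = np.ones(shape)
ramp = np.arange(4 * 3 * 2).reshape(shape)


def buggy_flatten(a):
    # Hypothetical bug: the axes are reversed before flattening.
    return a.transpose(2, 1, 0).ravel()


# Every element is 1.0, so the wrong axis order cannot be detected.
ones_hides_bug = np.array_equal(buggy_flatten(ones), ones.ravel())

# Every element is distinct, so the wrong axis order changes the result.
arange_reveals_bug = not np.array_equal(buggy_flatten(ramp), ramp.ravel())

print(ones_hides_bug, arange_reveals_bug)
```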
xarray/tests/test_dataarray.py
Outdated
@@ -3205,6 +3205,39 @@ def test_to_dataframe_0length(self) -> None: | |||
assert len(actual) == 0 | |||
assert_array_equal(actual.index.names, list("ABC")) | |||
|
|||
def test_to_dask_dataframe(self) -> None: | |||
arr_np = np.random.randn(3, 4) |
Even though it doesn't really matter in most cases, we try to avoid random values in tests.
Maybe use np.arange(3 * 4).reshape(3, 4).
xarray/core/dataarray.py
Outdated
if self.ndim == 0: | ||
raise ValueError("Cannot convert a scalar to a dataframe") | ||
|
||
tmp_dataset = Dataset({name: self}) |
Normally we use the _to_temp_dataset method here, but since we only use the dataset to construct the dataframe and don't round-trip it, maybe it doesn't actually matter?
xarray/core/dataarray.py
Outdated
dim_order: Sequence of Hashable or None , optional | ||
Hierarchical dimension order for the resulting dataframe. |
dim_order: Sequence of Hashable or None , optional | |
Hierarchical dimension order for the resulting dataframe. | |
dim_order: Sequence of Hashable or None , optional | |
Hierarchical dimension order for the resulting dataframe. |
Follow NumPy's docstring conventions. There are more errors above and below.
xarray/core/dataarray.py
Outdated
if name is None: | ||
name = self.name | ||
|
||
if name is None: | ||
raise ValueError( | ||
"Cannot convert an unnamed DataArray to a " | ||
"dask dataframe : use the ``name`` parameter" | ||
) |
if name is None: | |
name = self.name | |
if name is None: | |
raise ValueError( | |
"Cannot convert an unnamed DataArray to a " | |
"dask dataframe : use the ``name`` parameter" | |
) |
Not needed when using self._to_dataset_whole.
Would it be better to keep this error message? When I removed it, the error shown was 'unable to convert unnamed DataArray to a Dataset without providing an explicit name'. Keeping these lines gives an error message specific to the DataArray to dask dataframe conversion.
I have made the changes as suggested. Please review them. Thanks.
xarray/core/dataarray.py
Outdated
if name is None: | ||
name = self.name | ||
|
||
if name is None: | ||
raise ValueError( | ||
"Cannot convert an unnamed DataArray to a " | ||
"dask dataframe : use the ``name`` parameter ." | ||
) | ||
ds = self._to_dataset_whole(name) | ||
return ds.to_dask_dataframe(dim_order, set_index) |
if name is None: | |
name = self.name | |
if name is None: | |
raise ValueError( | |
"Cannot convert an unnamed DataArray to a " | |
"dask dataframe : use the ``name`` parameter ." | |
) | |
ds = self._to_dataset_whole(name) | |
return ds.to_dask_dataframe(dim_order, set_index) | |
name = self.name if self.name is not None else _THIS_ARRAY | |
ds = self._to_dataset_whole(name, shallow_copy=False) | |
return ds.to_dask_dataframe(dim_order, set_index) |
I think we should go with this. I don't think it should be necessary to name the DataArray, which is more in line with how self._to_temp_dataset works, and setting dataarray.name = "new_name" is easy enough.
xarray/core/dataarray.py
Outdated
|
||
name : Hashable or None, optional | ||
Name given to this array(required if unnamed). | ||
It is a keyword-only argument. A keyword-only argument can only be passed | ||
to the function using its name as a keyword argument , and not as a | ||
positional argument. | ||
|
name : Hashable or None, optional | |
Name given to this array(required if unnamed). | |
It is a keyword-only argument. A keyword-only argument can only be passed | |
to the function using its name as a keyword argument , and not as a | |
positional argument. |
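For reference, the keyword-only behaviour that the docstring text above describes can be shown with a small stand-alone function. The `to_frame` function below is a hypothetical stand-in that only mirrors the `*, name=None` shape of the signature under review; it is not the xarray method.

```python
def to_frame(dim_order=None, set_index=False, *, name=None):
    # Everything after the bare * can only be passed by keyword.
    return {"dim_order": dim_order, "set_index": set_index, "name": name}


# Passing name by keyword works.
ok = to_frame(["x", "y"], name="temperature")

# Passing it as a third positional argument raises TypeError.
try:
    to_frame(["x", "y"], False, "temperature")
except TypeError as err:
    positional_error = str(err)
else:
    positional_error = None

print(ok["name"], positional_error is not None)
```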
xarray/core/dataarray.py
Outdated
*, | ||
name: Hashable | None = None, |
*, | |
name: Hashable | None = None, |
I have made the changes. Please review them.
…n2/xarray into method-dataarray-to-daskdataframe Updating branch doc/whats-new.rst
There's still an issue with the docstring example, probably some whitespace mismatch somewhere. It should just be copy/pasting the results from the ipython console.
doc/whats-new.rst
Outdated
@@ -70,6 +71,7 @@ New Features | |||
By `Deepak Cherian <https://github.com/dcherian>`_. | |||
- Improved performance in ``open_dataset`` for datasets with large object arrays (:issue:`7484`, :pull:`7494`). | |||
By `Alex Goodman <https://github.com/agoodm>`_ and `Deepak Cherian <https://github.com/dcherian>`_. | |||
- Added new method :py:meth:`DataArray.to_dask_dataframe`,convert a dataarray into a dask dataframe (:issue:`7409`). |
Remove.
xarray/core/dataarray.py
Outdated
vectors in contiguous order , so the last dimension in this list | ||
will be contiguous in the resulting DataFrame. This has a major influence | ||
on which operations are efficient on the resulting dask dataframe. | ||
|
I have made the changes. Please review them. Thanks.
xarray/core/dataarray.py
Outdated
if self.name is None: | ||
raise ValueError( | ||
"Cannot convert an unnamed DataArray to a " | ||
"dask dataframe : use the ``name`` parameter ." |
"dask dataframe : use the ``name`` parameter ." | |
"dask dataframe : use the ``.rename`` method to assign a name." |
for more information, see https://pre-commit.ci
Thanks for your patience here @dsgreen2. This is a nice contribution. Welcome to Xarray!
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizonalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
Adds a method to_dask_dataframe() to convert a dataarray to a dask dataframe.
DataArray.to_dask_dataframe()
#7409
whats-new.rst
api.rst
I have added the function to_dask_dataframe() in dataarray.py. This implementation is as suggested in issue #7409. The function first converts the data array to a temporary dataset and then calls the Dataset.to_dask_dataframe() method.
Could you please review it? Thank you.