[pull] main from pydata:main #582

Merged
merged 105 commits into from
Nov 24, 2024


pull[bot]

@pull pull bot commented Oct 14, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

shoyer and others added 4 commits October 13, 2024 14:03
* Reimplement DataTree aggregations

They now allow for dimensions that are missing on particular nodes, and
use Xarray's standard generate_aggregations machinery, like aggregations
for DataArray and Dataset.

Fixes #8949, #8963

* add API docs on DataTree aggregations

* remove incorrectly added sel methods

* fix docstring reprs

* mypy fix

* fix self import

* remove unimplemented agg methods

* replace dim_arg_to_dims_set with parse_dims

* add parse_dims_as_set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix mypy errors

* change tests to match slightly different error now thrown

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: TomNicholas <tom@cworthy.org>
…ee function (#9614)

* updating group type annotation for netcdf, hdf5, and zarr open_datatree function

* supporting only  in group type annotation for netcdf, hdf5, and zarr open_datatree function

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Rename inherited -> inherit in DataTree.to_dataset

* fixed one missed instance of kwarg from #9602

---------

Co-authored-by: Tom Nicholas <tom@cworthy.org>
* remove too-long underline

* draft section on data alignment

* fixes

* draft section on coordinate inheritance

* various improvements

* more improvements

* link from other page

* align call include all 3 datasets

* link back to use cases

* clarification

* small improvements

* remove TODO after #9532

* add todo about #9475

* correct xr.align example call

* add links to netCDF4 documentation

* Consistent voice

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

* keep indexes in lat lon selection to dodge #9475

* unpack generator properly

Co-authored-by: Stephan Hoyer <shoyer@google.com>

* ideas for next section

* briefly summarize what alignment means

* clarify that it's the data in each node that was previously unrelated

* fix incorrect indentation of code block

* display the tree with redundant coordinates again

* remove content about non-inherited coords for a follow-up PR

* remove todo

* remove todo now that aggregations are re-implemented

* remove link to (unmerged) migration guide

* remove todo about improving error message

* correct statement in data-structures docs

* fix internal link

---------

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
@pull pull bot added the ⤵️ pull label Oct 14, 2024
kmuehlbauer and others added 25 commits October 14, 2024 15:52
* test unary op

* implement and generate unary ops

* test for unary op with inherited coordinates

* re-enable arithmetic tests

* implementation for binary ops

* test ds * dt commutativity

* ensure other types defer to DataTree, thus fixing #9365

* test for inplace binary op

* pseudocode implementation of inplace binary op, and xfail test

* remove some unneeded type: ignore comments

* return type should be DataTree

* type datatree ops as accepting dataset-compatible types too

* use same type hinting hack as Dataset does for __eq__ not being same as Mapping

* ignore return type

* add some methods to api docs

* don't try to import DataTree.astype in API docs

* test to check that single-node trees aren't broadcast

* return NotImplemented

* remove pseudocode for inplace binary ops

* map_over_subtree -> map_over_datasets
* sketch of migration guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* whatsnew

* add date

* spell out API changes in more detail

* details on backends integration

* explain alignment and open_groups

* explain coordinate inheritance

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* re-trigger CI

* remove bullet about map_over_subtree

* Markdown formatting for important warning block

Co-authored-by: Matt Savoie <github@flamingbear.com>

* Reorder changes in order of importance

Co-authored-by: Matt Savoie <github@flamingbear.com>

* Clearer wording on setting relationships

Co-authored-by: Matt Savoie <github@flamingbear.com>

* remove "technically"

Co-authored-by: Matt Savoie <github@flamingbear.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matt Savoie <github@flamingbear.com>
As mentioned in #2157, the docstring of `Dataset.groupby` does not
reflect deprecation of squeeze (as the docstring of `DataArray.groupby`
does) and states an incorrect default value.
* Add inherit=False option to DataTree.copy()

This PR adds an inherit=False option to DataTree.copy, so users can
decide if they want to inherit coordinates from parents or not when
creating a subtree.

The default behavior is `inherit=True`, which is a breaking change from
the current behavior where parent coordinates are dropped (which I
believe should be considered a bug).

* fix typing

* add migration guide note

* ignore typing error
* Bug fixes for DataTree indexing and aggregation

My implementation of indexing and aggregation was incorrect on child
nodes, re-creating the child nodes from the root.

There was also another bug when indexing inherited coordinates that meant
formerly inherited coordinates were incorrectly dropped from results.

* disable broken test
* type hints for datatree ops tests

* type hints for datatree aggregations tests

* type hints for datatree indexing tests

* type hint a lot more tests

* more type hints
* Add zip_subtrees for paired iteration over DataTrees

This should be used for implementing DataTree arithmetic inside
map_over_datasets, so the result does not depend on the order in which
child nodes are defined.

I have also added a minimal implementation of breadth-first search with
an explicit queue, replacing the current recursion-based solution in
xarray.core.iterators (which has been removed). The new implementation
is also slightly faster in my microbenchmark:

    In [1]: import xarray as xr

    In [2]: tree = xr.DataTree.from_dict({f"/x{i}": None for i in range(100)})

    In [3]: %timeit _ = list(tree.subtree)
    # on main
    87.2 μs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

    # with this branch
    55.1 μs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

* fix pytype error

* Tweaks per review
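The breadth-first traversal with an explicit queue, and the idea of pairing nodes by child name rather than by definition order, can be sketched in plain Python. Node, subtree and zip_subtrees below are simplified hypothetical stand-ins written for illustration; they are not xarray's implementation, and the real zip_subtrees may differ in its error handling.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a name plus named children."""
    name: str
    children: dict[str, Node] = field(default_factory=dict)


def subtree(root: Node):
    """Yield every node breadth-first, using an explicit queue instead of recursion."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children.values())


def zip_subtrees(*trees: Node):
    """Yield tuples of corresponding nodes, breadth-first, matched by child name.

    Raises ValueError if the trees do not share the same structure.
    """
    queue = deque([trees])
    while queue:
        nodes = queue.popleft()
        yield nodes
        names = set(nodes[0].children)
        for other in nodes[1:]:
            if set(other.children) != names:
                raise ValueError("trees are not isomorphic")
        # Follow the first tree's child order; look up the others by name,
        # so the order in which children were defined elsewhere is irrelevant.
        for name in nodes[0].children:
            queue.append(tuple(node.children[name] for node in nodes))
```

Because the other trees' children are looked up by name, two trees that define the same children in a different order still pair up node-for-node.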
If the file is empty (or contains no variables matching any filtering done by the backend), use a different error message indicating that, rather than suggesting that the file has too many variables for this function.
* Updates to DataTree.equals and DataTree.identical

In contrast to `equals`, `identical` now also checks that any
inherited variables are inherited on both objects. However, they do
not need to be inherited from the same source. This aligns the
behavior of `identical` with the DataTree `__repr__`.

I've also removed the `from_root` argument from `equals` and `identical`.
If a user wants to compare trees from their roots, a better (simpler)
interface is to simply call these methods on the `.root` properties.
I would also like to remove the `strict_names` argument, but that will
require switching to use the new `zip_subtrees` (#9623) first.

* More efficient check for inherited coordinates
* Fix error and probably missing code cell in io.rst

* Make this even simpler, remove link to same section
* Replace black with ruff-format

* Fix formatting mistakes moving mypy comments

* Replace black with ruff in the contributing guides
* Add zip_subtrees for paired iteration over DataTrees

This should be used for implementing DataTree arithmetic inside
map_over_datasets, so the result does not depend on the order in which
child nodes are defined.

I have also added a minimal implementation of breadth-first search with
an explicit queue, replacing the current recursion-based solution in
xarray.core.iterators (which has been removed). The new implementation
is also slightly faster in my microbenchmark:

    In [1]: import xarray as xr

    In [2]: tree = xr.DataTree.from_dict({f"/x{i}": None for i in range(100)})

    In [3]: %timeit _ = list(tree.subtree)
    # on main
    87.2 μs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

    # with this branch
    55.1 μs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

* fix pytype error

* Re-implement map_over_datasets

The main changes:

- It is implemented using zip_subtrees, which means it should properly
  handle DataTrees where the nodes are defined in a different order.
- For simplicity, I removed handling of `**kwargs`, in order to preserve
  some flexibility for adding keyword arguments.
- I removed automatic skipping of empty nodes, because there are almost
  assuredly cases where applying the function to empty nodes would make
  sense. This could be restored with an optional keyword argument.

* fix typing of map_over_datasets

* add group_subtrees

* wip fixes

* update isomorphic

* documentation and API change for map_over_datasets

* mypy fixes

* fix test

* diff formatting

* more mypy

* doc fix

* more doc fix

* add api docs

* add utility for joining path on windows

* docstring

* add an overload for two return values from map_over_datasets

* partial fixes per review

* fixes per review

* remove a couple of xfails
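Mapping a function over several trees by matching node paths, with no `**kwargs`, no skipping of empty nodes, and a tuple of trees returned when the function returns a tuple, can be sketched as follows. Here a tree is simplified to a hypothetical dict from node path to node data, and map_over_nodes is an illustrative name, not xarray's map_over_datasets.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


def map_over_nodes(func: Callable[..., Any], *trees: Mapping[str, Any]):
    """Apply func node-by-node across trees keyed by node path (hypothetical sketch)."""
    if not trees:
        raise TypeError("at least one tree is required")
    paths = list(trees[0])
    for other in trees[1:]:
        # Match nodes by path, so the order in which nodes were defined is irrelevant.
        if set(other) != set(paths):
            raise ValueError("trees do not have the same node paths")
    # No **kwargs and no skipping of empty nodes: func sees every node.
    results = {path: func(*(tree[path] for tree in trees)) for path in paths}
    first = next(iter(results.values()), None)
    if isinstance(first, tuple):
        # A tuple result is split into one output tree per tuple element.
        return tuple(
            {path: result[i] for path, result in results.items()}
            for i in range(len(first))
        )
    return results
```

For example, passing divmod returns two trees, one of quotients and one of remainders, mirroring the "two return values" overload mentioned above.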
* _inherited_vars -> inherited_vars

* implementation using Coordinates

* datatree.DataTree -> xarray.DataTree

* only show inherited coordinates on root

* test that there is an Inherited coordinates header
* flox: Properly propagate multiindex

Closes #9648

* skip test on old pandas

* small optimization

* fix
* Fix multiple grouping with missing groups

Closes #9360

* Small repr improvement

* Small optimization in mask

* Add whats-new

* fix doctests
…ests (#9651)

* Add close() method to DataTree and clean-up open files in tests

This removes a bunch of warnings that were previously issued in
unit-tests.

* Unit tests for closing functionality
…ap_blocks`` (#9658)

* Reduce graph size through writing indexes directly into graph for map_blocks

* Reduce graph size through writing indexes directly into graph for map_blocks

* Update xarray/core/parallel.py

---------

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* Remove zarr pin

* Define zarr_v3 helper

* zarr-v3: filters / compressors -> codecs

* zarr-v3: update tests to avoid values equal to fillValue

* Various test fixes

* zarr_version fixes

* removed open_consolidated workarounds
* removed _store_version check
* pass through zarr_version

* fixup! zarr-v3: filters / compressors -> codecs

* fixup! fixup! zarr-v3: filters / compressors -> codecs

* fixup

* path / key normalization in set_variables

* fixes

* workaround nested consolidated metadata

* test: avoid fill_value

* test: Adjust call counts

* zarr-python 3.x Array.resize doesn't mutate

* test compatibility

- skip write_empty_chunks on 3.x
- update patch targets

* skip ZipStore with_mode

* test: more fill_value avoidance

* test: more fill_value avoidance

* v3 compat for instrumented test

* Handle zarr_version / zarr_format deprecation

* wip

* most Zarr tests passing

* unskip tests

* add custom Zarr _FillValue encoding / decoding

* relax dtype comparison in test_roundtrip_empty_vlen_string_array

* fix test_explicitly_omit_fill_value_via_encoding_kwarg

* fix test_append_string_length_mismatch_raises

* fix test_check_encoding_is_consistent_after_append for v3

* skip roundtrip_endian for zarr v3

* unskip datetimes and fix test_compressor_encoding

* unskip tests

* add back dtype skip

* point upstream to v3 branch

* Create temporary directory before using it

* Avoid zarr.storage.zip on v2

* fixed close_store_on_close bug

* Remove workaround, fixed upstream

* Restore original `w` mode.

* workaround for store closing with mode=w

* typing fixes

* compat

* Remove unnecessary pop

* fixed skip

* fixup types

* fixup types

* [test-upstream]

* Update install-upstream-wheels.sh

* set use_consolidated to false when user provides consolidated=False

* fix: import consolidated_metadata from package root

* fix: relax instrumented store checks for v3

* Adjust 2.18.3 thresholds

* skip datatree zarr tests w/ zarr 3 for now

* fixed kvstore usage

* typing fixes

* move zarr.codecs import

* fixup ignores

* storage options fix, skip

* fixed types

* Update ci/install-upstream-wheels.sh

* type fixes

* whats-new

* Update xarray/tests/test_backends_datatree.py

* fix type import

* set mapper, chunk_mapper

* Pass through zarr_format

* Fixup

* more cleanup

* revert test changes

* Update xarray/backends/zarr.py

* cleanup

* update docstring

* fix rtd

* tweak

---------

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Joe Hamman <joe@earthmover.io>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <deepak@cherian.net>
DimitriPapadopoulos and others added 29 commits November 9, 2024 12:31
* rewrite the `min_deps_check` script

* call the new script

* unpin `micromamba`

* install `rich-click`

* enforce a minimum width of 120

* remove the background colors

* remove old min-deps script

* more changing of colors

* some more styling

* ... aaand some more styling

* move the style definition in one place

* compare versions *before* formatting

* move the definition `console` into `main`

* properly add two columns to the warnings tables

* define the styles using the class and RGB values
… group (#9763)

Bumps the actions group with 1 update: [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish).


Updates `pypa/gh-action-pypi-publish` from 1.11.0 to 1.12.2
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.11.0...v1.12.2)

---
updated-dependencies:
- dependency-name: pypa/gh-action-pypi-publish
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Use ``map_overlap`` for rolling reducers with Dask

* Enable argmin test

* Update
* Optimize polyfit

Closes #5629

1. Use Variable instead of DataArray
2. Use `reshape_blockwise` when possible following #5629 (comment)

* clean up little more

* more clean up

* Add one comment

* Update doc/whats-new.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix whats-new

* Update doc/whats-new.rst

Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
* Allow wrapping astropy.units.Quantity

* allow all np.ndarray subclasses

* whats new

* test np.matrix

* fix comment

---------

Co-authored-by: tvo <tvo.email@proton.me>
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* fix cf decoding of grid_mapping

* fix linter

* unnest list, add tests

* add whats-new.rst entry

* check for second warning, copy to prevent windows error (?)

* revert copy, but set allow_cleanup_failures=ON_WINDOWS

* add itertools

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update xarray/conventions.py

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

* Update conventions.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test in test_conventions.py

* add comment

* revert backend tests

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* Reduce the number of tasks when the limit parameter is set on the push function

* Reduce the number of tasks when the limit parameter is set on the push function, and incorporate the method parameter for the cumreduction on the push method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update xarray/core/dask_array_ops.py

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

* Use last instead of creating a custom function, and add a keepdims parameter to last and first to make them compatible with the Blelloch method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove the keepdims on the last and first methods and use the nanlast method directly; they already have the parameter

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Include the optimization of ffill and bfill on the whats-new.rst

* Use map_overlap when n is smaller than all the chunks

* Avoid creating a numpy array to check if all the chunks are bigger than N on the push method

* Updating the whats-new.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Patrick Hoefler <phofl@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
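Why map_overlap is enough when the fill limit n is no larger than every chunk: with a limit of n, whether (and with what) a position is filled depends only on the n values before it, so each chunk can be filled independently once the last n raw values of the previous chunk are prepended as a halo. The pure-Python sketch below illustrates that argument with lists standing in for dask chunks; ffill and ffill_chunked_with_overlap are hypothetical names, and this is not xarray's dask_array_ops.push.

```python
from __future__ import annotations

import math


def ffill(values: list[float], limit: int | None = None) -> list[float]:
    """Forward-fill NaNs, filling at most `limit` consecutive NaNs after a valid value."""
    out: list[float] = []
    last = math.nan
    run = 0  # consecutive NaNs seen since the last valid value
    for v in values:
        if math.isnan(v):
            run += 1
            if not math.isnan(last) and (limit is None or run <= limit):
                out.append(last)
            else:
                out.append(v)
        else:
            last, run = v, 0
            out.append(v)
    return out


def ffill_chunked_with_overlap(chunks: list[list[float]], limit: int) -> list[float]:
    """Fill each chunk on its own after prepending a halo from the previous chunk."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if any(len(chunk) < limit for chunk in chunks):
        raise ValueError("every chunk must be at least `limit` long")
    out: list[float] = []
    halo: list[float] = []  # last `limit` *raw* values of the previous chunk
    for chunk in chunks:
        filled = ffill(halo + chunk, limit)
        out.extend(filled[len(halo):])  # drop the halo again
        halo = chunk[-limit:]
    return out
```

Each chunk only ever needs a halo of depth n, which is exactly what an overlap of n elements provides; without a limit, a fill can reach back arbitrarily far and a scan across chunks (as in the cumreduction path) is needed instead.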
* add 'User-Agent'-header to pooch.retrieve

* try sys.modules

* Apply suggestions from code review

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add whats-new.rst entry

---------

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix open_mfdataset for list of fsspec files

* Rewrite to for loop

* Fixup
* add ReadBuffer Protocol for open_mfdataset

* finally fix LSP violation

* move import out of TYPE_CHECKING
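Annotating open_mfdataset inputs with a Protocol lets any object that provides the right methods, such as a file-like object opened through fsspec, type-check without inheriting from a particular class. A minimal sketch of that structural-typing idea; the ReadBuffer here, with only read() and seek(), and the classify_inputs helper are hypothetical simplifications and not necessarily the method set or code in xarray.

```python
from __future__ import annotations

import io
import os
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadBuffer(Protocol):
    """Hypothetical minimal protocol: anything with read() and seek()."""

    def read(self, size: int = -1, /) -> bytes: ...

    def seek(self, offset: int, whence: int = 0, /) -> int: ...


def classify_inputs(paths: list[str | os.PathLike | ReadBuffer]) -> list[str]:
    """Label each input as a filesystem path or a readable buffer."""
    kinds = []
    for p in paths:
        if isinstance(p, (str, os.PathLike)):
            kinds.append("path")
        elif isinstance(p, ReadBuffer):  # structural check: has read and seek
            kinds.append("buffer")
        else:
            raise TypeError(f"unsupported input type: {type(p).__name__}")
    return kinds
```

Note that a runtime_checkable isinstance check only verifies that the methods exist, not their signatures; the static type checker is what enforces the full Protocol.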
…9793)

Bumps the actions group with 1 update: [codecov/codecov-action](https://github.com/codecov/codecov-action).


Updates `codecov/codecov-action` from 4.6.0 to 5.0.2
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v4.6.0...v5.0.2)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…wn to `sliding_window_view` (#9720)

* sliding_window_view: add new `automatic_rechunk` kwarg

Closes #9550
xref #4325

* Switch to ``sliding_window_kwargs``

* Add one more

* better docstring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename to sliding_window_view_kwargs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix code touching text

* Fix type ignore syntax

* Use type annotations instead of comments

* Fix code with two backticks in rst files

* Add pygrep-hooks pre-commit

* Fix typos in docs and code

* Add prettier pre-commit hook

* Apply suggestions from code review

* Update .pre-commit-config.yaml

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

* add to .gitignore

---------

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
* initial namespace-aware implementation

* use np subclass, test duck dask arrays

* remove dask special casing and numpy fallback

* add isnat

* hard code the supported ufuncs

* handle np versions, separate unary/binary path

* explicit unary/binary creators

* add to api docs

* add whats new

* move numpy version check to tests

* fix docs for aliased np funcs

* fix whats new

---------

Co-authored-by: Stephan Hoyer <shoyer@google.com>
* Bump minimum versions

* tweak

* Update doc/whats-new.rst

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

---------

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
* ENH, TST: aux func for importing optional deps

* ENH: use our new helper func for importing optional deps

* FIX: use aux func for a few more cftime imports

* FIX: remove cruft....

* FIX: Make it play well with mypy

Per the proposal at #9561 (comment)

This pairs any use of (a now simplified) `attempt_import` with a direct import of the same module, guarded by an `if TYPE_CHECKING` block.

* FIX, TST: match error

* Update xarray/tests/test_utils.py

Co-authored-by: Michael Niklas  <mick.niklas@gmail.com>

* DOC: add examples section to docstring

* refactor: use try-except clause and return original error to user

- Also change raise ImportError to raise RuntimeError, since we are catching both ImportError and ModuleNotFoundError

* TST: test import of submodules

* FIX: Incorporate @headtr1ck suggestions

From

#9561 (comment)
#9561 (comment)

---------

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
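A minimal sketch of the optional-import helper described above, assuming it wraps importlib.import_module and, per the refactor note, re-raises as RuntimeError while chaining the original error. The body, message text and error type are illustrative and may differ from what was merged. It is paired with a direct import under TYPE_CHECKING so that type checkers still see the real module.

```python
import importlib
from types import ModuleType
from typing import TYPE_CHECKING


def attempt_import(module: str) -> ModuleType:
    """Import an optional dependency; dotted submodules like "os.path" work too."""
    try:
        return importlib.import_module(module)
    except ImportError as e:  # also catches ModuleNotFoundError, a subclass
        raise RuntimeError(
            f"The optional dependency {module!r} is required for this "
            "functionality but could not be imported."
        ) from e


if TYPE_CHECKING:
    import json  # type checkers see the real module...
else:
    json = attempt_import("json")  # ...while runtime goes through the helper
```

Chaining with `from e` keeps the original ImportError available as `__cause__`, so the user still sees why the import failed.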
* Add utility for opening remote files with fsspec

* Apply Joe's suggestions from code review

Co-authored-by: Joe Hamman <jhamman1@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Lint

* Add what's new entry

* Type hint

* Make mypy happy

---------

Co-authored-by: Joe Hamman <jhamman1@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add GroupBy.shuffle()

* Cleanup

* Cleanup

* fix

* return groupby instance from shuffle

* Fix nD by

* Skip if no dask

* fix tests

* Add `chunks` to signature

* Fix self

* Another Self fix

* Forward chunks too

* [revert]

* undo flox limit

* [revert]

* fix types

* Add DataArray.shuffle_by, Dataset.shuffle_by

* Add doctest

* Refactor

* tweak docstrings

* fix typing

* Fix

* fix docstring

* bump min version to dask>=2024.08.1

* Fix typing

* Fix types

* remove shuffle_by for now.

* Add tests

* Support shuffling with multiple groupers

* Revert "remove shuffle_by for now."

This reverts commit 7a99c8f.

* bad merge

* Add a test

* Add docs

* bugfix

* Refactor out Dataset._shuffle

* fix types

* fix tests

* Handle by is chunked

* Some refactoring

* Remove shuffle_by

* shuffle -> distributed_shuffle

* return xarray object from distributed_shuffle

* fix

* fix doctest

* fix api

* Rename to `shuffle_to_chunks`

* update docs
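Conceptually, shuffling for a GroupBy reorders the data so that all members of each group are contiguous and can be placed in a single chunk. A pure-Python sketch of that reordering follows; shuffle_to_chunks_sketch is a hypothetical name, and the real shuffle_to_chunks operates on dask arrays, so its group order and chunk layout may differ.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence
from typing import Any


def shuffle_to_chunks_sketch(
    values: Sequence[Any], labels: Sequence[Hashable]
) -> tuple[list[Any], tuple[int, ...]]:
    """Reorder values so each group is contiguous; return (reordered, chunk sizes)."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups: dict[Hashable, list[int]] = {}
    for i, label in enumerate(labels):
        # Groups appear in first-seen order; members keep their original order.
        groups.setdefault(label, []).append(i)
    order = [i for members in groups.values() for i in members]
    chunks = tuple(len(members) for members in groups.values())
    return [values[i] for i in order], chunks
```

With one chunk per group, a subsequent per-group reduction can run on each chunk without further communication between chunks.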
* Compatibility with Zarr v3b2

* More guards with mode="w"

* refactoring

* tweak expected requests

* compat

* more compat

* fix
* Faster chunk checking for backend datasets

* limit size

* fix test

* optimize
* new blank whatsnew

* add note on map_over_subtree -> map_over_datasets
* ListedColormap: don't pass N colors

* fix somewhere else

* fix typing
@pull pull bot merged commit 552a74b into Illviljan:main Nov 24, 2024
16 checks passed