Hypothesis strategy for generating Variable objects #8404
I didn't have time to check the tests yet, but here are a few comments
Testing your code
=================
Not sure. It is true that the page has a different target audience than the other pages in the user guide, but then again applications can also be tested. And, so far the "internals" section describes implementation details or extension mechanisms that affect the internals.
return (
    npst.integer_dtypes()
    | npst.unsigned_integer_dtypes()
    | npst.floating_dtypes()
    | npst.complex_number_dtypes()
)
we do support string dtypes, but only for a subset of operations. Is this worth mentioning?
This is not meant to be an exhaustive list (yet). It doesn't include datetimes either.
agreed, but most operations don't make sense on string or datetime dtypes so it might be better to make a separate list of dtypes for those?
Sure - I'm just saying let's defer detailed discussions of which types to test until another issue / PR, the point of this PR is to provide a framework flexible enough to easily test xarray functions with any type we want, which this achieves.
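As a rough sketch of the "separate list of dtypes" idea raised in this thread, the numeric union from the diff can sit next to a second strategy for string and datetime dtypes. The function names `numeric_dtypes` and `string_like_dtypes` are hypothetical and are not taken from the PR; only the `hypothesis.extra.numpy` strategies themselves are real.

```python
import hypothesis.extra.numpy as npst
import hypothesis.strategies as st
import numpy as np
from hypothesis import given, settings


def numeric_dtypes() -> st.SearchStrategy[np.dtype]:
    """Hypothetical name: the numeric union shown in the diff above."""
    return (
        npst.integer_dtypes()
        | npst.unsigned_integer_dtypes()
        | npst.floating_dtypes()
        | npst.complex_number_dtypes()
    )


def string_like_dtypes() -> st.SearchStrategy[np.dtype]:
    """Hypothetical second list for dtypes that only support some operations."""
    return (
        npst.unicode_string_dtypes()
        | npst.byte_string_dtypes()
        | npst.datetime64_dtypes()
    )


@settings(max_examples=50, deadline=None)
@given(numeric_dtypes())
def check_numeric_kinds(dtype):
    # NumPy kind codes: i = signed int, u = unsigned int, f = float, c = complex
    assert dtype.kind in "iufc"


@settings(max_examples=50, deadline=None)
@given(string_like_dtypes())
def check_string_like_kinds(dtype):
    # U = unicode string, S = byte string, M = datetime64
    assert dtype.kind in "USM"
```

Keeping the two lists separate would let a test opt in to string or datetime dtypes only where the operation under test supports them.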
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
…omNicholas/xarray into hypothesis-strategies-variable
My guess is that this is an existing docstring, the location of which is being misreported due to the various wrappers that Hypothesis inserts. I'd be very surprised if Hypothesis is modifying docstrings somehow, but I guess trimming trailing whitespace is the kind of thing that could happen somewhere in the stack. No direct insight, but getting the full text of the docstring it's complaining about should help?
I got the docs build to pass! The warning was due to extra lines in the examples of the
@keewis do you want to review the tests before I merge it? (The test failures now are something groupby-related, and are also happening in #8521, so definitely not my fault!)
I didn't spot anything that we wouldn't be able to change after merging / releasing, so I'd say let's merge and see how well it works in practice.
xarray/testing/strategies.py
)


def smallish_arrays(
is the only reason we have this function the default strategy for `shape` (and maybe some additional typing)? If so, we might be able to use `functools.partial` on `npst.arrays`? Unless you meant to expose this as public API (it's not in the API reference)
That is the only reason. I did not think of using `functools.partial` - that's a good idea, I can try that out before merging.
That actually won't work because we do need to be able to pass `shape` and `dtype` to the `array_strategy_fn`.
But I tried removing `smallish_arrays` completely and the tests still seem to complete in a reasonable amount of time, so I've actually just taken it out for now.
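To make the `shape` and `dtype` requirement concrete, here is a minimal sketch of a callable with the shape this thread describes: it takes a concrete `shape` and `dtype` by keyword and returns a strategy for arrays. The name `finite_arrays` and the choice to exclude NaN and infinity are assumptions made for illustration; this is not the removed `smallish_arrays` helper.

```python
import hypothesis.extra.numpy as npst
import hypothesis.strategies as st
import numpy as np
from hypothesis import given, settings


def finite_arrays(
    *, shape: tuple[int, ...], dtype: np.dtype
) -> st.SearchStrategy[np.ndarray]:
    """Hypothetical array_strategy_fn: one strategy per concrete shape and dtype."""
    # Element values follow the requested dtype, minus NaN and +/-inf.
    elements = npst.from_dtype(dtype, allow_nan=False, allow_infinity=False)
    return npst.arrays(dtype=dtype, shape=shape, elements=elements)


@settings(max_examples=25, deadline=None)
@given(st.data())
def check_shape_and_dtype_are_honoured(data):
    # The caller, not the callable, decides the shape and dtype.
    shape = data.draw(npst.array_shapes(max_dims=3, max_side=4))
    dtype = data.draw(npst.floating_dtypes(sizes=(32, 64)))
    arr = data.draw(finite_arrays(shape=shape, dtype=dtype))
    assert arr.shape == shape
    assert arr.dtype == dtype
    assert np.isfinite(arr).all()
```

In this sketch the callable is built once per drawn shape and dtype, so both values have to arrive as arguments at call time.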
* fix import of xarray.testing internals that was changed by pydata/xarray#8404 * bump minimum required version of xarray * linting
Breaks out just the part of #6908 needed for generating arbitrary `xarray.Variable` objects. (so ignore the ginormous number of commits)

EDIT: Check out this test which performs a mean on any subset of any Variable object!

@andersy005 @maxrjones @jhamman I thought this might be useful for the `NamedArray` testing. (xref #8370 and #8244)

@keewis and @Zac-HD sorry for letting that PR languish for literally a year 😅 This PR addresses your feedback about accepting a callable that returns a strategy generating arrays. That suggestion makes some things a bit more complex in user code but actually allows me to simplify the internals of the `variables` strategy significantly. I'm actually really happy with this PR - I think it solves what we were discussing, and is a sensible checkpoint to merge before going back to making strategies for generating composite objects like DataArrays/Datasets work.

whats-new.rst
api.rst