Hypothesis strategy for generating Variable objects #8404
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,303 @@ | ||
.. _testing: | ||
|
||
Testing your code | ||
================= | ||
Review comments on lines +3 to +4:
The intention of creating this page is that material on using
Perhaps this whole page should go under the xarray internals section of the docs instead of the user guide? Because this realistically is only going to be used by other library developers, not most users.
Not sure. It is true that the page has a different target audience than the other pages in the user guide, but then again applications can also be tested. And, so far the "internals" section describes implementation details or extension mechanisms that affect the internals.
I think we can work this out later. This location seems fine for now, and changing it isn't a backwards-compatibility issue. |
||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
import numpy as np | ||
import pandas as pd | ||
import xarray as xr | ||
|
||
np.random.seed(123456) | ||
|
||
.. _testing.hypothesis: | ||
|
||
Hypothesis testing | ||
------------------ | ||
|
||
|
||
.. note:: | ||
|
||
Testing with hypothesis is a fairly advanced topic. Before reading this section, you should have read | ||
our guide to xarray's :ref:`data structures`, be familiar with conventional unit testing in | ||
`pytest <https://docs.pytest.org/>`_, and have looked through the | ||
`hypothesis library documentation <https://hypothesis.readthedocs.io/>`_. | ||
|
||
`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing. | ||
Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many | ||
dynamically generated examples. For example you might have written a test which you wish to be parameterized by the set | ||
of all possible integers via :py:func:`hypothesis.strategies.integers()`. | ||
|
||
Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs | ||
that you did not even think to look for! | ||
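As a minimal sketch of what "parameterized by the set of all possible integers" means, the test below states a property and lets hypothesis draw the inputs. The test name and the property are illustrative only and do not come from xarray.

```python
from hypothesis import given, strategies as st


@given(st.integers(), st.integers())
def test_addition_commutes(a, b):
    # The property must hold for every pair of integers hypothesis draws.
    assert a + b == b + a


# Calling the decorated function runs it against many generated examples.
test_addition_commutes()
```

Under pytest you would not call the function yourself; the test runner collects and runs it like any other test.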
|
||
Strategies | ||
~~~~~~~~~~ | ||
|
||
Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray | ||
data structures containing arbitrary data. You can use these to efficiently test downstream code, | ||
quickly ensuring that your code can handle xarray objects of all possible structures and contents. | ||
|
||
These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides | ||
|
||
.. currentmodule:: xarray | ||
|
||
.. autosummary:: | ||
|
||
|
||
testing.strategies.supported_dtypes | ||
testing.strategies.names | ||
testing.strategies.dimension_names | ||
testing.strategies.dimension_sizes | ||
testing.strategies.attrs | ||
testing.strategies.variables | ||
testing.strategies.unique_subset_of | ||
|
||
These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`: | ||
|
||
.. ipython:: python | ||
|
||
import hypothesis.extra.numpy as npst | ||
|
||
Generating Examples | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method, | ||
which is a general hypothesis method valid for all strategies. | ||
|
||
.. ipython:: python | ||
|
||
import xarray.testing.strategies as xrst | ||
|
||
xrst.variables().example() | ||
xrst.variables().example() | ||
xrst.variables().example() | ||
|
||
You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide | ||
range of data that the xarray strategies can generate. | ||
|
||
In your tests, however, you should not use ``.example()``; instead you should parameterize your tests with the | ||
:py:func:`hypothesis.given` decorator: | ||
|
||
.. ipython:: python | ||
|
||
from hypothesis import given | ||
|
||
.. ipython:: python | ||
|
||
@given(xrst.variables()) | ||
def test_function_that_acts_on_variables(var): | ||
    # func is a placeholder for the function under test | ||
    assert func(var) == ... | ||
|
||
|
||
Chaining Strategies | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated | ||
examples. | ||
|
||
.. ipython:: python | ||
|
||
# generate a Variable containing an array with a complex number dtype, but all other details still arbitrary | ||
from hypothesis.extra.numpy import complex_number_dtypes | ||
|
||
xrst.variables(dtype=complex_number_dtypes()).example() | ||
|
||
This also works with custom strategies, or strategies defined in other packages. | ||
For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array. | ||
|
||
Fixing Arguments | ||
~~~~~~~~~~~~~~~~ | ||
|
||
If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples | ||
over all other aspects, then use :py:func:`hypothesis.strategies.just()`. | ||
|
||
.. ipython:: python | ||
|
||
import hypothesis.strategies as st | ||
|
||
# Generates only variable objects with dimensions ["x", "y"] | ||
xrst.variables(dims=st.just(["x", "y"])).example() | ||
|
||
(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a | ||
special strategy that just contains a single example.) | ||
|
||
To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths | ||
(i.e. following xarray objects' ``.sizes`` property), e.g. | ||
|
||
.. ipython:: python | ||
|
||
# Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively | ||
xrst.variables(dims=st.just({"x": 2, "y": 3})).example() | ||
|
||
You can also use this to specify that you want examples which are missing some part of the data structure, for instance | ||
|
||
.. ipython:: python | ||
|
||
# Generates a Variable with no attributes | ||
xrst.variables(attrs=st.just({})).example() | ||
|
||
Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the | ||
objects your chained strategy will generate. | ||
|
||
.. ipython:: python | ||
|
||
fixed_x_variable_y_maybe_z = st.fixed_dictionaries( | ||
{"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)} | ||
) | ||
Review comment on lines +145 to +147: This feels like a great place to introduce the |
||
fixed_x_variable_y_maybe_z.example() | ||
|
||
special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z) | ||
|
||
special_variables.example() | ||
special_variables.example() | ||
|
||
Here we have used one of hypothesis' built-in strategies :py:func:`hypothesis.strategies.fixed_dictionaries` to create a | ||
strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want). | ||
This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of | ||
length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2. | ||
By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~testing.strategies.variables` strategy, | ||
we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications. | ||
|
||
Generating Duck-type Arrays | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Xarray objects don't have to wrap numpy arrays; in fact they can wrap any array type which presents the same API as a | ||
numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays <internals.duckarrays>`). | ||
|
||
Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a | ||
:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways: | ||
|
||
1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a | ||
different type: | ||
|
||
.. ipython:: python | ||
|
||
import sparse | ||
|
||
.. ipython:: python | ||
|
||
def convert_to_sparse(var): | ||
    return var.copy(data=sparse.COO.from_numpy(var.to_numpy())) | ||
|
||
.. ipython:: python | ||
|
||
sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map( | ||
convert_to_sparse | ||
) | ||
|
||
sparse_variables.example() | ||
sparse_variables.example() | ||
|
||
2. Generate the duck-typed arrays directly, by passing a function which returns a strategy for them to the ``array_strategy_fn`` argument of the xarray strategies: | ||
|
||
.. ipython:: python | ||
|
||
def sparse_random_arrays( | ||
    shape: tuple[int, ...], | ||
) -> st.SearchStrategy[sparse._coo.core.COO]: | ||
    """Strategy which generates random sparse.COO arrays""" | ||
    if shape is None: | ||
        shape = npst.array_shapes() | ||
    else: | ||
        shape = st.just(shape) | ||
    # density is drawn as an integer, so arrays have either no stored elements (0) or all elements stored (1) | ||
    density = st.integers(min_value=0, max_value=1) | ||
    # note sparse.random does not accept a dtype kwarg | ||
    return st.builds(sparse.random, shape=shape, density=density) | ||
|
||
|
||
def sparse_random_arrays_fn( | ||
    *, shape: tuple[int, ...], dtype: np.dtype | ||
) -> st.SearchStrategy[sparse._coo.core.COO]: | ||
    return sparse_random_arrays(shape=shape) | ||
|
||
|
||
.. ipython:: python | ||
|
||
sparse_random_variables = xrst.variables( | ||
array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64")) | ||
) | ||
sparse_random_variables.example() | ||
|
||
Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you | ||
want to wrap. | ||
|
||
Compatibility with the Python Array API Standard | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_ | ||
(see our :ref:`docs on Array API Standard support <internals.duckarrays.array_api_standard>`). | ||
|
||
.. warning:: | ||
|
||
The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant | ||
dtypes by default. | ||
For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables` | ||
(assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the | ||
array API standard. | ||
|
||
If the array type you want to generate has an array API-compliant top-level namespace | ||
(e.g. that which is conventionally imported as ``xp`` or similar), | ||
you can use this neat trick: | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
|
||
from numpy import array_api as xp  # experimental module, available in numpy 1.26.0 but removed in numpy 2.0 | ||
|
||
from hypothesis.extra.array_api import make_strategies_namespace | ||
|
||
xps = make_strategies_namespace(xp) | ||
|
||
xp_variables = xrst.variables( | ||
array_strategy_fn=xps.arrays, | ||
dtype=xps.scalar_dtypes(), | ||
) | ||
xp_variables.example() | ||
|
||
|
||
Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead. | ||
|
||
Testing over Subsets of Dimensions | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
A common task when testing xarray user code is checking that your function works for all valid input dimensions. | ||
We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of` | ||
is useful. | ||
|
||
It works for lists of dimension names | ||
|
||
.. ipython:: python | ||
|
||
dims = ["x", "y", "z"] | ||
xrst.unique_subset_of(dims).example() | ||
xrst.unique_subset_of(dims).example() | ||
|
||
as well as for mappings of dimension names to sizes | ||
|
||
.. ipython:: python | ||
|
||
dim_sizes = {"x": 2, "y": 3, "z": 4} | ||
xrst.unique_subset_of(dim_sizes).example() | ||
xrst.unique_subset_of(dim_sizes).example() | ||
|
||
This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions. | ||
For example, we can write a pytest test which checks that a reduction gives the expected result when applied | ||
along any valid subset of the Variable's dimensions. | ||
|
||
.. code-block:: python | ||
|
||
import numpy.testing as npt | ||
|
||
|
||
@given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1))) | ||
def test_mean(data, var): | ||
    """Test that the mean of an xarray Variable is always equal to the mean of the underlying array.""" | ||
|
||
    # specify arbitrary reduction along at least one dimension | ||
    reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1)) | ||
|
||
    # create expected result (using nanmean because arrays with NaNs will be generated) | ||
    reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims) | ||
    expected = np.nanmean(var.data, axis=reduction_axes) | ||
|
||
    # assert property is always satisfied | ||
    result = var.mean(dim=reduction_dims).data | ||
    npt.assert_equal(expected, result) |