Shouldn't `assert_allclose` transpose datasets? #5733
Comments
I agree this is another station on the journey between "byte identical" and "practically similar", which we've discovered over the past couple of years... :) I would lean towards having it fail for …
Is there any operation in xarray where dimension order matters? If yes, I'm not sure it's a good idea to let transposed dimensions pass:

```python
xr.testing.assert_allclose(ds1.data, ds2.data)  # ok
np.testing.assert_allclose(ds1.data.values, ds2.data.values)  # fails
```

What about a …
This sounds good to me. We should also have a …
I have a related question:

```python
import xarray as xr

da = xr.DataArray([[1, 1, 1], [2, 2, 2]], dims=("x", "y"))
xr.testing.assert_identical(da, da.transpose())
```

Strictly speaking the values are different, I guess. However, I think this error would be clearer if it said that the dimension order is different but the values are equal once the dimensions are transposed. I.e. we could do:

```python
if set(a.dims) == set(b.dims):
    a = a.transpose(*b.dims)
    # check values and raise if actually different
else:
    ...  # current behaviour
```

Is this a good idea?
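As a rough, runnable sketch of that idea outside of xarray (plain NumPy, hypothetical helper name — this is not xarray's internal implementation):

```python
import numpy as np

def compare_transposed(dims_a, a, dims_b, b):
    """Compare two labeled arrays, transposing `a` into `b`'s dim order first.

    `dims_a`/`dims_b` are tuples of dimension names; `a`/`b` are ndarrays.
    Illustrates the snippet above: only raise if the values actually differ
    once both operands share the same dimension order.
    """
    if set(dims_a) == set(dims_b):
        # reorder a's axes so output axis i corresponds to dims_b[i]
        a = np.transpose(a, [dims_a.index(d) for d in dims_b])
        dims_a = dims_b
    if dims_a != dims_b:
        raise AssertionError(f"dimension mismatch: {dims_a} vs {dims_b}")
    np.testing.assert_allclose(a, b)
```

With this, `compare_transposed(("x", "y"), a, ("y", "x"), a.T)` passes, while genuinely different dimension names still fail loudly.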
I guess this comes down a bit to a philosophical question related to @benbovy's comment above. You can either make this operation similar to the numpy equivalent (with some more xarray-specific checks), or it can check whether the values at a certain combination of labels are the same/close. The latter is the way I think about data in xarray as a user. The removal of axis logic (via labels) is one of the biggest draws for me, and importantly I also pitch this as one of the big reasons for beginners to switch to xarray. I would argue that a 'strict' (numpy-style) comparison is less practical in a scientific workflow, and we do have the numpy implementation to achieve that functionality. However, this might be very opinionated on my end, and a better error message would already be a massive improvement.
I think we definitely need a flag, because we have two conflicting use cases:

```python
xr.testing.assert_allclose(ds1.data, ds2.data)  # ok
np.testing.assert_allclose(ds1.data.values, ds2.data.values)  # fails
```

The easiest way to cover both is to have an option to switch between them. In my opinion the only question is what the default comparison behaviour should be. I like the idea of adding a …
I don't think we can change this because it's very backwards-incompatible and affects tests in downstream packages. But 👍 to adding a flag allowing users to opt out of dimension order checking.
@jbusecke I agree with your point of view that "xarray-style" comparison is more practical in a scientific workflow. Especially if dimension order is irrelevant for most (all?) xarray operations, ignoring the order for …

However, it might also be dangerous/harmful if the workflow includes data conversion between labeled and unlabelled formats. There's a risk of checking for equality with xarray, then later converting to numpy and assuming that the arrays are equal without feeling the need to check again. If the dimension sizes are the same, this might lead to very subtle bugs. Since it is easy to ignore or forget about default values, having a …

@dcherian I like your idea, but I'm not sure what's best between your code snippet and checking equality of the dimension-aligned datasets only when the non-aligned datasets are not equal.
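To illustrate the subtle-bug risk with equal dimension sizes, here is a small NumPy example (not from the thread):

```python
import numpy as np

# with equal dimension sizes, a transposed array keeps the same shape,
# so a positional (numpy-style) comparison silently compares values at
# mismatched label combinations instead of erroring out on shape
a = np.array([[1, 2], [3, 4]])
b = a.T  # same shape (2, 2), different element layout

assert a.shape == b.shape        # no shape error to warn you
assert not np.array_equal(a, b)  # ...yet the values differ positionally
```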
+1 for a …

Or failing that, it would at least be nice to have …

As pointed out, most of the xarray API is dimension-order-invariant, so it's odd to have no supported way to do comparisons in a dimension-order-invariant way.
My data point: I was writing tests for my libraries, and I was puzzled for some time until I realized …
Don't all conversions to numpy have to keep dimension order in mind?
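They do — the plain array you get out is shaped by whatever dimension order the object has at conversion time. A quick illustration (assuming a standard xarray install):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))

# the numpy view bakes in the current dim order of the DataArray
assert da.values.shape == (2, 3)
assert da.transpose("y", "x").values.shape == (3, 2)
```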
We would happily take a PR to implement a keyword-only …
Awesome, I'll work on it. |
Would it make sense to extend this to broadcastable arrays in general, i.e. also allow missing dimensions etc.?
To me this would be too lax: it does matter if there are missing dims, since they will have a different impact, repr, etc.
My changes so far mostly keep the old behavior unless the user specifies …
I still need to look into DataTrees (first time I've heard about them) so I know what to do about them.
Thanks for working on this @ignamv ! I've added some comments to your PR.
Yes I think that is fine!
You should just be able to copy the pattern you'll see in the assert functions already. Happy to help if it's not clear.
…#5733) (#8991)

* Add argument check_dims to assert_allclose to allow transposed inputs
* Update whats-new.rst
* Add `check_dims` argument to assert_equal and assert_identical + tests
* Assert that dimensions match before transposing or comparing values
* Add docstring for check_dims to assert_equal and assert_identical
* Update doc/whats-new.rst
* Undo fat finger
* Add attribution to whats-new.rst
* Replace check_dims with bool argument check_dim_order, rename align_dims to maybe_transpose_dims
* Remove left-over half-made test
* Remove check_dim_order argument from assert_identical
* assert_allclose/equal: emit full diff if dimensions don't match
* Rename check_dim_order test, test Dataset with different dim orders
* Update whats-new.rst
* Hide maybe_transpose_dims from Pytest traceback
* Ignore mypy error due to missing functools.partial.__name__

Co-authored-by: Tom Nicholas <tom@cworthy.org>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
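Per the commit messages above, the merged change exposes a boolean `check_dim_order` keyword on `assert_allclose` and `assert_equal` (it was removed from `assert_identical`). A sketch of the intended usage, assuming a recent xarray release that includes this PR:

```python
import numpy as np
import xarray as xr

# same labeled values, transposed dims
ds1 = xr.Dataset({"data": (("x", "y"), np.arange(6.0).reshape(2, 3))})
ds2 = ds1.transpose("y", "x")

# default keeps the old strict behaviour: transposed dims still fail
try:
    xr.testing.assert_allclose(ds1, ds2)
except AssertionError:
    pass

# opting out of dimension-order checking lets the comparison pass
xr.testing.assert_allclose(ds1, ds2, check_dim_order=False)
```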
What happened:

I am trying to compare two datasets, one of which has possibly transposed dimensions on a data variable. In my mind this should pass, but instead it fails.
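A minimal reproduction of the setup described here (the original snippet was lost in extraction, so the variable contents are a guess; only the `ds1`/`ds2` names come from the issue):

```python
import numpy as np
import xarray as xr

# two datasets with identical labeled values but transposed dims
ds1 = xr.Dataset({"data": (("x", "y"), np.arange(6.0).reshape(2, 3))})
ds2 = ds1.transpose("y", "x")

try:
    xr.testing.assert_allclose(ds1, ds2)
except AssertionError as err:
    # fails even though every labeled value matches
    print(err)
```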
Simply transposing `ds2` to the same dimensions as `ds1` fixes this (since the data is the same after all).

Since most of the other xarray operations are 'transpose-safe' (`(ds1 + ds2) == (ds1 + ds2.transpose('x', 'y'))`), shouldn't this one be too?

Environment:
Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.109+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.7.1
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: None
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 20.3.4
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```