Allow no padding for rolling windows #5603

kmsquire · 2021-07-14T19:04:55Z

Fixes Strided rolling.construct results in unexpected behavior #4743
Tests added
Passes pre-commit run --all-files
User visible changes (including notable bug fixes) are documented in whats-new.rst

Related to #2007.

github-actions · 2021-07-14T19:38:11Z

Unit Test Results

        6 files         6 suites 1h 9m 48s ⏱️
17 081 tests 15 238 ✔️ 1 735 💤 108 ❌
95 682 runs 86 859 ✔️ 8 175 💤 648 ❌

For more details on these failures, see this check.

Results for commit 5a2eadc.

♻️ This comment has been updated with latest results.

kmsquire · 2021-07-14T21:29:10Z

FWIW, the test failures are all the same error, which should be fixed, but should have nothing to do with this PR.

dcherian · 2021-07-16T16:59:19Z

Thanks @kmsquire I'll try and review this soon.

pep8speaks · 2021-07-16T20:20:07Z

Hello @kmsquire! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-11 16:38:29 UTC

kmsquire · 2021-07-16T20:25:36Z

@dcherian Thanks. I thought I was done, but I'm finding things that aren't working as expected. I'm working on adding more tests for expected behavior, so marking as WIP. Will ping back here when I'm finished (maybe today, maybe early next week).

* Adds a `pad: bool` parameter to DataArray.rolling / Dataset.rolling

Also changes the object returned during iteration, so that it is the same (modulo transposition) as the view created during rolling_window.construct() along the rolling axis.

kmsquire · 2021-07-16T23:41:55Z

@dcherian Okay, I think this is in good shape. I added some more tests, and fixed a few more bugs. Most of the fixes have been squashed back down to the original commit.

I left the second commit separate for now because it's breaking. Previously, iterating over a rolling window returned only returned blocks present in the original array, and ignored the chunk of nans before the start or after the end of the array. This meant that the view of the DataArray or Dataset that was returned for each iteration was potentially a different size.

Here, instead, the iterator was changed so that each returned view matches the corresponding slice of the output of the construct() function. To me, this seems more intuitive, and makes it easier to develop algorithms without having to call construct.

Other than the fact that it's breaking, the main drawback (and difference with construct()) is that the windows themselves are labeled with coordinates along the rolling axis, except for any nan values before the start or after the end of the actual data, which do not have labels. For comparison, in construct, the window dimension does not have any coordinates associated with it by default.

After writing this, I'm wondering if it might be useful to simply drop the coordinates along the rolling axis, so that the behavior matches the behavior of construct() even more closely.

I'm open to thoughts/comments/criticisms/suggestions/questions/whatever.

kmsquire · 2021-07-17T00:16:18Z

Also, FWIW, the test failure was caused by a problem in zarr/fsspec (fsspec/filesystem_spec#707), which is fixed in master on fsspec (fsspec/filesystem_spec#710).

So it should be fixed here whenever fsspec makes a release.

dcherian

I did an initial review and have a few questions. I haven't reviewed __iter__ yet.

Thanks for doing this @kmsquire it turned out to be harder than expected!

xarray/core/rolling.py

xarray/core/variable.py

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

kmsquire · 2021-07-19T21:02:27Z

@dcherian Thank you for reviewing. I've started working through your comments.

* Generalize type of dim (str -> Hashable) * Rename to_array -> to_list * Make error message clearer (only print unexpected inputs) * Allow lists of length one as inputs * Add tests

kmsquire · 2021-07-20T22:48:21Z

The code also introduces a problem, the following no longer works:
xr.DataArray([1, 2, 3], coords=dict(x=[1, 1, 1])).rolling(x=3).mean()
thus you may have to use isel instead of sel if possible. Therefore I was also unable to test the following (from #2007 (comment)):
monthly.pad(month=n_months, mode="wrap").rolling(center=True, month=n_months, pad=False).mean(skipna=False)

Fixed and added a test for this. The example at the bottom now works.

kmsquire · 2021-07-20T23:08:26Z

Mentioned in one of the comments above, but I think I've reached about the amount of time that I can spend on this right now. If there are other minor changes, please do let me know. I can also back out the breaking change if desired (although that will probably take some commit surgery).

dcherian · 2021-07-21T16:12:52Z

Thanks @kmsquire we can take it from here.

There was some discussion about syntax at the dev meeting today. There were multiple votes in favour of allowing full control of padding in the rolling object itself. So we could have

.rolling(time=5, x=3, pad={"x": {"mode": "wrap"}, "time": False}), OR
.rolling(time=5, x=3).pad({"x": {"mode": "wrap"}, "time": False})

@pydata/xarray thoughts?

kmsquire · 2021-07-22T17:18:58Z

There was some discussion about syntax at the dev meeting today. There were multiple votes in favour of allowing full control of padding in the rolling object itself. So we could have

.rolling(time=5, x=3, pad={"x": {"mode": "wrap"}, "time": False}), OR

.rolling(time=5, x=3).pad({"x": {"mode": "wrap"}, "time": False})

FWIW, I think 1 would be more efficient (but perhaps harder to implement). With this one, the DataArrayRolling object is only created once, with all of the required information.

If 2 were implemented, what should the return value of the .pad() call be? Would it call construct on the rolling object and return a DataArray? Or would it return an updated DataArrayRolling object? Or would it be a new type entirely (e.g., DataArrayPaddedRolling)?

dcherian · 2021-07-22T17:23:15Z

Your points about (2) being hard to interpret are a good reason to go with (1)! I was thinking it would return a DatasetRolling object but that's not the only interpretation as you point out.

* upstream/main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...

Failing for center=True+pad=True; and maybe all pad=False?

dcherian · 2021-08-11T16:26:59Z

xarray/tests/test_dataarray.py

 @pytest.mark.parametrize("min_periods", (1, None))
 @pytest.mark.parametrize("backend", ["numpy"], indirect=True)
-def test_rolling_wrapped_bottleneck(da, name, center, min_periods):
-    bn = pytest.importorskip("bottleneck", minversion="1.1")
+@pytest.mark.parametrize("center", (True, False, None))


failing for center=True or pad=False (I think).

I think there's an issue with constructing expected: it doesn't seem like get_expected_rolling_indices is correct for all cases.

@cmdupuis3 I fixed the merge conflicts but there's still failing tests. We'd appreciate any help you can give here.

* upstream/main: (716 commits) Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638) add timeouts for tests (pydata#7657) Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667) [pre-commit.ci] pre-commit autoupdate (pydata#7651) Allow all integer dtypes in `polyval` (pydata#7619) [skip-ci] dev whats-new (pydata#7660) Redo whats-new for 2023.03.0 (pydata#7659) Set copy=False when calling pd.Series (pydata#7642) Pin pandas < 2 (pydata#7650) Whats-new for release 2023.03.0 (pydata#7643) Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648) Use more descriptive link texts (pydata#7625) Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598) Fix `pcolormesh` with str coords (pydata#7612) [skip-ci] Fix groupby binary ops benchmarks (pydata#7603) Remove incomplete sentence in IO docs (pydata#7631) Allow indexing unindexed dimensions using dask arrays (pydata#5873) Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618) [pre-commit.ci] pre-commit autoupdate (pydata#7620) add a test for scatter colorbar extend (pydata#7616) ...

* upstream/main: Save groupby codes after factorizing, pass to flox (pydata#7206) [skip-ci] Add compute to groupby benchmarks (pydata#7690) Delete built-in cfgrib backend (pydata#7670) Added a pronunciation guide to the word Xarray in the README.MD fil… (pydata#7677) boundarynorm fix (pydata#7553) Fix lazy negative slice rewriting. (pydata#7586) [pre-commit.ci] pre-commit autoupdate (pydata#7687) Adjust sidebar font colors (pydata#7674) Bump pypa/gh-action-pypi-publish from 1.8.1 to 1.8.3 (pydata#7682) Raise PermissionError when insufficient permissions (pydata#7629)

cmdupuis3 · 2024-03-15T16:58:44Z

Hey all, has there been any movement on this PR, or maybe elsewhere? It seems like a nice feature to me.

* upstream/main: (765 commits) increase typing annotations coverage in `xarray/core/indexing.py` (pydata#8857) pandas 3 MultiIndex fixes (pydata#8847) FIX: adapt handling of copy keyword argument in scipy backend for numpy >= 2.0dev (pydata#8851) FIX: do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (pydata#8852) Implement setitem syntax for `.oindex` and `.vindex` properties (pydata#8845) Support pandas copy-on-write behaviour (pydata#8846) correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713) Expand use of `.oindex` and `.vindex` (pydata#8790) Return a dataclass from Grouper.factorize (pydata#8777) [skip-ci] Fix upstream-dev env (pydata#8839) Add dask-expr for windows envs (pydata#8837) [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835) Add `dask-expr` to environment-3.12.yml (pydata#8827) Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736) Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784) try to get the `upstream-dev` CI to complete again (pydata#8823) Bump the actions group with 1 update (pydata#8818) Update documentation for clarity (pydata#8817) DOC: link to zarr.convenience.consolidate_metadata (pydata#8816) Refactor Grouper objects (pydata#8776) ...

kmsquire mentioned this pull request Jul 14, 2021

rolling: allow control over padding #2007

Open

kmsquire force-pushed the feature/rolling-pad branch from c3a75a0 to d58c7e8 Compare July 14, 2021 19:12

dcherian requested a review from mathause July 16, 2021 16:58

dcherian added the needs review label Jul 16, 2021

kmsquire changed the title ~~Allow no padding for rolling windows~~ WIP: Allow no padding for rolling windows Jul 16, 2021

kmsquire force-pushed the feature/rolling-pad branch from e4277e0 to 84b37d6 Compare July 16, 2021 23:19

kmsquire added 2 commits July 16, 2021 16:20

Allow padding to be turned off for rolling windows

039a9bf

* Adds a `pad: bool` parameter to DataArray.rolling / Dataset.rolling

Fix iteration on rolling windows to handle no padding

90228b0

Also changes the object returned during iteration, so that it is the same (modulo transposition) as the view created during rolling_window.construct() along the rolling axis.

kmsquire force-pushed the feature/rolling-pad branch from 84b37d6 to 587e16a Compare July 16, 2021 23:21

kmsquire changed the title ~~WIP: Allow no padding for rolling windows~~ Allow no padding for rolling windows Jul 16, 2021

kmsquire force-pushed the feature/rolling-pad branch 2 times, most recently from 9862d69 to 7d24702 Compare July 16, 2021 23:28

Add tests for rolling(..., pad=False)

c43db78

kmsquire force-pushed the feature/rolling-pad branch from 7d24702 to c43db78 Compare July 16, 2021 23:29

dcherian reviewed Jul 17, 2021

View reviewed changes

kmsquire and others added 2 commits July 19, 2021 13:49

Update xarray/core/rolling.py

30a0b47

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Update xarray/core/rolling.py

0314d2f

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

kmsquire added 3 commits July 19, 2021 15:48

Rename _get_rolling_dim_coords -> _get_output_coords

18562b8

Add comment better explaining _get_output_coords

90ae11d

Simplify _get_output_coords

33ccbf6

kmsquire force-pushed the feature/rolling-pad branch from ca6a73d to 4371a31 Compare July 20, 2021 05:51

Remove utils.get_slice_offsets (unused)

5df7b12

Update expand_args_to_dims

cdadc60

* Generalize type of dim (str -> Hashable) * Rename to_array -> to_list * Make error message clearer (only print unexpected inputs) * Allow lists of length one as inputs * Add tests

kmsquire force-pushed the feature/rolling-pad branch from ec9a925 to cdadc60 Compare July 20, 2021 22:38

kmsquire added 2 commits July 20, 2021 15:41

Update get_pads() types, comments

a2b235c

Use "length" instead of "count" to refer to the length of a coordinate

b98ab3a

Update type of _get_output_dim_selector

b36a049

dcherian added 5 commits August 10, 2021 19:49

fixes.

923a598

Add whats-new

fea9baf

minor test cleanup

2084c08

consolidate bottleneck dataarray test

c29d1fb

Failing for center=True+pad=True; and maybe all pad=False?

dcherian reviewed Aug 11, 2021

View reviewed changes

dcherian added 2 commits August 11, 2021 10:32

consolidate bottleneck dataset tests

b509ebf

more test cleanup

5a2eadc

dcherian removed the needs review label Oct 29, 2021

headtr1ck added needs discussion topic-rolling labels Oct 12, 2022

dcherian added 3 commits March 27, 2023 12:11

merge tests

b963b29

dcherian added needs review and removed needs discussion labels Mar 29, 2023

dcherian added the contrib-help-wanted label Mar 21, 2024

dcherian mentioned this pull request Sep 26, 2024

rolling(...).construct(...) blows up chunk size #9550

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow no padding for rolling windows #5603

Allow no padding for rolling windows #5603

kmsquire commented Jul 14, 2021 •

edited

Loading

github-actions bot commented Jul 14, 2021 •

edited

Loading

kmsquire commented Jul 14, 2021

dcherian commented Jul 16, 2021

pep8speaks commented Jul 16, 2021 •

edited

Loading

kmsquire commented Jul 16, 2021

kmsquire commented Jul 16, 2021

kmsquire commented Jul 17, 2021

dcherian left a comment

kmsquire commented Jul 19, 2021

kmsquire commented Jul 20, 2021

kmsquire commented Jul 20, 2021

dcherian commented Jul 21, 2021

kmsquire commented Jul 22, 2021

dcherian commented Jul 22, 2021 •

edited

Loading

dcherian Aug 11, 2021

dcherian Mar 21, 2024

cmdupuis3 commented Mar 15, 2024

Allow no padding for rolling windows #5603

Are you sure you want to change the base?

Allow no padding for rolling windows #5603

Conversation

kmsquire commented Jul 14, 2021 • edited Loading

github-actions bot commented Jul 14, 2021 • edited Loading

Unit Test Results

kmsquire commented Jul 14, 2021

dcherian commented Jul 16, 2021

pep8speaks commented Jul 16, 2021 • edited Loading

Comment last updated at 2021-08-11 16:38:29 UTC

kmsquire commented Jul 16, 2021

kmsquire commented Jul 16, 2021

kmsquire commented Jul 17, 2021

dcherian left a comment

Choose a reason for hiding this comment

kmsquire commented Jul 19, 2021

kmsquire commented Jul 20, 2021

kmsquire commented Jul 20, 2021

dcherian commented Jul 21, 2021

kmsquire commented Jul 22, 2021

dcherian commented Jul 22, 2021 • edited Loading

dcherian Aug 11, 2021

Choose a reason for hiding this comment

dcherian Mar 21, 2024

Choose a reason for hiding this comment

cmdupuis3 commented Mar 15, 2024

kmsquire commented Jul 14, 2021 •

edited

Loading

github-actions bot commented Jul 14, 2021 •

edited

Loading

pep8speaks commented Jul 16, 2021 •

edited

Loading

dcherian commented Jul 22, 2021 •

edited

Loading