-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow no padding for rolling windows #5603
base: main
Are you sure you want to change the base?
Conversation
c3a75a0
to
d58c7e8
Compare
Unit Test Results 6 files 6 suites 1h 9m 48s ⏱️ For more details on these failures, see this check. Results for commit 5a2eadc. ♻️ This comment has been updated with latest results. |
FWIW, the test failures are all the same error, which should be fixed, but should have nothing to do with this PR. |
Thanks @kmsquire I'll try and review this soon. |
@dcherian Thanks. I thought I was done, but I'm finding things that aren't working as expected. I'm working on adding more tests for expected behavior, so marking as WIP. Will ping back here when I'm finished (maybe today, maybe early next week). |
e4277e0
to
84b37d6
Compare
* Adds a `pad: bool` parameter to DataArray.rolling / Dataset.rolling
Also changes the object returned during iteration, so that it is the same (modulo transposition) as the view created during rolling_window.construct() along the rolling axis.
84b37d6
to
587e16a
Compare
9862d69
to
7d24702
Compare
7d24702
to
c43db78
Compare
@dcherian Okay, I think this is in good shape. I added some more tests, and fixed a few more bugs. Most of the fixes have been squashed back down to the original commit. I left the second commit separate for now because it's breaking. Previously, iterating over a rolling window returned only returned blocks present in the original array, and ignored the chunk of Here, instead, the iterator was changed so that each returned view matches the corresponding slice of the output of the Other than the fact that it's breaking, the main drawback (and difference with After writing this, I'm wondering if it might be useful to simply drop the coordinates along the rolling axis, so that the behavior matches the behavior of I'm open to thoughts/comments/criticisms/suggestions/questions/whatever. |
Also, FWIW, the test failure was caused by a problem in zarr/fsspec (fsspec/filesystem_spec#707), which is fixed in master on fsspec (fsspec/filesystem_spec#710). So it should be fixed here whenever fsspec makes a release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an initial review and have a few questions. I haven't reviewed __iter__
yet.
Thanks for doing this @kmsquire it turned out to be harder than expected!
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@dcherian Thank you for reviewing. I've started working through your comments. |
ca6a73d
to
4371a31
Compare
* Generalize type of dim (str -> Hashable) * Rename to_array -> to_list * Make error message clearer (only print unexpected inputs) * Allow lists of length one as inputs * Add tests
ec9a925
to
cdadc60
Compare
Fixed and added a test for this. The example at the bottom now works. |
Mentioned in one of the comments above, but I think I've reached about the amount of time that I can spend on this right now. If there are other minor changes, please do let me know. I can also back out the breaking change if desired (although that will probably take some commit surgery). |
Thanks @kmsquire we can take it from here. There was some discussion about syntax at the dev meeting today. There were multiple votes in favour of allowing full control of padding in the rolling object itself. So we could have
@pydata/xarray thoughts? |
FWIW, I think 1 would be more efficient (but perhaps harder to implement). With this one, the If 2 were implemented, what should the return value of the |
Your points about (2) being hard to interpret are a good reason to go with (1)! I was thinking it would return a DatasetRolling object but that's not the only interpretation as you point out. |
* upstream/main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...
Failing for center=True+pad=True; and maybe all pad=False?
xarray/tests/test_dataarray.py
Outdated
@pytest.mark.parametrize("min_periods", (1, None)) | ||
@pytest.mark.parametrize("backend", ["numpy"], indirect=True) | ||
def test_rolling_wrapped_bottleneck(da, name, center, min_periods): | ||
bn = pytest.importorskip("bottleneck", minversion="1.1") | ||
@pytest.mark.parametrize("center", (True, False, None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failing for center=True
or pad=False
(I think).
I think there's an issue with constructing expected
: it doesn't seem like get_expected_rolling_indices
is correct for all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmdupuis3 I fixed the merge conflicts but there's still failing tests. We'd appreciate any help you can give here.
* upstream/main: (716 commits) Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638) add timeouts for tests (pydata#7657) Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667) [pre-commit.ci] pre-commit autoupdate (pydata#7651) Allow all integer dtypes in `polyval` (pydata#7619) [skip-ci] dev whats-new (pydata#7660) Redo whats-new for 2023.03.0 (pydata#7659) Set copy=False when calling pd.Series (pydata#7642) Pin pandas < 2 (pydata#7650) Whats-new for release 2023.03.0 (pydata#7643) Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648) Use more descriptive link texts (pydata#7625) Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598) Fix `pcolormesh` with str coords (pydata#7612) [skip-ci] Fix groupby binary ops benchmarks (pydata#7603) Remove incomplete sentence in IO docs (pydata#7631) Allow indexing unindexed dimensions using dask arrays (pydata#5873) Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618) [pre-commit.ci] pre-commit autoupdate (pydata#7620) add a test for scatter colorbar extend (pydata#7616) ...
* upstream/main: Save groupby codes after factorizing, pass to flox (pydata#7206) [skip-ci] Add compute to groupby benchmarks (pydata#7690) Delete built-in cfgrib backend (pydata#7670) Added a pronunciation guide to the word Xarray in the README.MD fil… (pydata#7677) boundarynorm fix (pydata#7553) Fix lazy negative slice rewriting. (pydata#7586) [pre-commit.ci] pre-commit autoupdate (pydata#7687) Adjust sidebar font colors (pydata#7674) Bump pypa/gh-action-pypi-publish from 1.8.1 to 1.8.3 (pydata#7682) Raise PermissionError when insufficient permissions (pydata#7629)
Hey all, has there been any movement on this PR, or maybe elsewhere? It seems like a nice feature to me. |
* upstream/main: (765 commits) increase typing annotations coverage in `xarray/core/indexing.py` (pydata#8857) pandas 3 MultiIndex fixes (pydata#8847) FIX: adapt handling of copy keyword argument in scipy backend for numpy >= 2.0dev (pydata#8851) FIX: do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (pydata#8852) Implement setitem syntax for `.oindex` and `.vindex` properties (pydata#8845) Support pandas copy-on-write behaviour (pydata#8846) correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713) Expand use of `.oindex` and `.vindex` (pydata#8790) Return a dataclass from Grouper.factorize (pydata#8777) [skip-ci] Fix upstream-dev env (pydata#8839) Add dask-expr for windows envs (pydata#8837) [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835) Add `dask-expr` to environment-3.12.yml (pydata#8827) Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736) Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784) try to get the `upstream-dev` CI to complete again (pydata#8823) Bump the actions group with 1 update (pydata#8818) Update documentation for clarity (pydata#8817) DOC: link to zarr.convenience.consolidate_metadata (pydata#8816) Refactor Grouper objects (pydata#8776) ...
pre-commit run --all-files
whats-new.rst
Related to #2007.