Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recombination in groupby when changing size along the grouped dimension #3807

Merged
merged 6 commits into from
Mar 17, 2020

Conversation

ej81
Copy link
Contributor

@ej81 ej81 commented Feb 28, 2020

When a a Dataset or DataArray is recombined after a groupby operation, xarray tries to restore the original coordinate. This causes problems if the group operation changes the size of the groups along the grouped dimension (e.g. some filtering operation that removes entries).

Restoring the original coordinate appears to be necessary only if it was squeezed from the groups, otherwise it is already restored by concatenation of the groups. This fix:

  1. avoids reordering if the size of the dimension has changed;
  2. avoids restoring a coordinate that has not been squeezed out.
  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks foot the clear tests.

I’ll let others comment as they may know the code better, otherwise will let’s tomorrow

@max-sixty
Copy link
Collaborator

(Also you can merge master for the tests to pass IYL)

@ej81 ej81 force-pushed the fix-groupby-change-size branch from 08fbb41 to 394c46e Compare February 28, 2020 17:56
dcherian and others added 4 commits March 15, 2020 03:02
…e-size

* upstream/master: (24 commits)
  Fix alignment with join="override" when some dims are unindexed (pydata#3839)
  Fix CFTimeIndex-related errors stemming from updates in pandas (pydata#3764)
  Doctests fixes (pydata#3846)
  add xpublish to related projects (pydata#3850)
  update installation instruction (pydata#3849)
  Pint support for top-level functions (pydata#3611)
  un-xfail tests that append to netCDF files with scipy (pydata#3805)
  remove panel conversion (pydata#3845)
  Add nxarray to related-projects.rst (pydata#3848)
  Implement skipna kwarg in xr.quantile (pydata#3844)
  Allow `where` to receive a callable (pydata#3827)
  update macos image (pydata#3838)
  Label "Installed Versions" item in Issue template (pydata#3832)
  Add note on diff's n differing from pandas (pydata#3822)
  DOC: Add rioxarray and other external examples (pydata#3757)
  Use stable RTD image.
  removed mention that 'dims' are inferred from 'coords'-dict when omit… (pydata#3821)
  Coarsen keep attrs 3376 (pydata#3801)
  Turn on html repr by default (pydata#3812)
  Fix zarr append with groups (pydata#3610)
  ...
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ej81 . This looks great. I merged master and consolidated the test code a tiny bit. I'll merge when the tests pass.

I see that this is your first PR. Welcome to xarray!

@dcherian dcherian mentioned this pull request Mar 16, 2020
@dcherian dcherian closed this Mar 17, 2020
@dcherian dcherian reopened this Mar 17, 2020
@dcherian dcherian merged commit 65a5bff into pydata:master Mar 17, 2020
@max-sixty
Copy link
Collaborator

Thanks @ej81 !

dcherian added a commit to dcherian/xarray that referenced this pull request Mar 22, 2020
* upstream/master:
  add spacing in the versions section of the issue report (pydata#3876)
  map_blocks: allow user function to add new unindexed dimension. (pydata#3817)
  Delete associated indexes when deleting coordinate variables. (pydata#3840)
  remove macos build while waiting for libwebp fix (pydata#3875)
  Fix html repr on non-str keys (pydata#3870)
  Allow ellipsis to be used in stack (pydata#3826)
  Improve where docstring (pydata#3836)
  Add DataArray.pad, Dataset.pad, Variable.pad (pydata#3596)
  Fix some warnings (pydata#3864)
  Feature/weighted (pydata#2922)
  Fix recombination in groupby when changing size along the grouped dimension (pydata#3807)
  Blacken the doctest code in docstrings (pydata#3857)
  Fix multi-index with categorical values. (pydata#3860)
  Fix interp bug when indexer shares coordinates with array (pydata#3758)
  Fix alignment with join="override" when some dims are unindexed (pydata#3839)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants