Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate squeeze in GroupBy. #8507

Merged
merged 22 commits into from
Jan 8, 2024
Merged

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Dec 2, 2023

@dcherian dcherian marked this pull request as draft December 2, 2023 00:39
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1! Thanks!

xarray/core/groupby.py Outdated Show resolved Hide resolved
@dcherian dcherian force-pushed the depr-groupby-squeeze-2 branch from deec828 to c2e576e Compare December 2, 2023 03:56
xarray/tests/test_groupby.py Outdated Show resolved Hide resolved
@dcherian dcherian marked this pull request as ready for review December 2, 2023 04:36
* main: (26 commits)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  Use numbagg for `rolling` methods (pydata#8493)
  Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
  fix RTD docs build (pydata#8519)
  Fix type of `.assign_coords` (pydata#8495)
  ...
* main:
  Fix mypy type ignore (pydata#8564)
  Support for the new compression arguments. (pydata#7551)
  FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
  Adapt map_blocks to use new Coordinates API (pydata#8560)
  add xeofs to ecosystem.rst (pydata#8561)
  Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
  Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
@dcherian
Copy link
Contributor Author

This unfortunately ended up being a lot more invasive, to minimize warnings raised.
Could use a second round of review.

@dcherian dcherian force-pushed the depr-groupby-squeeze-2 branch from c335941 to dd6ea53 Compare December 22, 2023 03:07
xarray/core/groupby.py Outdated Show resolved Hide resolved
xarray/core/groupby.py Outdated Show resolved Hide resolved
):
if squeeze in [None, True] and grouper.can_squeeze:
if isinstance(indices, slice):
if indices.stop - indices.start == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that save?
What about None or negative Start-Stop Values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are not possible today:
group_indices: T_GroupIndices = [slice(i, i + 1) for i in range(size)] at Line 455

and at Line 553:

group_indices: T_GroupIndices = [
            slice(i, j) for i, j in zip(sbins[:-1], sbins[1:])
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no you're right, there's an edge case in TimeResamplerGrouper where stop is None, and we're resampling to the same frequency as the data so grouper.can_squeeze is True.

Copy link
Contributor Author

@dcherian dcherian Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no the squeezing code doesn't actually run. But i have now explicitly skipped squeezing for resampling by ensuring can_squeeze is False. This is the current behaviour on main

dcherian and others added 2 commits January 2, 2024 20:23
Co-authored-by: Michael Niklas  <mick.niklas@gmail.com>
@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Jan 3, 2024
@dcherian dcherian merged commit b35f761 into pydata:main Jan 8, 2024
26 checks passed
@max-sixty
Copy link
Collaborator

Great work @dcherian !

@dcherian dcherian deleted the depr-groupby-squeeze-2 branch January 8, 2024 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Groupby reduction fails when all groups are of size 1 Surprising .groupby behavior with float index
3 participants