Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix groupby binary ops when grouped array is subset relative to other #7798

Merged
merged 4 commits into from
May 2, 2023

Conversation

dcherian
Copy link
Contributor

cc @slevang

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the broken test was actually wrong before, now the actual result looks more like what I would expect.

xarray/tests/test_groupby.py Outdated Show resolved Hide resolved
@slevang
Copy link
Contributor

slevang commented Apr 30, 2023

Thanks for the quick fix! Not sure about the bitshift test but I'm assuming @headtr1ck is right.

@abrammer
Copy link
Contributor

abrammer commented Apr 30, 2023

Apologies, that's my bad. Looks like I introduced a broken test and didn't manually double check the results coming back. The right shift test should have been:

    right_expected = Dataset(
        {
            "x": ("index", [0, 0, 2, 2]),
            "y": ("index", [-1, -1, -2, -2]),
            "level": ("index", [0, 0, 4, 4]),
            "index": [0, 1, 2, 3],
        }
    )

    right_actual = (left_expected.groupby("level") >> shift).reset_coords(names="level")
    assert_equal(right_expected, right_actual)

I haven't paid attention to this issue, but doing the groupby manually didn't have the bug fwiw.

Probably overkill test that only fails at the last assert before this fix

def test_groupby_math_bitshift() -> None:
    # create new dataset of int's only
    ds = Dataset(
        {
            "x": ("index", np.ones(4, dtype=int)),
            "y": ("index", np.ones(4, dtype=int) * -1),
            "level": ("index", [1, 1, 2, 2]),
            "index": [0, 1, 2, 3],
        }
    )
    shift = DataArray([1, 2, 1], [("level", [1, 2, 8])])

    left_expected = Dataset(
        {
            "x": ("index", [2, 2, 4, 4]),
            "y": ("index", [-2, -2, -4, -4]),
            "level": ("index", [2, 2, 8, 8]),
            "index": [0, 1, 2, 3],
        }
    )

    left_manual = []
    for lev, group in ds.groupby("level"):
        shifter = shift.sel(level=lev)
        left_manual.append(group << shifter)
    left_actual = xr.concat(left_manual, dim="index").reset_coords(names="level")
    assert_equal(left_expected, left_actual)

    left_actual = (ds.groupby("level") << shift).reset_coords(names="level")
    assert_equal(left_expected, left_actual)

    right_expected = Dataset(
        {
            "x": ("index", [0, 0, 2, 2]),
            "y": ("index", [-1, -1, -2, -2]),
            "level": ("index", [0, 0, 4, 4]),
            "index": [0, 1, 2, 3],
        }
    )
    right_manual = []
    for lev, group in left_expected.groupby("level"):
        shifter = shift.sel(level=lev)
        right_manual.append(group >> shifter)
    right_actual = xr.concat(right_manual, dim="index").reset_coords(names="level")
    assert_equal(right_expected, right_actual)

    right_actual = (left_expected.groupby("level") >> shift).reset_coords(names="level")
    assert_equal(right_expected, right_actual)

dcherian and others added 2 commits April 30, 2023 16:42
Co-authored-by: Alan Brammer <a1brammer@gmail.com>
Co-authored-by: Mick <mick.niklas@gmail.com>
@dcherian dcherian added the plan to merge Final call for comments label May 1, 2023
doc/whats-new.rst Outdated Show resolved Hide resolved
@dcherian dcherian merged commit ca84a1e into pydata:main May 2, 2023
@dcherian dcherian deleted the fix-groupby-binary-op branch May 2, 2023 14:48
@slevang
Copy link
Contributor

slevang commented May 3, 2023

Would it be possible to run another bug fix release with this incorporated? Or I guess we're already on to 2023.5.0 according to the date.

dcherian added a commit to dcherian/xarray that referenced this pull request May 6, 2023
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizonalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More groupby indexing problems
4 participants