This repository has been archived by the owner on May 3, 2023. It is now read-only.

3.8 master #64

Merged
merged 6 commits into from
Oct 17, 2019

Conversation

TomAugspurger
Contributor

xref #63

Closes #61

@TomAugspurger
Contributor Author

The failure at https://travis-ci.org/MacPython/pandas-wheels/jobs/598133573#L5572 is vaguely familiar

_______________________________ test_apply[True] _______________________________
[gw0] linux -- Python 3.7.0 /venv/bin/python

ordered = True

    @pytest.mark.xfail(
        is_platform_windows() and PY37, reason="Flaky, GH-27902", strict=False
    )
    @pytest.mark.parametrize("ordered", [True, False])
    def test_apply(ordered):
        # GH 10138
    
        dense = Categorical(list("abc"), ordered=ordered)
    
        # 'b' is in the categories but not in the list
        missing = Categorical(list("aaa"), categories=["a", "b"], ordered=ordered)
        values = np.arange(len(dense))
        df = DataFrame({"missing": missing, "dense": dense, "values": values})
        grouped = df.groupby(["missing", "dense"], observed=True)
    
        # missing category 'b' should still exist in the output index
        idx = MultiIndex.from_arrays([missing, dense], names=["missing", "dense"])
        expected = DataFrame([0, 1, 2.0], index=idx, columns=["values"])
    
        result = grouped.apply(lambda x: np.mean(x))
>       assert_frame_equal(result, expected)
E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (3, 3)
E       [right]: (3, 1)

/venv/lib/python3.7/site-packages/pandas/tests/groupby/test_categorical.py:233: AssertionError
______________________________ test_apply[False] _______________________________
[gw0] linux -- Python 3.7.0 /venv/bin/python

ordered = False

    @pytest.mark.xfail(
        is_platform_windows() and PY37, reason="Flaky, GH-27902", strict=False
    )
    @pytest.mark.parametrize("ordered", [True, False])
    def test_apply(ordered):
        # GH 10138
    
        dense = Categorical(list("abc"), ordered=ordered)
    
        # 'b' is in the categories but not in the list
        missing = Categorical(list("aaa"), categories=["a", "b"], ordered=ordered)
        values = np.arange(len(dense))
        df = DataFrame({"missing": missing, "dense": dense, "values": values})
        grouped = df.groupby(["missing", "dense"], observed=True)
    
        # missing category 'b' should still exist in the output index
        idx = MultiIndex.from_arrays([missing, dense], names=["missing", "dense"])
        expected = DataFrame([0, 1, 2.0], index=idx, columns=["values"])
    
        result = grouped.apply(lambda x: np.mean(x))
>       assert_frame_equal(result, expected)
E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (3, 3)
E       [right]: (3, 1)

Does anyone recall it? (@jbrockmendel, @jreback)

@jbrockmendel
Contributor

Looks like I removed the PY37 xfail in pandas-dev/pandas#27715 and Tom put it back as Windows-only in pandas-dev/pandas#27956. xfailing this case seems reasonable for now.

@TomAugspurger
Contributor Author

Interesting, thanks.

It may be that NumPy 1.14.x is the culprit, rather than Windows vs. Linux.

For now I'd recommend skipping it on all platforms though, and backporting to 0.25.x.
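As a rough sketch of what "on all platforms" could look like, the decorator from the traceback above can simply drop its `is_platform_windows() and PY37` condition. This is an illustration only, not the patch that was actually applied; the test body is copied from the log, and `pandas.testing.assert_frame_equal` is assumed as the import for the comparison helper.

```python
import numpy as np
import pytest
from pandas import Categorical, DataFrame, MultiIndex
from pandas.testing import assert_frame_equal


# Unconditional xfail: no platform or Python-version condition is passed,
# so the mark applies everywhere. strict=False means an unexpected pass
# is reported as XPASS rather than as a failure.
@pytest.mark.xfail(reason="Flaky, GH-27902", strict=False)
@pytest.mark.parametrize("ordered", [True, False])
def test_apply(ordered):
    # GH 10138
    dense = Categorical(list("abc"), ordered=ordered)

    # 'b' is in the categories but not in the list
    missing = Categorical(list("aaa"), categories=["a", "b"], ordered=ordered)
    values = np.arange(len(dense))
    df = DataFrame({"missing": missing, "dense": dense, "values": values})
    grouped = df.groupby(["missing", "dense"], observed=True)

    idx = MultiIndex.from_arrays([missing, dense], names=["missing", "dense"])
    expected = DataFrame([0, 1, 2.0], index=idx, columns=["values"])

    result = grouped.apply(lambda x: np.mean(x))
    assert_frame_equal(result, expected)
```

Using `pytest.mark.skip(reason=...)` instead would avoid running the test at all; xfail keeps it running, so it shows up as XPASS once the underlying behaviour is fixed.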

@jbrockmendel
Contributor

Tracking this down, I ended up at the same place you did.

import pandas as pd

# Minimal single-row frame standing in for one group from test_apply
group = pd.DataFrame({"missing": "a", "dense": "a", "values": 0}, index=[0])
group["missing"] = group["missing"].astype("category")
group["dense"] = group["dense"].astype("category")

The relevant call is np.mean(group), which we expect to return Series(0., index=["values"]) but is instead returning Series([0., 1., 0.], index=["missing", "dense", "values"]).

np.mean itself dispatches to DataFrame.mean, which calls DataFrame._reduce, with op=nanops.nanmean. DataFrame._reduce has some try/except stuff that is not playing nicely with pdb, so I'm going to wrap it up here and make a PR to reinstate the xfail.
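For reference, the expected result can be reproduced by calling `DataFrame.mean` directly and asking for numeric columns only. This is a minimal sketch, not a fix for the `_reduce` behaviour described above: it sidesteps the question of how the categorical columns are handled by excluding them explicitly with `numeric_only=True`.

```python
import pandas as pd

# Same single-row frame as in the reproduction above.
group = pd.DataFrame({"missing": "a", "dense": "a", "values": 0}, index=[0])
group["missing"] = group["missing"].astype("category")
group["dense"] = group["dense"].astype("category")

# Restrict the reduction to numeric columns, so the two categorical
# columns are dropped before the mean is computed.
result = group.mean(numeric_only=True)

# Only the "values" column should remain in the result.
expected = pd.Series([0.0], index=["values"])
pd.testing.assert_series_equal(result, expected)
```

What `np.mean(group)` itself returns depends on the pandas and NumPy versions involved, which is the point under investigation here, so the sketch deliberately does not assert anything about that call.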

@TomAugspurger
Contributor Author

Going to merge this just so we're ready to go with the 3.8 wheels.


Successfully merging this pull request may close these issues.

Wheels for 3.8.0rc1