TST/CLN: correctly skip in indexes/common; add test for duplicated #21902

h-vetinari · 2018-07-13T21:20:41Z

Splitting up #21645

Added tests for duplicated
Following the discussion in ENH: add return_inverse to duplicated for DataFrame/Series/Index/MultiIndex #21645 (comment), turned several bare return statements (which make the test pass falsely) into pytest.skip.
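To show why a bare return is misleading, here is a minimal sketch. The test names, the HOLDER variable (a stand-in for self._holder being None) and the outcome helper are all hypothetical. A test that returns early is reported as passed, whereas pytest.skip raises pytest's skip exception, which pytest reports as skipped.

```python
import pytest

HOLDER = None  # hypothetical stand-in for self._holder being None


def test_with_bare_return():
    if HOLDER is None:
        return  # pytest reports PASSED although nothing was checked
    assert HOLDER is not None


def test_with_skip():
    if HOLDER is None:
        # raises pytest.skip.Exception, which pytest reports as SKIPPED
        pytest.skip("no index type to test")
    assert HOLDER is not None


def outcome(test_func):
    """Hypothetical helper: run a test function outside pytest and classify it."""
    try:
        test_func()
    except pytest.skip.Exception:
        return "skipped"
    return "passed"
```

Here outcome(test_with_bare_return) gives "passed" and outcome(test_with_skip) gives "skipped", which is the difference the PR relies on.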

codecov · 2018-07-14T02:14:08Z

Codecov Report

Merging #21902 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21902   +/-   ##
=======================================
  Coverage   92.07%   92.07%           
=======================================
  Files         169      169           
  Lines       50684    50684           
=======================================
  Hits        46668    46668           
  Misses       4016     4016

Flag	Coverage Δ
#multiple	`90.48% <ø> (ø)`	⬆️
#single	`42.34% <ø> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bcc2bb...f9c9aab.

jreback · 2018-07-14T14:16:49Z

pandas/tests/indexes/common.py

        idx = self._holder([indices[0]] * 5)
        assert not idx.is_unique
        assert idx.has_duplicates

+    @pytest.mark.parametrize('keep', ['first', 'last', False])


is this a duplicate of these (pun intended), or are we not currently testing duplicated on indices?

@jreback .duplicated is hardly ever tested directly, only indirectly for stuff like .drop_duplicates. Regardless of the changes to .duplicated in #21645, I think duplicated should be tested separately.

so these are not relevant?

pandas/tests/indexes/common.py: def test_duplicates(self, indices):
pandas/tests/indexes/test_category.py: def test_duplicates(self):
pandas/tests/indexes/test_range.py: def test_duplicates(self):

@jreback No, test_duplicates (at least for pandas/tests/indexes/common.py, which is what this PR is about) tests .is_unique and .has_duplicates, but not the .duplicated-method itself.

can you point to the coverage that shows this is NOT tested?

@jreback I never said that it is not tested, just that it is only tested implicitly. Any call to .drop_duplicates will invoke duplicated, so obviously the coverage works out.
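For reference, a small sketch of the distinction being discussed, using only public pandas API (the sample values are arbitrary). is_unique and has_duplicates collapse everything into a single bool, while duplicated returns a per-element mask whose content depends on keep.

```python
import pandas as pd

idx = pd.Index([1, 2, 1, 3, 2])

# Aggregate properties (what test_duplicates checks): one bool for the whole Index
assert not idx.is_unique
assert idx.has_duplicates

# Element-wise masks from duplicated(); keep decides which occurrence is spared
first = idx.duplicated(keep="first")  # [False, False, True, False, True]
last = idx.duplicated(keep="last")    # [True, True, False, False, False]
every = idx.duplicated(keep=False)    # [True, True, True, False, True]
```

Only the masks reveal which positions are flagged, so a bug in the keep handling would go unnoticed by is_unique/has_duplicates alone.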

jreback · 2018-07-16T10:58:56Z

pandas/tests/indexes/common.py

        idx = self._holder([indices[0]] * 5)
        assert not idx.is_unique
        assert idx.has_duplicates

+    @pytest.mark.parametrize('keep', ['first', 'last', False])
+    def test_duplicated(self, indices, keep):
+        if type(indices) is not self._holder:


can you use isinstance here

@jreback isinstance of what? I copied the checks from other tests, because I didn't know all the different cases that flow into common.py.

isinstance of self._holder, is this what we do elsewhere?

Not in indexes/common.py, as far as I can see. There are several isinstance checks, of course, but never against self._holder, which can apparently also be None.

@jreback If I change

if type(indices) is not self._holder:

to

if not isinstance(indices, self._holder):

I get 9 failures (instead of skips), all of which are from

tests/indexes/test_numeric.TestInt64Index, when run against DatetimeIndex, PeriodIndex or TimedeltaIndex, mainly, it seems, because of TypeError: Unsafe NumPy casting, you must explicitly cast.

@jreback I seem to have found something: in my normal workbook, I get

    pd.__version__  # '0.24.0.dev0+321.g0fe6ded52.dirty'
    isinstance(pd.PeriodIndex, pd.Int64Index)  # False

but for some reason, within tests/indexes/common.py, this is the opposite:

    [...output of test run...]

        @pytest.mark.parametrize('keep', ['first', 'last', False])
        def test_duplicated(self, indices, keep):
            if not isinstance(indices, self._holder):
                pytest.skip('Can only check if we know the index type')
            if not len(indices) or isinstance(indices, (MultiIndex)):
                # MultiIndex tested separately in:
                # tests/indexes/multi/test_unique_and_duplicates
                pytest.skip('Skip check for empty Index and MultiIndex')
            if isinstance(indices, (PeriodIndex, DatetimeIndex)):
                # this branch should be impossible for Int64Index
                # after the instance-check above!
    >           raise ValueError(f'{type(indices).__name__}, {self._holder.__name__}, '
                                 f'{isinstance(indices, self._holder)}, {type(indices) is self._holder}')
    E           ValueError: PeriodIndex, Int64Index, True, False

The same happens for DatetimeIndex. It does work with the original "is not" variant, so I'm leaving that as it is for now.

Opened a follow-up: #22211

Never mind, I had gotten instances and classes confused.

So DatetimeIndex, PeriodIndex and TimedeltaIndex are subclasses of Int64Index - but are unsafe to cast to Int64...? I guess this is intentional?

In any case, under these circumstances, I'm even more convinced that it's best to just stay with the if type(indices) is not self._holder: condition.
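Both pitfalls in this exchange can be reproduced with plain Python classes. The class names below are hypothetical stand-ins, not pandas classes; they only mimic the 0.24-era relationship described above, where the datetime-like indexes subclass Int64Index.

```python
class BaseIndex:
    pass


class IntIndex(BaseIndex):  # stands in for Int64Index
    pass


class DateIndex(IntIndex):  # stands in for a datetime-like subclass
    pass


obj = DateIndex()

# isinstance accepts subclasses, so a DateIndex passes an IntIndex check ...
subclass_ok = isinstance(obj, IntIndex)      # True
# ... while an exact type comparison rejects it
exact_match = type(obj) is IntIndex          # False

# Passing the class instead of an instance asks a different question
class_as_instance = isinstance(DateIndex, IntIndex)  # False: a class is not an instance
class_relation = issubclass(DateIndex, IntIndex)     # True
```

This is why an exact type check skips the subclasses that an isinstance check would let through into the Int64Index tests.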

jreback · 2018-07-16T10:59:16Z

pandas/tests/indexes/common.py

+
+        idx = self._holder(indices)
+        if idx.has_duplicates:
+            # We need to be able to control creation of duplicates here


this comment is a bit obtuse, can you reword

h-vetinari · 2018-07-19T11:08:42Z

@jreback All green. Any more feedback / comments?

jreback · 2018-07-20T13:02:33Z

pandas/tests/indexes/common.py

@@ -37,7 +37,7 @@ def verify_pickle(self, indices):
    def test_pickle_compat_construction(self):
        # this is testing for pickle compat
        if self._holder is None:
-            return


is this actually hit?

I don't know, I didn't write these tests, and the inner workings of indexes/common.py are not immediately apparent (which files call it, what do they fill indices with, etc.).

Happy to leave the bare return in there, if that's preferred.

well, put a breakpoint there and see if it's hit.

I won't be able to work on any of this for the next 2 weeks. I'll have a look after that.

jreback · 2018-07-20T13:04:52Z

pandas/tests/indexes/common.py

+
+        n, k = len(idx), 10
+        duplicated_selection = np.random.choice(n, k * n)
+        expected = pd.Series(duplicated_selection).duplicated(keep=keep).values


I hate to check against a numpy array; I much prefer to check the type and use assert_index_equal or assert_series_equal. Is this how the other tests are?

Um, index.duplicated() (without return_inverse) yields a numpy array - this is the documented signature (I guess because selecting on an Index really only needs an ndarray).

All the manual duplicated tests create their own data and know what the correct outcome should be. Here, we're feeding lots of different indexes through the same test, so we need to determine what is actually duplicated, which is what duplicated_selection does. self._holder(idx.values[duplicated_selection]) is then an Index of the correct type that contains duplicates, but we know where they are (from inspecting duplicated_selection), and can therefore validate the result.
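A sketch of the idea described above. The helper name check_duplicated and the use of Index.take in place of self._holder(idx.values[...]) are assumptions for illustration, not the code in the PR. Because the starting Index is duplicate-free, sampled positions and resulting values correspond one to one, so the duplicates among the positions are exactly the duplicates in the new Index.

```python
import numpy as np
import pandas as pd


def check_duplicated(unique_idx, keep, k=10, seed=0):
    """Hypothetical helper sketching the duplicated_selection approach."""
    if not unique_idx.is_unique:
        raise ValueError("need a duplicate-free starting Index")
    rng = np.random.RandomState(seed)
    n = len(unique_idx)
    # draw k * n positions with replacement -> duplicates at known places
    selection = rng.choice(n, k * n)
    # take() builds an Index from the selected positions
    idx = unique_idx.take(selection)
    # duplicates among the positions are exactly the duplicates among the values
    expected = pd.Series(selection).duplicated(keep=keep).values
    result = idx.duplicated(keep=keep)
    assert isinstance(result, np.ndarray)  # duplicated returns a plain ndarray
    np.testing.assert_array_equal(result, expected)
    return result
```

Since Index.duplicated returns a boolean ndarray rather than an Index or Series, the comparison here is done with np.testing.assert_array_equal.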

h-vetinari · 2018-08-05T17:15:17Z

is this actually hit?

@jreback The if-branch you mentioned in test_pickle_compat_construction was not hit, therefore I removed it. Please re-review.

h-vetinari · 2018-08-05T18:58:49Z

The Travis failure is unrelated. There seems to be a problem with the 3.5 job when collecting parametrized tests: the collection order is not stable, so the workers collect different tests, which yields this error:

==================================== ERRORS ====================================
_____________________________ ERROR collecting gw1 _____________________________
Different tests were collected between gw0 and gw1. The difference is:

[...]

 pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df0-sum-expected0]
-pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df1-func1-expected1]
-pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df2-sum-expected2]
+pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df1-sum-expected1]
+pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df2-func2-expected2]
 pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df3-max-expected3]
 pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df4-amax-expected4]
 pandas/tests/frame/test_apply.py::TestDataFrameAggregate::()::test_agg_cython_table[axis 0-df5-func5-expected5]

[...]

h-vetinari · 2018-08-08T20:41:58Z

@jreback ping. Should be ready to go - travis failure is unrelated.

jreback · 2018-08-09T00:34:02Z

@h-vetinari actually no, your PR is causing the fail. you have non-determinism in the test generation. usually this is because the ordering of the fixtures / parameters is based on a dictionary.

h-vetinari · 2018-08-09T07:51:24Z

@jreback

actually no, your PR is causing the fail

I honestly can't see how this would be the case. I only changed tests (no fixtures or anything) in

pandas/tests/indexes/common.py
pandas/tests/indexes/test_category.py
pandas/tests/indexes/test_range.py

and the collection errors are in pandas/tests/frame/test_apply.test_agg_cython_table.

The parametrization of that test is

    @pytest.mark.parametrize("df, func, expected", chain(
        _get_cython_table_params(
            DataFrame(), [
                ('sum', Series()),
                ('max', Series()),
                ('min', Series()),
                ('all', Series(dtype=bool)),
                ('any', Series(dtype=bool)),
                ('mean', Series()),
                ('prod', Series()),
                ('std', Series()),
                ('var', Series()),
                ('median', Series()),
            ]),
        _get_cython_table_params(
            DataFrame([[np.nan, 1], [1, 2]]), [
                ('sum', Series([1., 3])),
                ('max', Series([1., 2])),
                ('min', Series([1., 1])),
                ('all', Series([True, True])),
                ('any', Series([True, True])),
                ('mean', Series([1, 1.5])),
                ('prod', Series([1., 2])),
                ('std', Series([np.nan, 0.707107])),
                ('var', Series([np.nan, 0.5])),
                ('median', Series([1, 1.5])),
            ]),
    ))

h-vetinari · 2018-08-09T08:06:01Z

rebased again, let's see if it helps

h-vetinari · 2018-08-09T12:26:50Z

@jreback all green

h-vetinari · 2018-08-09T12:58:24Z

Btw, that issue with the test order is related to #22156, #22157
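Some background on the failure mode: pytest-xdist requires every worker (gw0, gw1, ...) to collect the same tests in the same order, and on Python < 3.6 dict iteration order was not guaranteed to match between processes. As a hypothetical sketch of the general remedy (not the actual change in #22157; ALIASES and get_table_params are invented names loosely modelled on _get_cython_table_params), parameters expanded from a dict can be sorted by a stable key so every process builds the same list:

```python
import builtins
import math

# Hypothetical alias table: callables registered under a method name.
ALIASES = {builtins.sum: "sum", math.fsum: "sum", builtins.max: "max"}


def _qualname(func):
    """Stable sort key for a callable, e.g. 'builtins.sum'."""
    return f"{func.__module__}.{func.__name__}"


def get_table_params(frame, names_and_expected, aliases=ALIASES):
    """Expand each method name into the name itself plus every aliased callable."""
    params = []
    for name, expected in names_and_expected:
        params.append((frame, name, expected))
        # sort instead of relying on dict order, so every process
        # generates the same parameter list (and the same test IDs)
        funcs = sorted((f for f, n in aliases.items() if n == name), key=_qualname)
        params.extend((frame, f, expected) for f in funcs)
    return params
```

Two alias dicts with the same entries in different insertion orders then produce identical parameter lists.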

jreback · 2018-08-10T10:37:26Z

thanks @h-vetinari

h-vetinari mentioned this pull request Jul 13, 2018

ENH: add return_inverse to duplicated for DataFrame/Series/Index/MultiIndex #21645

Closed

jreback requested changes Jul 14, 2018

jreback added the Testing pandas testing functions or related to the test suite label Jul 14, 2018

jreback added this to the 0.24.0 milestone Jul 14, 2018

jreback requested changes Jul 16, 2018

h-vetinari force-pushed the tst_index_common_skip branch 2 times, most recently from c0d3ec4 to 7f74578 on July 16, 2018 16:16

jreback requested changes Jul 20, 2018

h-vetinari force-pushed the tst_index_common_skip branch from db29d9b to cfa6182 on August 5, 2018 17:38

h-vetinari mentioned this pull request Aug 6, 2018

BUG: broken inheritance within pytest? #22211

Closed

h-vetinari added 4 commits August 9, 2018 09:56

TST/CLN: correctly skip in indexes/common; add test for duplicated

0f4175d

Rename ambiguous test names

4ab753a

Incorporate review (jreback)

b6e2858

Review; remove unhit instance-check

f9c9aab

h-vetinari force-pushed the tst_index_common_skip branch from cfa6182 to f9c9aab on August 9, 2018 08:05

topper-123 mentioned this pull request Aug 9, 2018

BUG: make conftest._cython_table deterministic for Python<3.6 #22157

Merged

jreback approved these changes Aug 10, 2018

jreback merged commit c7d6264 into pandas-dev:master Aug 10, 2018

h-vetinari deleted the tst_index_common_skip branch August 10, 2018 17:17

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

TST/CLN: correctly skip in indexes/common; add test for duplicated (p…

749a1e0

…andas-dev#21902)
