API/BUG: freq retention in value_counts #33830

@jbrockmendel

dti = pd.date_range('2016-01-01', periods=5)

dti.value_counts().index.freq    # <-- None
dti.factorize()[1].freq   # <-- None

mi = pd.MultiIndex.from_arrays([dti, dti])

mi.levels[0].freq   # <-- None

There is a comment in test_factorize (tests.indexes.datetimes.test_datetime) suggesting that freq should be preserved by factorize, but the test does not actually check this, and it would fail if it did:

        # freq must be preserved
        idx3 = date_range("2000-01", periods=4, freq="M", tz="Asia/Tokyo")
        exp_arr = np.array([0, 1, 2, 3], dtype=np.intp)
        arr, idx = idx3.factorize()
        tm.assert_numpy_array_equal(arr, exp_arr)
        tm.assert_index_equal(idx, idx3)
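
To make the "freq must be preserved" comment enforceable, the test would need an explicit comparison of the freq attribute. The following is a minimal sketch, not the pandas test suite's own code: `assert_freq_retained` is a hypothetical helper name, and the example uses freq="D" with a full date string rather than the values in the test above.

```python
import pandas as pd


def assert_freq_retained(original: pd.DatetimeIndex, result: pd.DatetimeIndex) -> None:
    """Hypothetical helper: fail when `result` does not carry `original`'s freq."""
    # Values must match first, otherwise comparing freq is meaningless.
    assert result.equals(original), "values differ"
    # Compare freq explicitly instead of relying on an index-equality check.
    assert result.freq == original.freq, (
        f"freq not retained: expected {original.freq!r}, got {result.freq!r}"
    )


idx3 = pd.date_range("2000-01-01", periods=4, freq="D", tz="Asia/Tokyo")

# Same values, but built from a list of Timestamps so no freq is attached.
no_freq = pd.DatetimeIndex(list(idx3))

assert_freq_retained(idx3, idx3)  # passes: identical index, identical freq

try:
    assert_freq_retained(idx3, no_freq)
except AssertionError as err:
    print(f"caught as intended: {err}")

# Whether factorize keeps freq depends on the pandas version in use,
# so this only reports the result instead of asserting it.
codes, uniques = idx3.factorize()
print("factorize freq:", uniques.freq)
```

Applied to the output of factorize, a check like this would fail on any version showing the behavior reported in this issue.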

So the question: do we want to try to preserve freq in factorize?

xref #33677 for the MultiIndex case

Update: one more case, Categorical:

dti = pd.date_range('2016-01-01', periods=5)
cat = pd.Categorical(dti)
cat.categories.freq   # <-- None
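
The cases reported in this issue can be collected in one place for comparison across pandas versions. This is a rough sketch under the assumption that all four operations return a DatetimeIndex with a freq attribute; `derived_freqs` is a hypothetical helper name, and the printed values depend on the installed version, so none are shown here.

```python
import pandas as pd


def derived_freqs(dti: pd.DatetimeIndex) -> dict:
    """Hypothetical helper: report the freq of each index derived from `dti`."""
    mi = pd.MultiIndex.from_arrays([dti, dti])
    return {
        "original": dti.freq,
        "value_counts": dti.value_counts().index.freq,
        "factorize": dti.factorize()[1].freq,
        "MultiIndex level": mi.levels[0].freq,
        "Categorical categories": pd.Categorical(dti).categories.freq,
    }


dti = pd.date_range("2016-01-01", periods=5)
report = derived_freqs(dti)

# A value of None marks an operation that dropped the freq.
for name, freq in report.items():
    print(f"{name:>24}: {freq!r}")
```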

Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Frequency (DateOffsets), Needs Tests (unit test(s) needed to prevent regressions), freq retention (user expects "freq" attribute to be preserved), good first issue
