
WARN introduce FutureWarning for value_counts behaviour change #49640


Closed: wants to merge 20 commits
@@ -9,6 +9,7 @@ The above statement is simply passing a ``Series`` of ``True``/``False`` objects
 returning all rows with ``True``.

 .. ipython:: python
+   :okwarning:

Review comment (Contributor):

Fix the docs rather than ignoring the warnings.

    is_dinner = tips["time"] == "Dinner"
    is_dinner

@@ -224,6 +224,7 @@ Count number of records by category
 What is the number of passengers in each of the cabin classes?

 .. ipython:: python
+   :okwarning:

    titanic["Pclass"].value_counts()

1 change: 1 addition & 0 deletions doc/source/user_guide/10min.rst
@@ -430,6 +430,7 @@ Histogramming
 See more at :ref:`Histogramming and Discretization <basics.discretization>`.

 .. ipython:: python
+   :okwarning:

    s = pd.Series(np.random.randint(0, 7, size=10))
    s

4 changes: 4 additions & 0 deletions doc/source/user_guide/basics.rst
@@ -689,6 +689,7 @@ The :meth:`~Series.value_counts` Series method and top-level function computes a
 of a 1D array of values. It can also be used as a function on regular arrays:

 .. ipython:: python
+   :okwarning:

    data = np.random.randint(0, 7, size=50)
    data
@@ -702,6 +703,7 @@ The :meth:`~DataFrame.value_counts` method can be used to count combinations acr
 By default all columns are used but a subset can be selected using the ``subset`` argument.

 .. ipython:: python
+   :okwarning:

    data = {"a": [1, 2, 3, 4], "b": ["x", "x", "y", "y"]}
    frame = pd.DataFrame(data)
@@ -741,6 +743,7 @@ and :func:`qcut` (bins based on sample quantiles) functions:
 normally distributed data into equal-size quartiles like so:

 .. ipython:: python
+   :okwarning:

    arr = np.random.randn(30)
    factor = pd.qcut(arr, [0, 0.25, 0.5, 0.75, 1])
@@ -2102,6 +2105,7 @@ The number of columns of each type in a ``DataFrame`` can be found by calling
 ``DataFrame.dtypes.value_counts()``.

 .. ipython:: python
+   :okwarning:

    dft.dtypes.value_counts()

1 change: 1 addition & 0 deletions doc/source/user_guide/categorical.rst
@@ -611,6 +611,7 @@ following operations are possible with categorical data:
 even if some categories are not present in the data:

 .. ipython:: python
+   :okwarning:

    s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
    s.value_counts()

1 change: 1 addition & 0 deletions doc/source/user_guide/cookbook.rst
@@ -694,6 +694,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
 <https://stackoverflow.com/questions/15589354/frequency-tables-in-pandas-like-plyr-in-r>`__

 .. ipython:: python
+   :okwarning:

    grades = [48, 99, 75, 80, 42, 80, 72, 68, 36, 78]
    df = pd.DataFrame(

3 changes: 3 additions & 0 deletions doc/source/user_guide/io.rst
@@ -418,6 +418,7 @@ For instance, you can use the ``converters`` argument
 of :func:`~pandas.read_csv`:

 .. ipython:: python
+   :okwarning:

    data = "col_1\n1\n2\n'A'\n4.22"
    df = pd.read_csv(StringIO(data), converters={"col_1": str})
@@ -428,6 +429,7 @@ Or you can use the :func:`~pandas.to_numeric` function to coerce the
 dtypes after reading in the data,

 .. ipython:: python
+   :okwarning:

    df2 = pd.read_csv(StringIO(data))
    df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")
@@ -4329,6 +4331,7 @@ nan representation on disk (which converts to/from ``np.nan``), this
 defaults to ``nan``.

 .. ipython:: python
+   :okwarning:

    df_mixed = pd.DataFrame(
        {

1 change: 1 addition & 0 deletions doc/source/user_guide/missing_data.rst
@@ -102,6 +102,7 @@ sentinel value that can be represented by NumPy in a singular dtype (datetime64[
 pandas objects provide compatibility between ``NaT`` and ``NaN``.

 .. ipython:: python
+   :okwarning:

    df2 = df.copy()
    df2["timestamp"] = pd.Timestamp("20120101")

2 changes: 2 additions & 0 deletions doc/source/user_guide/scale.rst
@@ -220,6 +220,7 @@ counts up to this point. As long as each individual file fits in memory, this wi
 work for arbitrary-sized datasets.

 .. ipython:: python
+   :okwarning:

    %%time
    files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
@@ -302,6 +303,7 @@ returns a Dask Series with the same dtype and the same name.
 To get the actual result you can call ``.compute()``.

 .. ipython:: python
+   :okwarning:

    %time ddf["name"].value_counts().compute()

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.10.1.rst
@@ -83,6 +83,7 @@ Retrieving unique values in an indexable or data column.
 You can now store ``datetime64`` in data columns

 .. ipython:: python
+   :okwarning:

    df_mixed = df.copy()
    df_mixed["datetime64"] = pd.Timestamp("20010102")

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.11.0.rst
@@ -289,6 +289,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
 (:issue:`2809`, :issue:`2810`)

 .. ipython:: python
+   :okwarning:

Review comment (Contributor):

Older release notes are OK to ignore (or convert to code blocks).

    df = pd.DataFrame(np.random.randn(6, 2), pd.date_range('20010102', periods=6),
                      columns=['A', ' B'])

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -561,6 +561,7 @@ integer dtype for the values.
 *pandas 1.0.0*

 .. ipython:: python
+   :okwarning:

    pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -325,6 +325,7 @@ behavior is now consistent with ``unique``, ``isin`` and others
 (:issue:`42688`).

 .. ipython:: python
+   :okwarning:

    s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])
    res = s.value_counts(dropna=False)

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.2.rst
@@ -33,7 +33,7 @@ Bug fixes

 Other
 ~~~~~
--
+- Introduced a ``FutureWarning`` about the upcoming behaviour change in :meth:`DataFrame.value_counts`, :meth:`Series.value_counts`, :meth:`DataFrameGroupBy.value_counts` and :meth:`SeriesGroupBy.value_counts`: in pandas 2.0.0, the resulting Series will by default be named ``'count'`` (or ``'proportion'`` if ``normalize=True``), and, where the original object has a name, the index will inherit it (:issue:`49497`)

Review comment (Contributor):

This warrants a full section, with a before/after example of how to fix affected code.

 -

 .. ---------------------------------------------------------------------------

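A minimal before/after sketch of the kind the reviewer asks for. It assumes pandas is installed and uses only the public Series.rename and Series.rename_axis methods. Code that names the result explicitly gets the same names whether value_counts follows the pre-2.0 default (result named after the original Series, unnamed index) or the new one (result named 'count', index named after the original Series). The Series and its name "fruit" are illustrative.

```python
import pandas as pd

# Illustrative data: a Series whose own name is "fruit".
s = pd.Series(["apple", "banana", "apple"], name="fruit")

# Before: relying on the default naming. Before pandas 2.0 the result is
# named "fruit" with an unnamed index; from 2.0 it is named "count" and
# the index is named "fruit".
implicit = s.value_counts()

# After: name the result and its index explicitly, so downstream code sees
# the same names on either side of the change.
explicit = s.value_counts().rename("count").rename_axis("fruit")
proportions = (
    s.value_counts(normalize=True).rename("proportion").rename_axis("fruit")
)
```

Only the names are pinned down here; the counts themselves are unaffected by the proposed change.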
9 changes: 9 additions & 0 deletions pandas/core/base.py
@@ -17,6 +17,7 @@
     final,
     overload,
 )
+import warnings

 import numpy as np

@@ -37,6 +38,7 @@
     cache_readonly,
     doc,
 )
+from pandas.util._exceptions import find_stack_level

 from pandas.core.dtypes.common import (
     is_categorical_dtype,
@@ -991,6 +993,13 @@ def value_counts(
         NaN 1
         dtype: int64
         """
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`), and the index "
+            "will inherit the original object's name.",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
         return value_counts(
             self,
             sort=sort,

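The new warnings.warn call passes stacklevel=find_stack_level() so that the warning points at the user's line rather than at pandas internals. Below is a stdlib-only sketch of that idea. It assumes a fixed stacklevel=2 is enough because the hypothetical value_counts_sketch is called directly from user code. pandas' find_stack_level instead walks the stack to the first frame outside pandas.

```python
import inspect
import warnings
from collections import Counter

MSG = (
    "In pandas 2.0.0, the name of the resulting Series will be "
    "'count' (or 'proportion' if `normalize=True`)."
)


def value_counts_sketch(values):
    """Hypothetical stand-in for a library function that warns on every call."""
    # stacklevel=2 attributes the warning to the line that called this
    # function, i.e. to user code rather than to library internals.
    warnings.warn(MSG, FutureWarning, stacklevel=2)
    return Counter(values)


def user_code():
    # Remember which line makes the call so the attribution can be checked.
    call_line = inspect.currentframe().f_lineno + 1
    result = value_counts_sketch(["a", "b", "a"])
    return result, call_line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including repeats
    counts, call_line = user_code()

# caught[0].lineno equals call_line: the warning points at user_code,
# not at the warnings.warn line inside value_counts_sketch.
```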
6 changes: 5 additions & 1 deletion pandas/core/describe.py
@@ -17,6 +17,7 @@
     Sequence,
     cast,
 )
+import warnings

 import numpy as np

@@ -252,7 +253,10 @@ def describe_categorical_1d(
         Ignored, but in place to unify interface.
     """
     names = ["count", "unique", "top", "freq"]
-    objcounts = data.value_counts()
+    with warnings.catch_warnings():
+        msg = "In pandas 2.0.0, the name of the resulting Series"
+        warnings.filterwarnings("ignore", msg, FutureWarning)
+        objcounts = data.value_counts()

Review comment (Contributor):

Can we fix it here? The fact that you have to catch the warning is a problem.

     count_unique = len(objcounts[objcounts != 0])
     if count_unique > 0:
         top, freq = objcounts.index[0], objcounts.iloc[0]

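Here, and later in the groupby and info.py hunks, internal callers silence the warning by its message prefix. warnings.filterwarnings treats its message argument as a regular expression that must match the start of the warning text (case-insensitively). A prefix is therefore enough, and other FutureWarnings still get through. The sketch below uses only the standard library; noisy and internal_helper are hypothetical names.

```python
import warnings

PREFIX = "In pandas 2.0.0, the name of the resulting Series"


def noisy():
    """Hypothetical library call emitting the rename warning plus an unrelated one."""
    warnings.warn(PREFIX + " will be 'count'.", FutureWarning, stacklevel=2)
    warnings.warn("Some other future change.", FutureWarning, stacklevel=2)
    return 42


def internal_helper():
    # Same pattern as describe_categorical_1d: suppress only the known
    # message, and only inside this block; filters are restored on exit.
    # PREFIX is used as a regex, so the dots in "2.0.0" match any character;
    # re.escape(PREFIX) would make the match exact.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", PREFIX, FutureWarning)
        return noisy()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = internal_helper()

messages = [str(w.message) for w in caught]
```

Only the unrelated warning reaches the outer recorder. This is also why the reviewer's point stands: every internal call site has to repeat this boilerplate.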
6 changes: 6 additions & 0 deletions pandas/core/frame.py
@@ -7035,6 +7035,12 @@ def value_counts(
         NaN 1
         dtype: int64
         """
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`).",
+            FutureWarning,

Review comment (Contributor):

This is going to be extremely noisy.

Is there a way to do this so that it doesn't show the warning?

Reply (Member):

Responding to #49640 (comment) as well.

I thought @jreback expressed the idea of having the warning (with no way to silence it) on the call; it seems like I misunderstood. My preferred route would be no warning; another option is to just use a DeprecationWarning: anyone running an interactive session won't see the warning, but those testing via pytest would. While I dislike the idea of adding name, I'm not -1 on it.

From the linked issue on adding name, I'm taking some liberties here, but it seems @jorisvandenbossche is +1, @jbrockmendel is -0.5, @Dr-Irv is +1, and @MarcoGorelli was -1.

Reply (Member Author):

I presume there will be a 1.5.3 release before 2.0.0? If so, perhaps let's put this off until then; I don't want to rush a decision.

I wouldn't be keen on adding name, but I'd still prefer that to not making the change.

Reply (Contributor):

If we used a DeprecationWarning, though, then this is really just a breaking change and we should just do it.

So I'm OK with either:

  • just change it in 2.0, or
  • engineer a nice way to warn in 1.5.x that isn't so noisy that it shows up everywhere.

+            stacklevel=find_stack_level(),
+        )
         if subset is None:
             subset = self.columns.tolist()

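The thread weighs FutureWarning against DeprecationWarning. Under CPython's default filters (PEP 565), a DeprecationWarning is shown only when it is attributed to code in __main__ and is ignored otherwise. A FutureWarning is shown by default, once per location. The sketch below approximates just those two default rules inside catch_warnings. It uses warnings.warn_explicit to choose which module the warning is attributed to; the module name "mylib" is hypothetical. Test runners such as pytest install their own filters and do report DeprecationWarnings.

```python
import warnings


def shown_warnings(module_name):
    """Return the warning categories that survive default-like filters when
    the warnings are attributed to `module_name`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()  # start from an empty filter list
        # Approximate CPython's defaults for these two categories (PEP 565):
        # ignore DeprecationWarning in general...
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        # ...but show it when attributed to __main__ (inserted in front).
        warnings.filterwarnings(
            "default", category=DeprecationWarning, module="__main__"
        )
        for category in (DeprecationWarning, FutureWarning):
            warnings.warn_explicit(
                "the name of the result will change",
                category,
                f"{module_name}.py",
                1,
                module=module_name,
            )
    return [w.category for w in caught]
```

With these rules, shown_warnings("mylib") keeps only the FutureWarning, while shown_warnings("__main__") keeps both. Because pandas attributes its warnings to the caller's frame, whether an interactive user sees a DeprecationWarning depends on which module that frame belongs to.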
33 changes: 26 additions & 7 deletions pandas/core/groupby/generic.py
@@ -604,6 +604,12 @@ def value_counts(
         bins=None,
         dropna: bool = True,
     ) -> Series:
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`).",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )

         from pandas.core.reshape.merge import get_join_indexers
         from pandas.core.reshape.tile import cut
@@ -619,13 +625,18 @@ def value_counts(
             # scalar bins cannot be done at top level
             # in a backward compatible way
             # GH38672 relates to categorical dtype
-            ser = self.apply(
-                Series.value_counts,
-                normalize=normalize,
-                sort=sort,
-                ascending=ascending,
-                bins=bins,
-            )
+            with warnings.catch_warnings():
+                # The warning has already been emitted above,
+                # no need to re-emit it for each group.
+                msg = "In pandas 2.0.0, the name of the resulting Series"
+                warnings.filterwarnings("ignore", msg, FutureWarning)
+                ser = self.apply(
+                    Series.value_counts,
+                    normalize=normalize,
+                    sort=sort,
+                    ascending=ascending,
+                    bins=bins,
+                )
             ser.index.names = names
             return ser

@@ -1976,6 +1987,14 @@ def value_counts(
         3 male low US 0.25
         4 male medium FR 0.25
         """
+        if self.as_index:
+            warnings.warn(
+                "In pandas 2.0.0, the name of the resulting Series will be "
+                "'count' (or 'proportion' if `normalize=True`).",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
+
         if self.axis == 1:
             raise NotImplementedError(
                 "DataFrameGroupBy.value_counts only handles axis=0"

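SeriesGroupBy.value_counts warns once at the top, then silences the same message while Series.value_counts runs for each group. Below is a stdlib sketch of that "warn once, suppress per group" pattern. It uses plain dicts and Counter in place of pandas objects; per_group_counts and grouped_counts are hypothetical names.

```python
import warnings
from collections import Counter

PREFIX = "In pandas 2.0.0, the name of the resulting Series"
MSG = PREFIX + " will be 'count'."


def per_group_counts(values):
    """Hypothetical stand-in for Series.value_counts: warns on every call."""
    warnings.warn(MSG, FutureWarning, stacklevel=2)
    return Counter(values)


def grouped_counts(groups):
    """Hypothetical stand-in for SeriesGroupBy.value_counts."""
    # Emit the warning once, attributed to the caller...
    warnings.warn(MSG, FutureWarning, stacklevel=2)
    # ...then silence the identical message for each group.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", PREFIX, FutureWarning)
        return {key: per_group_counts(vals) for key, vals in groups.items()}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = grouped_counts({"x": [1, 1, 2], "y": [3], "z": [3, 3]})
```

The user sees a single warning, although three groups were counted.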
9 changes: 7 additions & 2 deletions pandas/io/formats/info.py
@@ -13,6 +13,7 @@
     Mapping,
     Sequence,
 )
+import warnings

 from pandas._config import get_option

@@ -1097,5 +1098,9 @@ def _get_dataframe_dtype_counts(df: DataFrame) -> Mapping[str, int]:
     """
     Create mapping between datatypes and their number of occurrences.
     """
-    # groupby dtype.name to collect e.g. Categorical columns
-    return df.dtypes.value_counts().groupby(lambda x: x.name).sum()
+    with warnings.catch_warnings():
+        # This warning is emitted on all calls - can remove it in 2.0.0
+        msg = "In pandas 2.0.0, the name of the resulting Series"
+        warnings.filterwarnings("ignore", msg, FutureWarning)
+        # groupby dtype.name to collect e.g. Categorical columns
+        return df.dtypes.value_counts().groupby(lambda x: x.name).sum()

3 changes: 2 additions & 1 deletion pandas/tests/arrays/boolean/test_function.py
@@ -104,7 +104,8 @@ def test_value_counts_na():

 def test_value_counts_with_normalize():
     ser = pd.Series([True, False, pd.NA], dtype="boolean")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([1, 1], index=ser[:-1], dtype="Float64") / 2
     assert expected.index.dtype == "boolean"
     tm.assert_series_equal(result, expected)

3 changes: 2 additions & 1 deletion pandas/tests/arrays/floating/test_function.py
@@ -119,7 +119,8 @@ def test_value_counts_empty():

 def test_value_counts_with_normalize():
     ser = pd.Series([0.1, 0.2, 0.1, pd.NA], dtype="Float64")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     assert expected.index.dtype == ser.dtype
     tm.assert_series_equal(result, expected)

6 changes: 4 additions & 2 deletions pandas/tests/arrays/integer/test_function.py
@@ -123,7 +123,8 @@ def test_value_counts_na():
 def test_value_counts_empty():
     # https://github.com/pandas-dev/pandas/issues/33317
     ser = pd.Series([], dtype="Int64")
-    result = ser.value_counts()
+    with tm.assert_produces_warning(FutureWarning, match="name of the result"):
+        result = ser.value_counts()
     idx = pd.Index([], dtype=ser.dtype)
     assert idx.dtype == ser.dtype
     expected = pd.Series([], index=idx, dtype="Int64")
@@ -133,7 +134,8 @@ def test_value_counts_empty():
 def test_value_counts_with_normalize():
     # GH 33172
     ser = pd.Series([1, 2, 1, pd.NA], dtype="Int64")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     assert expected.index.dtype == ser.dtype
     tm.assert_series_equal(result, expected)

3 changes: 2 additions & 1 deletion pandas/tests/arrays/string_/test_string.py
@@ -494,7 +494,8 @@ def test_value_counts_na(dtype):

 def test_value_counts_with_normalize(dtype):
     ser = pd.Series(["a", "b", "a", pd.NA], dtype=dtype)
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     tm.assert_series_equal(result, expected)

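The tests wrap each call in tm.assert_produces_warning, pandas' internal testing helper. A rough stdlib analogue is unittest's assertWarnsRegex, which looks for its pattern anywhere in the message using re.search. That is why both a prefix pattern and a mid-message fragment such as "name of the result" match the same warning. The sketch assumes tm.assert_produces_warning likewise searches rather than anchoring; value_counts_stub is a hypothetical stand-in.

```python
import unittest
import warnings

MSG = (
    "In pandas 2.0.0, the name of the resulting Series will be "
    "'count' (or 'proportion' if `normalize=True`)."
)


def value_counts_stub(values):
    """Hypothetical function that emits the rename warning and counts values."""
    warnings.warn(MSG, FutureWarning, stacklevel=2)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


class ValueCountsWarningTest(unittest.TestCase):
    def test_prefix_match(self):
        with self.assertWarnsRegex(FutureWarning, r"In pandas 2\.0\.0, the name"):
            result = value_counts_stub(["a", "b", "a"])
        self.assertEqual(result, {"a": 2, "b": 1})

    def test_substring_match(self):
        # re.search finds the pattern anywhere in the message.
        with self.assertWarnsRegex(FutureWarning, "name of the result"):
            value_counts_stub([])
```

Saved as a module, this can be run with python -m unittest.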