pandas-dev · MarcoGorelli · Nov 10, 2022 · Nov 11, 2022 · Nov 11, 2022 · Nov 11, 2022
diff --git a/doc/source/getting_started/comparison/includes/filtering.rst b/doc/source/getting_started/comparison/includes/filtering.rst
@@ -9,6 +9,7 @@ The above statement is simply passing a ``Series`` of ``True``/``False`` objects
 returning all rows with ``True``.
 
 .. ipython:: python
+    :okwarning:
 
     is_dinner = tips["time"] == "Dinner"
     is_dinner

diff --git a/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst b/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
@@ -224,6 +224,7 @@ Count number of records by category
 What is the number of passengers in each of the cabin classes?
 
 .. ipython:: python
+    :okwarning:
 
     titanic["Pclass"].value_counts()
 

diff --git a/doc/source/user_guide/10min.rst b/doc/source/user_guide/10min.rst
@@ -430,6 +430,7 @@ Histogramming
 See more at :ref:`Histogramming and Discretization <basics.discretization>`.
 
 .. ipython:: python
+   :okwarning:
 
    s = pd.Series(np.random.randint(0, 7, size=10))
    s

diff --git a/doc/source/user_guide/basics.rst b/doc/source/user_guide/basics.rst
@@ -689,6 +689,7 @@ The :meth:`~Series.value_counts` Series method and top-level function computes a
 of a 1D array of values. It can also be used as a function on regular arrays:
 
 .. ipython:: python
+   :okwarning:
 
    data = np.random.randint(0, 7, size=50)
    data
@@ -702,6 +703,7 @@ The :meth:`~DataFrame.value_counts` method can be used to count combinations acr
 By default all columns are used but a subset can be selected using the ``subset`` argument.
 
 .. ipython:: python
+    :okwarning:
 
     data = {"a": [1, 2, 3, 4], "b": ["x", "x", "y", "y"]}
     frame = pd.DataFrame(data)
@@ -741,6 +743,7 @@ and :func:`qcut` (bins based on sample quantiles) functions:
 normally distributed data into equal-size quartiles like so:
 
 .. ipython:: python
+   :okwarning:
 
    arr = np.random.randn(30)
    factor = pd.qcut(arr, [0, 0.25, 0.5, 0.75, 1])
@@ -2102,6 +2105,7 @@ The number of columns of each type in a ``DataFrame`` can be found by calling
 ``DataFrame.dtypes.value_counts()``.
 
 .. ipython:: python
+   :okwarning:
 
    dft.dtypes.value_counts()
 

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
@@ -611,6 +611,7 @@ following operations are possible with categorical data:
 even if some categories are not present in the data:
 
 .. ipython:: python
+    :okwarning:
 
     s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
     s.value_counts()

diff --git a/doc/source/user_guide/cookbook.rst b/doc/source/user_guide/cookbook.rst
@@ -694,6 +694,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
 <https://stackoverflow.com/questions/15589354/frequency-tables-in-pandas-like-plyr-in-r>`__
 
 .. ipython:: python
+   :okwarning:
 
    grades = [48, 99, 75, 80, 42, 80, 72, 68, 36, 78]
    df = pd.DataFrame(

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
@@ -418,6 +418,7 @@ For instance, you can use the ``converters`` argument
 of :func:`~pandas.read_csv`:
 
 .. ipython:: python
+    :okwarning:
 
     data = "col_1\n1\n2\n'A'\n4.22"
     df = pd.read_csv(StringIO(data), converters={"col_1": str})
@@ -428,6 +429,7 @@ Or you can use the :func:`~pandas.to_numeric` function to coerce the
 dtypes after reading in the data,
 
 .. ipython:: python
+    :okwarning:
 
     df2 = pd.read_csv(StringIO(data))
     df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")
@@ -4329,6 +4331,7 @@ nan representation on disk (which converts to/from ``np.nan``), this
 defaults to ``nan``.
 
 .. ipython:: python
+    :okwarning:
 
     df_mixed = pd.DataFrame(
         {

diff --git a/doc/source/user_guide/missing_data.rst b/doc/source/user_guide/missing_data.rst
@@ -102,6 +102,7 @@ sentinel value that can be represented by NumPy in a singular dtype (datetime64[
 pandas objects provide compatibility between ``NaT`` and ``NaN``.
 
 .. ipython:: python
+   :okwarning:
 
    df2 = df.copy()
    df2["timestamp"] = pd.Timestamp("20120101")

diff --git a/doc/source/user_guide/scale.rst b/doc/source/user_guide/scale.rst
@@ -220,6 +220,7 @@ counts up to this point. As long as each individual file fits in memory, this wi
 work for arbitrary-sized datasets.
 
 .. ipython:: python
+   :okwarning:
 
    %%time
    files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
@@ -302,6 +303,7 @@ returns a Dask Series with the same dtype and the same name.
 To get the actual result you can call ``.compute()``.
 
 .. ipython:: python
+   :okwarning:
 
    %time ddf["name"].value_counts().compute()
 

diff --git a/doc/source/whatsnew/v0.10.1.rst b/doc/source/whatsnew/v0.10.1.rst
@@ -83,6 +83,7 @@ Retrieving unique values in an indexable or data column.
 You can now store ``datetime64`` in data columns
 
 .. ipython:: python
+    :okwarning:
 
     df_mixed = df.copy()
     df_mixed["datetime64"] = pd.Timestamp("20010102")

diff --git a/doc/source/whatsnew/v0.11.0.rst b/doc/source/whatsnew/v0.11.0.rst
@@ -289,6 +289,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet
 (:issue:`2809`, :issue:`2810`)
 
 .. ipython:: python
+   :okwarning:
 
    df = pd.DataFrame(np.random.randn(6, 2), pd.date_range('20010102', periods=6),
                      columns=['A', ' B'])

diff --git a/doc/source/whatsnew/v1.0.0.rst b/doc/source/whatsnew/v1.0.0.rst
@@ -561,6 +561,7 @@ integer dtype for the values.
 *pandas 1.0.0*
 
 .. ipython:: python
+   :okwarning:
 
    pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
 

diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst
@@ -325,6 +325,7 @@ behavior is now consistent with ``unique``, ``isin`` and others
 (:issue:`42688`).
 
 .. ipython:: python
+    :okwarning:
 
     s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])
     res = s.value_counts(dropna=False)

diff --git a/doc/source/whatsnew/v1.5.2.rst b/doc/source/whatsnew/v1.5.2.rst
@@ -33,7 +33,7 @@ Bug fixes
 
 Other
 ~~~~~
--
+- Introduced a ``FutureWarning`` notifying about a behaviour change in :meth:`DataFrame.value_counts`, :meth:`Series.value_counts`, :meth:`DataFrameGroupBy.value_counts` and :meth:`SeriesGroupBy.value_counts`: in pandas 2.0.0, the resulting Series will by default be named ``'count'`` (or ``'proportion'`` if ``normalize=True``), and the index (if present) will take its name from the original object (:issue:`49497`)
 -
 
 .. ---------------------------------------------------------------------------

diff --git a/pandas/core/base.py b/pandas/core/base.py
@@ -17,6 +17,7 @@
     final,
     overload,
 )
+import warnings
 
 import numpy as np
 
@@ -37,6 +38,7 @@
     cache_readonly,
     doc,
 )
+from pandas.util._exceptions import find_stack_level
 
 from pandas.core.dtypes.common import (
     is_categorical_dtype,
@@ -991,6 +993,13 @@ def value_counts(
         NaN    1
         dtype: int64
         """
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`), and the index "
+            "will inherit the original object's name.",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
         return value_counts(
             self,
             sort=sort,
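
Note on the hunk above: the warning is raised with an explicit `stacklevel` so that it is attributed to the user's call site rather than to library internals. A minimal sketch of that pattern follows, using only the standard library. `value_counts_sketch` is a hypothetical stand-in, not the pandas implementation, and the fixed `stacklevel=2` is an assumption that replaces pandas' internal `find_stack_level()`.

```python
import warnings


def value_counts_sketch(values, normalize=False):
    """Hypothetical stand-in for value_counts; not the pandas implementation."""
    warnings.warn(
        "In pandas 2.0.0, the name of the resulting Series will be "
        "'count' (or 'proportion' if `normalize=True`).",
        FutureWarning,
        # stacklevel=2 attributes the warning to the caller's line.
        # pandas derives this value with find_stack_level(); the constant 2
        # is an assumption that only holds for this single-frame sketch.
        stacklevel=2,
    )
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    if normalize:
        total = len(values)
        return {key: n / total for key, n in counts.items()}
    return counts


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = value_counts_sketch(["a", "b", "a"])
```

Here `caught` holds the single recorded `FutureWarning` and `result` holds the plain counts.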

diff --git a/pandas/core/describe.py b/pandas/core/describe.py
@@ -17,6 +17,7 @@
     Sequence,
     cast,
 )
+import warnings
 
 import numpy as np
 
@@ -252,7 +253,10 @@ def describe_categorical_1d(
         Ignored, but in place to unify interface.
     """
     names = ["count", "unique", "top", "freq"]
-    objcounts = data.value_counts()
+    with warnings.catch_warnings():
+        msg = "In pandas 2.0.0, the name of the resulting Series"
+        warnings.filterwarnings("ignore", msg, FutureWarning)
+        objcounts = data.value_counts()
     count_unique = len(objcounts[objcounts != 0])
     if count_unique > 0:
         top, freq = objcounts.index[0], objcounts.iloc[0]
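
Note on the hunk above: `warnings.filterwarnings` treats its `message` argument as a case-insensitive regular expression that must match the start of the warning text, which is why a message prefix is enough to silence this one warning. The sketch below shows that a prefix filter leaves unrelated warnings of the same category untouched. `noisy` is a hypothetical function written for illustration only.

```python
import warnings


def noisy():
    """Hypothetical function emitting two different FutureWarnings."""
    warnings.warn(
        "In pandas 2.0.0, the name of the resulting Series will be 'count'.",
        FutureWarning,
    )
    warnings.warn("Some unrelated deprecation.", FutureWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Added after simplefilter, so this filter is consulted first.
    # Only warnings whose text starts with the prefix are ignored.
    warnings.filterwarnings(
        "ignore",
        "In pandas 2.0.0, the name of the resulting Series",
        FutureWarning,
    )
    noisy()

# Messages that were not suppressed by the prefix filter.
remaining = [str(w.message) for w in caught]
```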

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -7035,6 +7035,12 @@ def value_counts(
                     NaN            1
         dtype: int64
         """
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`).",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
         if subset is None:
             subset = self.columns.tolist()
 

diff --git a/pandas/core/groupby/generic.py b/pandas/core/groupby/generic.py
@@ -604,6 +604,12 @@ def value_counts(
         bins=None,
         dropna: bool = True,
     ) -> Series:
+        warnings.warn(
+            "In pandas 2.0.0, the name of the resulting Series will be "
+            "'count' (or 'proportion' if `normalize=True`).",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
 
         from pandas.core.reshape.merge import get_join_indexers
         from pandas.core.reshape.tile import cut
@@ -619,13 +625,18 @@ def value_counts(
             # scalar bins cannot be done at top level
             # in a backward compatible way
             # GH38672 relates to categorical dtype
-            ser = self.apply(
-                Series.value_counts,
-                normalize=normalize,
-                sort=sort,
-                ascending=ascending,
-                bins=bins,
-            )
+            with warnings.catch_warnings():
+                # The warning has already been emitted above,
+                # no need to re-emit it for each group.
+                msg = "In pandas 2.0.0, the name of the resulting Series"
+                warnings.filterwarnings("ignore", msg, FutureWarning)
+                ser = self.apply(
+                    Series.value_counts,
+                    normalize=normalize,
+                    sort=sort,
+                    ascending=ascending,
+                    bins=bins,
+                )
             ser.index.names = names
             return ser
 
@@ -1976,6 +1987,14 @@ def value_counts(
         3    male       low      US        0.25
         4    male    medium      FR        0.25
         """
+        if self.as_index:
+            warnings.warn(
+                "In pandas 2.0.0, the name of the resulting Series will be "
+                "'count' (or 'proportion' if `normalize=True`).",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
+
         if self.axis == 1:
             raise NotImplementedError(
                 "DataFrameGroupBy.value_counts only handles axis=0"

diff --git a/pandas/io/formats/info.py b/pandas/io/formats/info.py
@@ -13,6 +13,7 @@
     Mapping,
     Sequence,
 )
+import warnings
 
 from pandas._config import get_option
 
@@ -1097,5 +1098,9 @@ def _get_dataframe_dtype_counts(df: DataFrame) -> Mapping[str, int]:
     """
     Create mapping between datatypes and their number of occurrences.
     """
-    # groupby dtype.name to collect e.g. Categorical columns
-    return df.dtypes.value_counts().groupby(lambda x: x.name).sum()
+    with warnings.catch_warnings():
+        # This warning is emitted on all calls - can remove it in 2.0.0
+        msg = "In pandas 2.0.0, the name of the resulting Series"
+        warnings.filterwarnings("ignore", msg, FutureWarning)
+        # groupby dtype.name to collect e.g. Categorical columns
+        return df.dtypes.value_counts().groupby(lambda x: x.name).sum()
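
Note on the suppression hunks: the groupby change warns once at the public entry point and then silences the same warning for the nested per-group calls. A minimal sketch of that "warn once, suppress inner calls" arrangement is given below. `inner_counts` and `grouped_counts` are hypothetical helpers, and the fixed `stacklevel=2` is an assumption standing in for `find_stack_level()`.

```python
import warnings

MSG_PREFIX = "In pandas 2.0.0, the name of the resulting Series"


def inner_counts(values):
    """Hypothetical per-group helper that warns on every call."""
    warnings.warn(MSG_PREFIX + " will be 'count'.", FutureWarning, stacklevel=2)
    return len(values)


def grouped_counts(groups):
    """Warn once for the whole operation, then silence the per-group calls."""
    warnings.warn(MSG_PREFIX + " will be 'count'.", FutureWarning, stacklevel=2)
    with warnings.catch_warnings():
        # The filter is discarded when this block exits, so the
        # caller's own warning configuration is left unchanged.
        warnings.filterwarnings("ignore", MSG_PREFIX, FutureWarning)
        return {key: inner_counts(vals) for key, vals in groups.items()}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sizes = grouped_counts({"x": [1, 2], "y": [3]})
```

Without the inner filter, this call would record three warnings (one for the outer call and one per group) instead of one.
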
diff --git a/pandas/tests/arrays/boolean/test_function.py b/pandas/tests/arrays/boolean/test_function.py
@@ -104,7 +104,8 @@ def test_value_counts_na():
 
 def test_value_counts_with_normalize():
     ser = pd.Series([True, False, pd.NA], dtype="boolean")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([1, 1], index=ser[:-1], dtype="Float64") / 2
     assert expected.index.dtype == "boolean"
     tm.assert_series_equal(result, expected)

diff --git a/pandas/tests/arrays/floating/test_function.py b/pandas/tests/arrays/floating/test_function.py
@@ -119,7 +119,8 @@ def test_value_counts_empty():
 
 def test_value_counts_with_normalize():
     ser = pd.Series([0.1, 0.2, 0.1, pd.NA], dtype="Float64")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     assert expected.index.dtype == ser.dtype
     tm.assert_series_equal(result, expected)

diff --git a/pandas/tests/arrays/integer/test_function.py b/pandas/tests/arrays/integer/test_function.py
@@ -123,7 +123,8 @@ def test_value_counts_na():
 def test_value_counts_empty():
     # https://github.com/pandas-dev/pandas/issues/33317
     ser = pd.Series([], dtype="Int64")
-    result = ser.value_counts()
+    with tm.assert_produces_warning(FutureWarning, match="name of the result"):
+        result = ser.value_counts()
     idx = pd.Index([], dtype=ser.dtype)
     assert idx.dtype == ser.dtype
     expected = pd.Series([], index=idx, dtype="Int64")
@@ -133,7 +134,8 @@ def test_value_counts_empty():
 def test_value_counts_with_normalize():
     # GH 33172
     ser = pd.Series([1, 2, 1, pd.NA], dtype="Int64")
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     assert expected.index.dtype == ser.dtype
     tm.assert_series_equal(result, expected)

diff --git a/pandas/tests/arrays/string_/test_string.py b/pandas/tests/arrays/string_/test_string.py
@@ -494,7 +494,8 @@ def test_value_counts_na(dtype):
 
 def test_value_counts_with_normalize(dtype):
     ser = pd.Series(["a", "b", "a", pd.NA], dtype=dtype)
-    result = ser.value_counts(normalize=True)
+    with tm.assert_produces_warning(FutureWarning, match="In pandas 2.0.0, the name"):
+        result = ser.value_counts(normalize=True)
     expected = pd.Series([2, 1], index=ser[:2], dtype="Float64") / 3
     tm.assert_series_equal(result, expected)
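
Note on the test changes: `tm.assert_produces_warning` is pandas' internal test helper. The shorter pattern `match="name of the result"` in the integer test suggests that `match` is searched anywhere in the message rather than anchored at its start. A rough standard-library equivalent is sketched below. `assert_warns_matching` and `warn_about_rename` are hypothetical names, and the helper is not the pandas implementation.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def assert_warns_matching(category, match):
    """Hypothetical helper: fail unless the block raises a matching warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught
    # re.search finds the pattern anywhere in the message text.
    hits = [
        w
        for w in caught
        if issubclass(w.category, category) and re.search(match, str(w.message))
    ]
    if not hits:
        raise AssertionError(f"no {category.__name__} matching {match!r} was raised")


def warn_about_rename():
    """Hypothetical function that emits the rename warning."""
    warnings.warn(
        "In pandas 2.0.0, the name of the resulting Series will be 'count'.",
        FutureWarning,
    )


# Passes: the pattern occurs in the middle of the message.
with assert_warns_matching(FutureWarning, "name of the result") as seen:
    warn_about_rename()
```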