Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Remove SettingWithCopyWarning #56614

Merged
merged 13 commits into from
Feb 4, 2024
2 changes: 0 additions & 2 deletions doc/source/reference/testing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,6 @@ Exceptions and warnings
errors.PossiblePrecisionLoss
errors.PyperclipException
errors.PyperclipWindowsException
errors.SettingWithCopyError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have another PR with the whatsnew note, please make sure that the removal of these are mentioned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a separate PR with a whatsnew

errors.SettingWithCopyWarning
errors.SpecificationError
errors.UndefinedVariableError
errors.UnsortedIndexError
Expand Down
8 changes: 1 addition & 7 deletions doc/source/user_guide/advanced.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,6 @@ and :ref:`other advanced indexing features <advanced.index_types>`.

See the :ref:`Indexing and Selecting Data <indexing>` for general indexing documentation.

.. warning::

Whether a copy or a reference is returned for a setting operation may
depend on the context. This is sometimes called ``chained assignment`` and
should be avoided. See :ref:`Returning a View versus Copy
<indexing.view_versus_copy>`.

See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.

.. _advanced.hierarchical:
Expand Down Expand Up @@ -402,6 +395,7 @@ slicers on a single axis.
Furthermore, you can *set* the values using the following methods.

.. ipython:: python
:okwarning:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the pdep warning? Best to just change this example or remove if it won't work anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem to be necessary anymore, lets see what ci has to say

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep this in for now, I can't reproduce locally but it shows up on ci, it's weird


df2 = dfmi.copy()
df2.loc(axis=0)[:, :, ["C1", "C3"]] = -10
Expand Down
252 changes: 5 additions & 247 deletions doc/source/user_guide/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,13 +29,6 @@ this area.
production code, we recommended that you take advantage of the optimized
pandas data access methods exposed in this chapter.

.. warning::

Whether a copy or a reference is returned for a setting operation, may
depend on the context. This is sometimes called ``chained assignment`` and
should be avoided. See :ref:`Returning a View versus Copy
<indexing.view_versus_copy>`.

See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and more advanced indexing documentation.

See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
Expand Down Expand Up @@ -299,12 +292,6 @@ largely as a convenience since it is such a common operation.
Selection by label
------------------

.. warning::

Whether a copy or a reference is returned for a setting operation, may depend on the context.
This is sometimes called ``chained assignment`` and should be avoided.
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.

.. warning::

``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
Expand Down Expand Up @@ -445,12 +432,6 @@ For more information about duplicate labels, see
Selection by position
---------------------

.. warning::

Whether a copy or a reference is returned for a setting operation, may depend on the context.
This is sometimes called ``chained assignment`` and should be avoided.
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.

pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.

The ``.iloc`` attribute is the primary access method. The following are valid inputs:
Expand Down Expand Up @@ -1722,234 +1703,11 @@ You can assign a custom index to the ``index`` attribute:
df_idx.index = pd.Index([10, 20, 30, 40], name="a")
df_idx

.. _indexing.view_versus_copy:

Returning a view versus a copy
------------------------------

.. warning::

:ref:`Copy-on-Write <copy_on_write>`
will become the new default in pandas 3.0. This means than chained indexing will
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
anymore.
See :ref:`this section <copy_on_write_chained_assignment>`
for more context.
We recommend turning Copy-on-Write on to leverage the improvements with

```
pd.options.mode.copy_on_write = True
```

even before pandas 3.0 is available.

When setting values in a pandas object, care must be taken to avoid what is called
``chained indexing``. Here is an example.

.. ipython:: python

dfmi = pd.DataFrame([list('abcd'),
list('efgh'),
list('ijkl'),
list('mnop')],
columns=pd.MultiIndex.from_product([['one', 'two'],
['first', 'second']]))
dfmi

Compare these two access methods:

.. ipython:: python

dfmi['one']['second']

.. ipython:: python

dfmi.loc[:, ('one', 'second')]

These both yield the same results, so which should you use? It is instructive to understand the order
of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``).

``dfmi['one']`` selects the first level of the columns and returns a DataFrame that is singly-indexed.
Then another Python operation ``dfmi_with_one['second']`` selects the series indexed by ``'second'``.
This is indicated by the variable ``dfmi_with_one`` because pandas sees these operations as separate events.
e.g. separate calls to ``__getitem__``, so it has to treat them as linear operations, they happen one after another.

Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to
``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
faster, and allows one to index *both* axes if so desired.

Why does assignment fail when using chained indexing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

:ref:`Copy-on-Write <copy_on_write>`
will become the new default in pandas 3.0. This means than chained indexing will
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
anymore.
See :ref:`this section <copy_on_write_chained_assignment>`
for more context.
We recommend turning Copy-on-Write on to leverage the improvements with

```
pd.options.mode.copy_on_write = True
```

even before pandas 3.0 is available.

The problem in the previous section is just a performance issue. What's up with
the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when
you do something that might cost a few extra milliseconds!

But it turns out that assigning to the product of chained indexing has
inherently unpredictable results. To see this, think about how the Python
interpreter executes this code:

.. code-block:: python

dfmi.loc[:, ('one', 'second')] = value
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)

But this code is handled differently:

.. code-block:: python

dfmi['one']['second'] = value
# becomes
dfmi.__getitem__('one').__setitem__('second', value)

See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
predict whether it will return a view or a copy (it depends on the memory layout
of the array, about which pandas makes no guarantees), and therefore whether
the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
about!

.. note:: You may be wondering whether we should be concerned about the ``loc``
property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.

Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
obvious chained indexing going on. **These** are the bugs that
``SettingWithCopy`` is designed to catch! pandas is probably trying to warn you
that you've done this:

.. code-block:: python

def do_something(df):
foo = df[['bar', 'baz']] # Is foo a view? A copy? Nobody knows!
# ... many lines here ...
# We don't know whether this will modify df or not!
foo['quux'] = value
return foo

Yikes!

.. _indexing.evaluation_order:

Evaluation order matters
~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

:ref:`Copy-on-Write <copy_on_write>`
will become the new default in pandas 3.0. This means than chained indexing will
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
anymore.
See :ref:`this section <copy_on_write_chained_assignment>`
for more context.
We recommend turning Copy-on-Write on to leverage the improvements with

```
pd.options.mode.copy_on_write = True
```

even before pandas 3.0 is available.

When you use chained indexing, the order and type of the indexing operation
partially determine whether the result is a slice into the original object, or
a copy of the slice.

pandas has the ``SettingWithCopyWarning`` because assigning to a copy of a
slice is frequently not intentional, but a mistake caused by chained indexing
returning a copy where a slice was expected.

If you would like pandas to be more or less trusting about assignment to a
chained indexing expression, you can set the :ref:`option <options>`
``mode.chained_assignment`` to one of these values:

* ``'warn'``, the default, means a ``SettingWithCopyWarning`` is printed.
* ``'raise'`` means pandas will raise a ``SettingWithCopyError``
you have to deal with.
* ``None`` will suppress the warnings entirely.

.. ipython:: python
:okwarning:

dfb = pd.DataFrame({'a': ['one', 'one', 'two',
'three', 'two', 'one', 'six'],
'c': np.arange(7)})

# This will show the SettingWithCopyWarning
# but the frame values will be set
dfb['c'][dfb['a'].str.startswith('o')] = 42

This however is operating on a copy and will not work.

.. ipython:: python
:okwarning:
:okexcept:

with pd.option_context('mode.chained_assignment','warn'):
dfb[dfb['a'].str.startswith('o')]['c'] = 42

A chained assignment can also crop up in setting in a mixed dtype frame.

.. note::

These setting rules apply to all of ``.loc/.iloc``.

The following is the recommended access method using ``.loc`` for multiple items (using ``mask``) and a single item using a fixed index:

.. ipython:: python

dfc = pd.DataFrame({'a': ['one', 'one', 'two',
'three', 'two', 'one', 'six'],
'c': np.arange(7)})
dfd = dfc.copy()
# Setting multiple items using a mask
mask = dfd['a'].str.startswith('o')
dfd.loc[mask, 'c'] = 42
dfd

# Setting a single item
dfd = dfc.copy()
dfd.loc[2, 'a'] = 11
dfd

The following *can* work at times, but it is not guaranteed to, and therefore should be avoided:

.. ipython:: python
:okwarning:

dfd = dfc.copy()
dfd['a'][2] = 111
dfd

Last, the subsequent example will **not** work at all, and so should be avoided:

.. ipython:: python
:okwarning:
:okexcept:

with pd.option_context('mode.chained_assignment','raise'):
dfd.loc[0]['a'] = 1111

.. warning::

The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
assignment. There may be false positives; situations where a chained assignment is inadvertently
reported.
:ref:`Copy-on-Write <copy_on_write>` is the new default with pandas 3.0.
This means than chained indexing will never work. As a consequence,
the ``SettingWithCopyWarning`` is not necessary anymore.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the As a consequence, the ``SettingWithCopyWarning`` is not necessary anymore. sentence since that warning will no longer exist

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

See :ref:`this section <copy_on_write_chained_assignment>`
for more context.
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.13.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ API changes
statistical mode(s) by axis/Series. (:issue:`5367`)

- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.
with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``.

.. ipython:: python

Expand Down
4 changes: 2 additions & 2 deletions doc/source/whatsnew/v0.13.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,8 @@ Highlights include:
.. warning::

0.13.1 fixes a bug that was caused by a combination of having numpy < 1.8, and doing
chained assignment on a string-like array. Please review :ref:`the docs<indexing.view_versus_copy>`,
chained indexing can have unexpected results and should generally be avoided.
chained assignment on a string-like array.
Chained indexing can have unexpected results and should generally be avoided.

This would previously segfault:

Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -383,7 +383,7 @@ Other enhancements
- Added ``validate`` argument to :meth:`DataFrame.join` (:issue:`46622`)
- Added ``numeric_only`` argument to :meth:`.Resampler.sum`, :meth:`.Resampler.prod`, :meth:`.Resampler.min`, :meth:`.Resampler.max`, :meth:`.Resampler.first`, and :meth:`.Resampler.last` (:issue:`46442`)
- ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
- :class:`.DataError`, :class:`.SpecificationError`, :class:`.SettingWithCopyError`, :class:`.SettingWithCopyWarning`, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, :class:`.IndexingError`, :class:`.PyperclipException`, :class:`.PyperclipWindowsException`, :class:`.CSSWarning`, :class:`.PossibleDataLossError`, :class:`.ClosedFileError`, :class:`.IncompatibilityWarning`, :class:`.AttributeConflictWarning`, :class:`.DatabaseError`, :class:`.PossiblePrecisionLoss`, :class:`.ValueLabelTypeMismatch`, :class:`.InvalidColumnName`, and :class:`.CategoricalConversionWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
- :class:`.DataError`, :class:`.SpecificationError`, ``SettingWithCopyError``, ``SettingWithCopyWarning``, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, :class:`.IndexingError`, :class:`.PyperclipException`, :class:`.PyperclipWindowsException`, :class:`.CSSWarning`, :class:`.PossibleDataLossError`, :class:`.ClosedFileError`, :class:`.IncompatibilityWarning`, :class:`.AttributeConflictWarning`, :class:`.DatabaseError`, :class:`.PossiblePrecisionLoss`, :class:`.ValueLabelTypeMismatch`, :class:`.InvalidColumnName`, and :class:`.CategoricalConversionWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
- Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
- Add support for :meth:`.DataFrameGroupBy.ohlc` and :meth:`.SeriesGroupBy.ohlc` for extension array dtypes (:issue:`37493`)
- Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
Expand Down
16 changes: 6 additions & 10 deletions pandas/core/apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,6 @@

import numpy as np

from pandas._config import option_context

from pandas._libs import lib
from pandas._libs.internals import BlockValuesRefs
from pandas._typing import (
Expand Down Expand Up @@ -1076,14 +1074,12 @@ def apply_series_generator(self) -> tuple[ResType, Index]:

results = {}

with option_context("mode.chained_assignment", None):
for i, v in enumerate(series_gen):
# ignore SettingWithCopy here in case the user mutates
results[i] = self.func(v, *self.args, **self.kwargs)
if isinstance(results[i], ABCSeries):
# If we have a view on v, we need to make a copy because
# series_generator will swap out the underlying data
results[i] = results[i].copy(deep=False)
for i, v in enumerate(series_gen):
results[i] = self.func(v, *self.args, **self.kwargs)
if isinstance(results[i], ABCSeries):
# If we have a view on v, we need to make a copy because
# series_generator will swap out the underlying data
results[i] = results[i].copy(deep=False)

return results, res_index

Expand Down
Loading
Loading