-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: Remove SettingWithCopyWarning #56614
Changes from 8 commits
b2183ba
53c3f76
a0ddaa1
8e29d0d
42e8322
7929800
8ef42d7
9f826ee
60653ad
62385bf
00cb5ff
60c7ffe
516bd88
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,13 +11,6 @@ and :ref:`other advanced indexing features <advanced.index_types>`. | |
|
||
See the :ref:`Indexing and Selecting Data <indexing>` for general indexing documentation. | ||
|
||
.. warning:: | ||
|
||
Whether a copy or a reference is returned for a setting operation may | ||
depend on the context. This is sometimes called ``chained assignment`` and | ||
should be avoided. See :ref:`Returning a View versus Copy | ||
<indexing.view_versus_copy>`. | ||
|
||
See the :ref:`cookbook<cookbook.selection>` for some advanced strategies. | ||
|
||
.. _advanced.hierarchical: | ||
|
@@ -402,6 +395,7 @@ slicers on a single axis. | |
Furthermore, you can *set* the values using the following methods. | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the pdep warning? Best to just change this example or remove if it won't work anymore There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. doesn't seem to be necessary anymore, lets see what ci has to say There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll keep this in for now, I can't reproduce locally but it shows up on ci, it's weird |
||
|
||
df2 = dfmi.copy() | ||
df2.loc(axis=0)[:, :, ["C1", "C3"]] = -10 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,13 +29,6 @@ this area. | |
production code, we recommended that you take advantage of the optimized | ||
pandas data access methods exposed in this chapter. | ||
|
||
.. warning:: | ||
|
||
Whether a copy or a reference is returned for a setting operation, may | ||
depend on the context. This is sometimes called ``chained assignment`` and | ||
should be avoided. See :ref:`Returning a View versus Copy | ||
<indexing.view_versus_copy>`. | ||
|
||
See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and more advanced indexing documentation. | ||
|
||
See the :ref:`cookbook<cookbook.selection>` for some advanced strategies. | ||
|
@@ -299,12 +292,6 @@ largely as a convenience since it is such a common operation. | |
Selection by label | ||
------------------ | ||
|
||
.. warning:: | ||
|
||
Whether a copy or a reference is returned for a setting operation, may depend on the context. | ||
This is sometimes called ``chained assignment`` and should be avoided. | ||
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`. | ||
|
||
.. warning:: | ||
|
||
``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example | ||
|
@@ -445,12 +432,6 @@ For more information about duplicate labels, see | |
Selection by position | ||
--------------------- | ||
|
||
.. warning:: | ||
|
||
Whether a copy or a reference is returned for a setting operation, may depend on the context. | ||
This is sometimes called ``chained assignment`` and should be avoided. | ||
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`. | ||
|
||
pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``. | ||
|
||
The ``.iloc`` attribute is the primary access method. The following are valid inputs: | ||
|
@@ -1722,234 +1703,11 @@ You can assign a custom index to the ``index`` attribute: | |
df_idx.index = pd.Index([10, 20, 30, 40], name="a") | ||
df_idx | ||
|
||
.. _indexing.view_versus_copy: | ||
|
||
Returning a view versus a copy | ||
------------------------------ | ||
|
||
.. warning:: | ||
|
||
:ref:`Copy-on-Write <copy_on_write>` | ||
will become the new default in pandas 3.0. This means than chained indexing will | ||
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary | ||
anymore. | ||
See :ref:`this section <copy_on_write_chained_assignment>` | ||
for more context. | ||
We recommend turning Copy-on-Write on to leverage the improvements with | ||
|
||
``` | ||
pd.options.mode.copy_on_write = True | ||
``` | ||
|
||
even before pandas 3.0 is available. | ||
|
||
When setting values in a pandas object, care must be taken to avoid what is called | ||
``chained indexing``. Here is an example. | ||
|
||
.. ipython:: python | ||
|
||
dfmi = pd.DataFrame([list('abcd'), | ||
list('efgh'), | ||
list('ijkl'), | ||
list('mnop')], | ||
columns=pd.MultiIndex.from_product([['one', 'two'], | ||
['first', 'second']])) | ||
dfmi | ||
|
||
Compare these two access methods: | ||
|
||
.. ipython:: python | ||
|
||
dfmi['one']['second'] | ||
|
||
.. ipython:: python | ||
|
||
dfmi.loc[:, ('one', 'second')] | ||
|
||
These both yield the same results, so which should you use? It is instructive to understand the order | ||
of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``). | ||
|
||
``dfmi['one']`` selects the first level of the columns and returns a DataFrame that is singly-indexed. | ||
Then another Python operation ``dfmi_with_one['second']`` selects the series indexed by ``'second'``. | ||
This is indicated by the variable ``dfmi_with_one`` because pandas sees these operations as separate events. | ||
e.g. separate calls to ``__getitem__``, so it has to treat them as linear operations, they happen one after another. | ||
|
||
Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to | ||
``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly | ||
faster, and allows one to index *both* axes if so desired. | ||
|
||
Why does assignment fail when using chained indexing? | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. warning:: | ||
|
||
:ref:`Copy-on-Write <copy_on_write>` | ||
will become the new default in pandas 3.0. This means than chained indexing will | ||
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary | ||
anymore. | ||
See :ref:`this section <copy_on_write_chained_assignment>` | ||
for more context. | ||
We recommend turning Copy-on-Write on to leverage the improvements with | ||
|
||
``` | ||
pd.options.mode.copy_on_write = True | ||
``` | ||
|
||
even before pandas 3.0 is available. | ||
|
||
The problem in the previous section is just a performance issue. What's up with | ||
the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when | ||
you do something that might cost a few extra milliseconds! | ||
|
||
But it turns out that assigning to the product of chained indexing has | ||
inherently unpredictable results. To see this, think about how the Python | ||
interpreter executes this code: | ||
|
||
.. code-block:: python | ||
|
||
dfmi.loc[:, ('one', 'second')] = value | ||
# becomes | ||
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value) | ||
|
||
But this code is handled differently: | ||
|
||
.. code-block:: python | ||
|
||
dfmi['one']['second'] = value | ||
# becomes | ||
dfmi.__getitem__('one').__setitem__('second', value) | ||
|
||
See that ``__getitem__`` in there? Outside of simple cases, it's very hard to | ||
predict whether it will return a view or a copy (it depends on the memory layout | ||
of the array, about which pandas makes no guarantees), and therefore whether | ||
the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown | ||
out immediately afterward. **That's** what ``SettingWithCopy`` is warning you | ||
about! | ||
|
||
.. note:: You may be wondering whether we should be concerned about the ``loc`` | ||
property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi`` | ||
itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` / | ||
``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course, | ||
``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``. | ||
|
||
Sometimes a ``SettingWithCopy`` warning will arise at times when there's no | ||
obvious chained indexing going on. **These** are the bugs that | ||
``SettingWithCopy`` is designed to catch! pandas is probably trying to warn you | ||
that you've done this: | ||
|
||
.. code-block:: python | ||
|
||
def do_something(df): | ||
foo = df[['bar', 'baz']] # Is foo a view? A copy? Nobody knows! | ||
# ... many lines here ... | ||
# We don't know whether this will modify df or not! | ||
foo['quux'] = value | ||
return foo | ||
|
||
Yikes! | ||
|
||
.. _indexing.evaluation_order: | ||
|
||
Evaluation order matters | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. warning:: | ||
|
||
:ref:`Copy-on-Write <copy_on_write>` | ||
will become the new default in pandas 3.0. This means than chained indexing will | ||
never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary | ||
anymore. | ||
See :ref:`this section <copy_on_write_chained_assignment>` | ||
for more context. | ||
We recommend turning Copy-on-Write on to leverage the improvements with | ||
|
||
``` | ||
pd.options.mode.copy_on_write = True | ||
``` | ||
|
||
even before pandas 3.0 is available. | ||
|
||
When you use chained indexing, the order and type of the indexing operation | ||
partially determine whether the result is a slice into the original object, or | ||
a copy of the slice. | ||
|
||
pandas has the ``SettingWithCopyWarning`` because assigning to a copy of a | ||
slice is frequently not intentional, but a mistake caused by chained indexing | ||
returning a copy where a slice was expected. | ||
|
||
If you would like pandas to be more or less trusting about assignment to a | ||
chained indexing expression, you can set the :ref:`option <options>` | ||
``mode.chained_assignment`` to one of these values: | ||
|
||
* ``'warn'``, the default, means a ``SettingWithCopyWarning`` is printed. | ||
* ``'raise'`` means pandas will raise a ``SettingWithCopyError`` | ||
you have to deal with. | ||
* ``None`` will suppress the warnings entirely. | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
|
||
dfb = pd.DataFrame({'a': ['one', 'one', 'two', | ||
'three', 'two', 'one', 'six'], | ||
'c': np.arange(7)}) | ||
|
||
# This will show the SettingWithCopyWarning | ||
# but the frame values will be set | ||
dfb['c'][dfb['a'].str.startswith('o')] = 42 | ||
|
||
This however is operating on a copy and will not work. | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
:okexcept: | ||
|
||
with pd.option_context('mode.chained_assignment','warn'): | ||
dfb[dfb['a'].str.startswith('o')]['c'] = 42 | ||
|
||
A chained assignment can also crop up in setting in a mixed dtype frame. | ||
|
||
.. note:: | ||
|
||
These setting rules apply to all of ``.loc/.iloc``. | ||
|
||
The following is the recommended access method using ``.loc`` for multiple items (using ``mask``) and a single item using a fixed index: | ||
|
||
.. ipython:: python | ||
|
||
dfc = pd.DataFrame({'a': ['one', 'one', 'two', | ||
'three', 'two', 'one', 'six'], | ||
'c': np.arange(7)}) | ||
dfd = dfc.copy() | ||
# Setting multiple items using a mask | ||
mask = dfd['a'].str.startswith('o') | ||
dfd.loc[mask, 'c'] = 42 | ||
dfd | ||
|
||
# Setting a single item | ||
dfd = dfc.copy() | ||
dfd.loc[2, 'a'] = 11 | ||
dfd | ||
|
||
The following *can* work at times, but it is not guaranteed to, and therefore should be avoided: | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
|
||
dfd = dfc.copy() | ||
dfd['a'][2] = 111 | ||
dfd | ||
|
||
Last, the subsequent example will **not** work at all, and so should be avoided: | ||
|
||
.. ipython:: python | ||
:okwarning: | ||
:okexcept: | ||
|
||
with pd.option_context('mode.chained_assignment','raise'): | ||
dfd.loc[0]['a'] = 1111 | ||
|
||
.. warning:: | ||
|
||
The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid | ||
assignment. There may be false positives; situations where a chained assignment is inadvertently | ||
reported. | ||
:ref:`Copy-on-Write <copy_on_write>` is the new default with pandas 3.0. | ||
This means than chained indexing will never work. As a consequence, | ||
the ``SettingWithCopyWarning`` is not necessary anymore. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would remove the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
See :ref:`this section <copy_on_write_chained_assignment>` | ||
for more context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have another PR with the whatsnew note, please make sure that the removal of these are mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a separate PR with a whatsnew