Improve(?) explanation of SettingWithCopy warning #11746
thanks! but you are making a wild assumption here, namely that users actually read documentation! (lmao)
note #11500 (hopefully going to happen for 0.18.0)
@@ -1525,20 +1525,35 @@ faster, and allows one to index *both* axes if so desired.
Why does the assignment when using chained indexing fail!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, why does this show the ``SettingWithCopy`` warning / and possibly not work when you do chained indexing and assignment:
The above is just a performance issue. What's up with the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when you do something that might cost a few extra milliseconds!
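To see why chained assignment only *possibly* fails, here is a toy pure-Python sketch. It is not pandas: the `ToyFrame` class and its behaviour are hypothetical. Selecting one column hands back the stored list itself (a view), while selecting several columns builds a new frame from copies. The same chained-assignment syntax then succeeds in one case and silently does nothing in the other, which is the situation `SettingWithCopy` tries to warn about.

```python
class ToyFrame:
    """Hypothetical column store, used only to illustrate view vs. copy."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of values
        self.columns = columns

    def __getitem__(self, key):
        if isinstance(key, list):
            # Selecting several columns builds a NEW frame from copied lists (a copy).
            return ToyFrame({k: list(self.columns[k]) for k in key})
        # Selecting one column returns the stored list itself (a view).
        return self.columns[key]

    def __setitem__(self, key, value):
        self.columns[key] = value


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# df['a'][0] = 100  runs  df.__getitem__('a').__setitem__(0, 100)
df["a"][0] = 100                  # modifies the shared list, so df changes

# df[['a', 'b']]['a'] = ...  runs  df.__getitem__(['a', 'b']).__setitem__('a', ...)
df[["a", "b"]]["a"] = [0, 0, 0]   # modifies a temporary copy, so df is unchanged

print(df.columns)  # {'a': [100, 2, 3], 'b': [4, 5, 6]}
```

From the outside both assignments look alike; only the implementation of `__getitem__` decides whether the write lands. In pandas that choice depends on memory layout, about which pandas makes no guarantees, so it warns rather than promising an outcome.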
First sentence is odd here.
Well, it'll be nice if this is obsolete by the time it hits a release, but it was fun to write anyway. :) And users do read documentation. Sometimes. If you go on StackOverflow and direct them to it.
property in the first example. Because ``loc`` is designed to be a proxy
object, ``dfmi.loc`` is guaranteed to be a view of ``dfmi``, albeit with
different indexing behavior (since that's the purpose of ``loc``). This
applies only to ``dfmi.loc`` itself; ``dfmi.loc.__getitem__('one')`` may of
I get what you are saying, but I think this is nonsensical to a user
Maybe something like this:

You might be wondering whether `dfmi.loc` might be a copy of `dfmi` rather than a view, which would break the first indexing method. Pandas guarantees that `dfmi.loc` (and `dfmi.ix` and `dfmi.iloc`) are views into `dfmi`, which it can do because these must see all of `dfmi`. But `dfmi.__getitem__(idx)` is some subset or reordering of `dfmi` and therefore can't receive this guarantee.
I think it's better to explain that `.loc` does both operations (getting & setting) simultaneously and is thus guaranteed to operate on the original object.
Yeah, that's a better angle. It's the whole point of `.loc`, after all.

You may be wondering whether we should be concerned about the `loc` property in the first example. But `dfmi.loc` is guaranteed to be `dfmi` itself with modified indexing behavior, so `dfmi.loc.__getitem__` / `dfmi.loc.__setitem__` operate on `dfmi` directly. Of course, `dfmi.loc.__getitem__(idx)` may be a view or a copy of `dfmi`.
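The guarantee described here can be sketched with another toy. This is not pandas internals; `LocFrame` and `_LocIndexer` are hypothetical names. The `loc` property returns a small proxy that only holds a reference to its parent, so a single `__setitem__` call with a `(row, column)` key writes straight into the parent, while `loc.__getitem__` with a list of columns may still hand back a copy.

```python
class _LocIndexer:
    """Hypothetical proxy: holds a reference to its parent frame, not a copy."""

    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        row, cols = key
        if isinstance(cols, list):
            # A multi-column selection returns a new dict of values (a copy).
            return {c: self.frame.columns[c][row] for c in cols}
        return self.frame.columns[cols][row]

    def __setitem__(self, key, value):
        # Locating and setting happen in ONE call, on the parent itself.
        row, col = key
        self.frame.columns[col][row] = value


class LocFrame:
    """Hypothetical frame exposing a `loc` proxy, loosely mimicking pandas."""

    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

    @property
    def loc(self):
        # Each access builds a fresh proxy, but every proxy points at self.
        return _LocIndexer(self)


dfmi = LocFrame({"one": [1, 2], "two": [3, 4]})

dfmi.loc[0, "one"] = 10         # dfmi.loc.__setitem__((0, 'one'), 10)
assert dfmi.loc.frame is dfmi   # the proxy always refers back to dfmi

subset = dfmi.loc[0, ["one", "two"]]  # dfmi.loc.__getitem__(...) returns a copy
subset["one"] = -1                    # changes only the copy

print(dfmi.columns)  # {'one': [10, 2], 'two': [3, 4]}
```

The write through `loc` lands because nothing standing between `dfmi` and `__setitem__` can be a copy; the read through `loc` makes no such promise, which is the distinction the wording above draws.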
yep looks good..
squash em & ping
Squashed. Just noticed I used "You might/may be wondering" to start two consecutive paragraphs. Oh well.
ok, feel free to edit again :)
After playing with R a bunch, I started feeling like the explanation of `SettingWithCopy` wasn't getting to the core of the matter, which is actually an essential consequence of Python slice assignment semantics. Here's how Python handles chained assignment:

```python
df['foo']['bar'] = quux
df.__getitem__('foo').__setitem__('bar', quux)
```

whereas in R, it's this:

```R
df["foo"]["bar"] <- quux
df["foo"] <- `[<-`(df["foo"], "bar", quux)
df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))
```

That last is a lot of line noise, though the R method names `` `[` `` and `` `[<-` `` are more concise than `__getitem__` and `__setitem__`! But imagine that you could call `__setitem__` with a kwarg `inplace=False` that would cause it to return a modified copy instead of modifying the original object. Then the R version would translate to this in Python:

```python
df = df.__setitem__('foo', df.__getitem__('foo')
                    .__setitem__('bar', quux, inplace=False), inplace=False)
```

This is incredibly awkward, but it has the advantage of making `SettingWithCopy` unnecessary: *everything* is a copy, and yet things get set nonetheless.

So this commit is an attempt to explain this without requiring the reader to know R.
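Both halves of this argument can be checked in plain Python. In the sketch below the names are hypothetical: `Recorder` just logs calls, and since `inplace=False` is not a real `__setitem__` parameter, it is modelled as a separate helper `setitem_copy`. The first part records the calls that chained assignment actually makes; the second rebuilds the R-style update out of copy-returning operations only.

```python
import copy


class Recorder:
    """Logs every __getitem__/__setitem__ call it receives."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def __getitem__(self, key):
        self.log.append(f"{self.name}.__getitem__({key!r})")
        return Recorder(f"{self.name}[{key!r}]", self.log)

    def __setitem__(self, key, value):
        self.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")


log = []
df = Recorder("df", log)
df["foo"]["bar"] = "quux"
for entry in log:
    print(entry)
# df.__getitem__('foo')
# df['foo'].__setitem__('bar', 'quux')


def setitem_copy(obj, key, value):
    """Hypothetical stand-in for __setitem__(key, value, inplace=False)."""
    new = copy.copy(obj)  # shallow copy; the original object is untouched
    new[key] = value
    return new


data = {"foo": {"bar": 1, "baz": 2}}
original = data
# R: df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))
data = setitem_copy(data, "foo", setitem_copy(data["foo"], "bar", 99))

print(data)      # {'foo': {'bar': 99, 'baz': 2}}
print(original)  # {'foo': {'bar': 1, 'baz': 2}}
```

Because every step returns a new object and the result is rebound to `data`, there is nothing for a warning to catch; the price is the awkward nesting described above.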
There. Plus some emphasis. :)
Improve(?) explanation of SettingWithCopy warning
ok thanks! (and pls look at the built docs and see if anything needs tweaking)......usually at least 1 hr before travis builds..