Improve(?) explanation of SettingWithCopy warning #11746
thanks! but you are making a wild assumption here, namely that users actually read documentation! (lmao)

note #11500 (hopefully going to happen for 0.18.0)
doc/source/indexing.rst
Outdated
First sentence is odd here.
Well, it'll be nice if this is obsolete by the time it hits a release, but it was fun to write anyway. :) And users do read documentation. Sometimes. If you go on StackOverflow and direct them to it.
doc/source/indexing.rst
Outdated
I get what you are saying, but I think this is nonsensical to a user.
Maybe something like this:
You might be wondering whether `dfmi.loc` might be a copy of `dfmi` rather than a view, which would break the first indexing method. Pandas guarantees that `dfmi.loc` (and `dfmi.ix` and `dfmi.iloc`) are views into `dfmi`, which it can do because these must see all of `dfmi`. But `dfmi.__getitem__(idx)` is some subset or reordering of `dfmi` and therefore can't receive this guarantee.
I think it's better to explain that `.loc` does both operations (getting & setting) simultaneously and thus is guaranteed to operate on the original object.
Yeah, that's a better angle. It's the whole point of .loc, after all.
You may be wondering whether we should be concerned about the `loc` property in the first example. But `dfmi.loc` is guaranteed to be `dfmi` itself with modified indexing behavior, so `dfmi.loc.__getitem__` / `dfmi.loc.__setitem__` operate on `dfmi` directly. Of course, `dfmi.loc.__getitem__(idx)` may be a view or a copy of `dfmi`.
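
To make the "one call versus two" point concrete, here is a small sketch that doesn't use pandas at all. `Tracer` is a hypothetical stand-in that only logs which indexing methods Python invokes; it says nothing about how pandas implements `.loc` internally.

```python
class Tracer:
    """Hypothetical stand-in for a DataFrame; it only records indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, "__getitem__", key))
        # A new object comes back; the caller cannot tell whether it shares
        # data with its parent or is an independent copy.
        return Tracer(f"{self.name}[{key!r}]", self.log)

    def __setitem__(self, key, value):
        self.log.append((self.name, "__setitem__", key))


log = []
dfmi = Tracer("dfmi", log)

# Chained form: a __getitem__ on dfmi, then a __setitem__ on whatever came back.
dfmi["one"]["second"] = 0

# Single-call form: one __setitem__ with a tuple key, received by dfmi itself.
dfmi[:, ("one", "second")] = 0

for entry in log:
    print(entry)
```

In the chained form the `__setitem__` is received by the intermediate object, while in the single-call form it is received by `dfmi` directly, which is the property the paragraph above relies on.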
yep looks good..
squash em & ping
Squashed. Just noticed I used "You might/may be wondering" to start two consecutive paragraphs. Oh well.

ok, feel free to edit again :)

After playing with R a bunch, I started feeling like the explanation of
`SettingWithCopy` wasn't getting to the core of the matter, which is actually
an essential consequence of python slice assignment semantics. Here's how
python handles chained assignment:
```python
# What you write:
df['foo']['bar'] = quux
# What Python actually executes:
df.__getitem__('foo').__setitem__('bar', quux)
```
whereas in R, it's this:
```R
df["foo"]["bar"] <- quux
df["foo"] <- `[<-`(df["foo"], "bar", quux)
df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))
```
That last is a lot of line noise, though the R method names `` `[` `` and
`` `[<-` `` are more concise than `__getitem__` and `__setitem__`! But imagine
that you could call `__setitem__` with a kwarg `inplace=False` that would cause
it to return a modified copy instead of modifying the original object. Then the
R version would translate to this in python:
```python
df = df.__setitem__('foo',
                    df.__getitem__('foo')
                      .__setitem__('bar', quux, inplace=False),
                    inplace=False)
```
This is incredibly awkward, but it has the advantage of making
`SettingWithCopy` unnecessary: *everything* is a copy, and yet things get
set nonetheless.
So this commit is an attempt to explain this without requiring the reader to
know R.
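
For anyone who wants to poke at the difference, here is a toy sketch in plain Python. `CopyingTable` and its `with_item` method are hypothetical: `with_item` stands in for the imagined `__setitem__(..., inplace=False)`, and `__getitem__` is made to always return a copy, which is the worst case that `SettingWithCopy` warns about. None of this is pandas code.

```python
import copy


class CopyingTable:
    """Toy container whose __getitem__ always returns a copy (not pandas)."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)

    def __getitem__(self, key):
        value = self._data[key]
        # Nested dicts are wrapped in a fresh CopyingTable, i.e. a copy.
        return CopyingTable(value) if isinstance(value, dict) else value

    def __setitem__(self, key, value):
        self._data[key] = value.to_dict() if isinstance(value, CopyingTable) else value

    def with_item(self, key, value):
        """Stand-in for the imagined inplace=False: return a modified copy."""
        new = CopyingTable(self._data)
        new[key] = value
        return new

    def to_dict(self):
        return copy.deepcopy(self._data)


df = CopyingTable({"foo": {"bar": 1}})

# Python-style chained assignment: the write lands on a temporary copy and is lost.
df["foo"]["bar"] = 99
lost = df.to_dict()

# R-style functional update: rebuild the inner piece, then rebind the outer name.
df = df.with_item("foo", df["foo"].with_item("bar", 99))
kept = df.to_dict()

print(lost)  # {'foo': {'bar': 1}}
print(kept)  # {'foo': {'bar': 99}}
```

The second form works even though every access copies, because each level hands its modified copy back to the level above and the name `df` is rebound at the end.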

There. Plus some emphasis. :)

ok thanks! (and pls look at the built docs and see if anything needs tweaking)......usually at least 1 hr before travis builds..