Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve(?) explanation of SettingWithCopy warning #11746

Merged
merged 1 commit into from
Dec 4, 2015

Conversation

ischwabacher
Copy link
Contributor

After playing with R a bunch, I started feeling like the explanation of SettingWithCopy wasn't getting to the core of the matter, which is actually an essential consequence of python slice assignment semantics. Here's how python handles chained assignment:

df['foo']['bar'] = quux
# becomes
df.__getitem__('foo').__setitem__('bar', quux)

whereas in R, it's this:

df["foo"]["bar"] <- quux
# becomes
df["foo"] <- `[<-`(df["foo"], "bar", quux)
# becomes
df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))

That last is a lot of line noise, though the R method names [ and [<- are more concise than __getitem__ and __setitem__! But imagine that you could call __setitem__ with a kwarg inplace=False that would cause it to return a modified copy instead of modifying the original object. Then the R version would translate to this in python:

df = df.__setitem__('foo',
                    df.__getitem__('foo')
                        .__setitem__('bar', quux, inplace=False),
                    inplace=False)

This is incredibly awkward, but it has the advantage of making SettingWithCopy unnecessary— everything is a copy, and yet things get set nonetheless.

So this commit is an attempt to explain this without requiring the reader to know R.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2015

thanks!

but you are making a wild assumption here, namely that users actually read documentation! (lmao)

@jreback jreback added the Docs label Dec 2, 2015
@jreback
Copy link
Contributor

jreback commented Dec 2, 2015

note #11500 (hopefully going to happen for 0.18.0)

@jreback jreback added this to the 0.18.0 milestone Dec 2, 2015
@@ -1525,20 +1525,35 @@ faster, and allows one to index *both* axes if so desired.
Why does the assignment when using chained indexing fail!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, why does this show the ``SettingWithCopy`` warning / and possibly not work when you do chained indexing and assignment:
The above is just a performance issue. What's up with the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when you do something that might cost a few extra milliseconds!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence is odd here.

@ischwabacher
Copy link
Contributor Author

Well, it'll be nice if this is obsolete by the time it hits a release, but it was fun to write anyway. :) And users do read documentation. Sometimes. If you go on StackOverflow and direct them to it.

property in the first example. Because ``loc`` is designed to be a proxy
object, ``dfmi.loc`` is guaranteed to be a view of ``dfmi``, albeit with
different indexing behavior (since that's the purpose of ``loc``). This
applies only to ``dfmi.loc`` itself; ``dfmi.loc.__getitem__('one')`` may of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you are saying, but I think this is non-sensical to a user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this:

You might be wondering whether dfmi.loc might be a copy of dfmi rather than a view, which would break the first indexing method. Pandas guarantees that dfmi.loc (and dfmi.ix and dfmi.iloc) are views into dfmi, which it can do because these must see all of dfmi. But dfmi.__getitem__(idx) is some subset or reordering of dfmi and therefore can't receive this guarantee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think better to explain that .loc does both operations (getting & setting) simultaneously and thus can guarnteed operate on the original object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a better angle. It's the whole point of .loc, after all.

You may be wondering whether we should be concerned about the loc property in the first example. But dfmi.loc is guaranteed to be dfmi itself with modified indexing behavior, so dfmi.loc.__getitem__ / dfmi.loc.__setitem__ operate on dfmi directly. Of course, dfmi.loc.__getitem__(idx) may be a view or a copy of dfmi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep looks good..

squash em & ping

@ischwabacher
Copy link
Contributor Author

Squashed. Just noticed I used "You might/may be wondering" to start two consecutive paragraphs. Oh well.

@jreback
Copy link
Contributor

jreback commented Dec 4, 2015

ok, feel free to edit again :)

After playing with R a bunch, I started feeling like the explanation of
`SettingWithCopy` wasn't getting to the core of the matter, which is actually
an essential consequence of python slice assignment semantics. Here's how
python handles chained assignment:

```python
df['foo']['bar'] = quux
df.__getitem__('foo').__setitem__('bar', quux)
```

whereas in R, it's this:

```R
df["foo"]["bar"] <- quux
df["foo"] <- `[<-`(df["foo"], "bar", quux)
df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))
```

That last is a lot of line noise, though the R method names `` `[` `` and
`` `[<-` `` are more concise than `__getitem__` and `__setitem__`! But imagine
that you could call `__setitem__` with a kwarg `inplace=False` that would cause
it to return a modified copy instead of modifying the original object. Then the
R version would translate to this in python:

```python
df = df.__setitem__('foo',
                    df.__getitem__('foo')
                      .__setitem__('bar', quux, inplace=False),
                    inplace=False)
```
This is incredibly awkward, but it has the advantage of making
`SettingWithCopy` unnecessary&mdash; *everything* is a copy, and yet things get
set nonetheless.

So this commit is an attempt to explain this without requiring the reader to
know R.
@ischwabacher
Copy link
Contributor Author

There. Plus some emphasis. :)

jreback added a commit that referenced this pull request Dec 4, 2015
Improve(?) explanation of SettingWithCopy warning
@jreback jreback merged commit f5fbec8 into pandas-dev:master Dec 4, 2015
@jreback
Copy link
Contributor

jreback commented Dec 4, 2015

ok thanks! (and pls look at the built docs and see if anything needs tweaking)......usually at least 1 hr before travis builds..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants