Skip to content

DEPR: .take(..., is_copy=True); rename is_copy -> copy #27357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jreback opened this issue Jul 12, 2019 · 16 comments · Fixed by #30615 or #30784
Closed

DEPR: .take(..., is_copy=True); rename is_copy -> copy #27357

jreback opened this issue Jul 12, 2019 · 16 comments · Fixed by #30615 or #30784
Assignees
Labels
Deprecate Functionality to remove in pandas good first issue
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jul 12, 2019

this is more in-line with our other signatures. ideally would do this for 0.25

@jreback jreback added Deprecate Functionality to remove in pandas good first issue labels Jul 12, 2019
@jreback jreback added this to the 0.25.0 milestone Jul 12, 2019
@jorisvandenbossche
Copy link
Member

Or simply deprecate this without replacement? A take operation is in principle always a copy

@jreback
Copy link
Contributor Author

jreback commented Jul 13, 2019

you can try to deprecate it entirely if that works

@jorisvandenbossche
Copy link
Member

So the special case for "no copy" is when the result is exactly the same as the caller. Personally, I don't think we need to provide this special case (if the user cares about the performance in that corner case, they can do the check themselves)

@jreback jreback modified the milestones: 0.25.0, 1.0 Jul 17, 2019
chrisstpierre added a commit to chrisstpierre/pandas that referenced this issue Sep 30, 2019
@chrisstpierre
Copy link

chrisstpierre commented Sep 30, 2019

I opened a PR to deprecate it in 0.25.2. is_copy is supposed to be removed all-together in 1.0.0.

take always returns a copy

pandas/pandas/core/generic.py

Lines 3437 to 3439 in c4489cb

new_data = self._data.take(
indices, axis=self._get_block_manager_axis(axis), verify=True
)

return self.reindex_indexer(
new_axis=new_labels, indexer=indexer, axis=axis, allow_dups=True
)

def reindex_indexer(
self, new_axis, indexer, axis, fill_value=None, allow_dups=False, copy=True
):

Copy is true by default.

@jorisvandenbossche
Copy link
Member

@chrisstpierre can you open a new PR against master? (using the same branch, but not selecting the 0.25.x branch when opening the PR)
I looked at the commit you added, and that by itself looks good!

chrisstpierre added a commit to chrisstpierre/pandas that referenced this issue Oct 2, 2019
@chrisstpierre
Copy link

@jorisvandenbossche just did that.

But I still have .. deprecated:: 0.25.2.

Should it be 1.0.0 ?

@TomAugspurger
Copy link
Contributor

There's a stalled PR at #28751 that provides a nice starting point.

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@ryankarlos
Copy link
Contributor

take

@jreback jreback modified the milestones: Contributions Welcome, 1.0 Jan 4, 2020
@jorisvandenbossche
Copy link
Member

Reopening this issue (to have an issue with a milestone) to track the discussion in the merged PR: #30615 (comment)

I think that the current deprecation is keeping the "wrong" behaviour, as in my head, the result of take is always a copy, so we shouldn't set _is_copy IMO.
(but I also might be wrong, it's pretty confusing, and the is_copy keyword has also a very confusing name and unclear explanation of what it is actually intended to do)

@TomAugspurger
Copy link
Contributor

@jorisvandenbossche do you think this is a blocker for the RC?

@jorisvandenbossche
Copy link
Member

Ah, forgot this one, we should have discussed it yesterday on the call ..

Well, I personally think the merged PR is a step in the wrong direction. It's easy to revert that, but IMO the better solution is to make take do a proper copy. But that is 1) something that needs to be discussed and 2) is more work to implement.

@jorisvandenbossche
Copy link
Member

Given the lack of input of others, shall we revert the PR for the RC?

@jreback
Copy link
Contributor Author

jreback commented Jan 10, 2020

not sure what exactly is the issue you are bringing up here; we deprecated is_copy and you reopened exactly why?

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 15, 2020

not sure what exactly is the issue you are bringing up here; we deprecated is_copy and you reopened exactly why?

Right now, after the deprecation, we will keep is_copy=True, while I think it should be is_copy=False that becomes the default. Please read #27357 (comment) and #30615 (comment) (and comments below)

@jorisvandenbossche
Copy link
Member

Example consequence of the PR. Using sample() can now give a false-positive warning:

In [11]: df = pd.DataFrame(np.random.randn(10,3), columns=['a', 'b', 'c'])   

In [12]: df2 = df.sample(3)  

In [14]: df2['d']= 1   
/home/joris/miniconda3/envs/dev/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  #!/home/joris/miniconda3/envs/dev/bin/python

The above is on master, on 0.25.3 this did not raise a warning.

@jorisvandenbossche
Copy link
Member

I updated my PR that made take to always copy to include an internal take with the old behaviour to be used in the indexing code: #30784

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment