PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write #51463
Thanks for writing this up.
Left some comments on some outdated parts that could be freshened up.
Big +1
Not a blocker for this PR or anything, but perhaps the SettingWithCopyWarning could be updated to also ask users to consider enabling copy-on-write?
That's a good idea to do once we are more confident about pointing general users towards it. In general we will have to add warnings for behaviours that will change (e.g. for chained assignment), and so that might already replace some of the current SettingWithCopyWarnings as well.
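To make the chained-assignment case mentioned above concrete, here is a minimal sketch. It assumes pandas 2.x, where Copy-on-Write is opt-in through the `mode.copy_on_write` option; the data and variable names are made up for illustration and do not come from the PDEP text.

```python
import warnings

import pandas as pd

# Assumption: pandas 2.x, where Copy-on-Write is opt-in via this option.
# Newer versions may enable CoW by default and deprecate or drop the option.
try:
    pd.set_option("mode.copy_on_write", True)
except (KeyError, AttributeError):
    pass

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Chained assignment: under CoW, df["B"] behaves as a separate object, so the
# write never reaches df. pandas may warn here; the warning is silenced for
# the purpose of this demo only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df["B"][0] = 99
after_chained = df["B"].tolist()

# A single-step assignment updates df itself.
df.loc[0, "B"] = 99
after_loc = df["B"].tolist()
```

With CoW enabled, `after_chained` should still hold the original values, while `after_loc` reflects the update. This is the kind of behaviour change that a transition warning would need to flag.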
I think that some notes about the transition should be part of the PDEP. Namely, whether we change the SettingWithCopyWarning to suggest using the new mode, and whether we can implement other checks for cases where people get no warning but the behavior will change.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Updated a couple of sections to be more aligned with the current implementation, also mentioning things that we intend to do (warnings mode, upgrade, ...)
@pandas-dev/pandas-core are there any more comments on the text of the proposal?
Most of my comments are about grammar/syntax, except in one place where I think an example would be helpful.
I also wonder if we should do something to make it easier for users to identify defensive copying that is no longer needed, e.g., df_filtered = df[df["A"] > 1].copy()
I'm not sure this is possible.
You won’t need a defensive copy anymore, ever. That’s one of the advantages.
Yes, but there is a lot of code out there that does have defensive copying going on (including things my staff has written), so the question is whether we help people identify it. For example, we could raise a warning whenever
If the alternative is adding a fluffy keyword, then I'd go for not helping people identify it, just document it. @jorisvandenbossche can you confirm that the TL;DR here is pretty much "make official that we will enable CoW for all users"?
That's correct, essentially what we already implemented under the optional
Thanks Irv for the textual comments, will update for those.
I am also not fond of adding another keyword for this, but in theory we could add some option to trigger a warning for this. For example, it should be quite easy to check within the I don't know if it is worth adding a warning for this, but at least technically it shouldn't be hard. The main question would be how to let users enable it (I wouldn't do with a keyword in |
Could we move this discussion to a separate issue? I don't think that this should be part of the PDEP since it's not really relevant for CoW generally speaking.
/preview
No preview found for PR #51463. Did the docs build complete?
Just FYI: The preview thing is broken
Yeah, I emailed Marc about this and he said he refreshed the token that expired. Looks like something else is still going wrong :( cc @datapythonista for more info.
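For reference, the defensive-copy pattern discussed in this thread can be sketched as follows. This is only an illustration under the assumption of pandas 2.x with the opt-in `mode.copy_on_write` option; the DataFrame contents are made up.

```python
import pandas as pd

# Assumption: pandas 2.x, where Copy-on-Write is opt-in via this option.
# Newer versions may enable CoW by default and deprecate or drop the option.
try:
    pd.set_option("mode.copy_on_write", True)
except (KeyError, AttributeError):
    pass

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Often written as df[df["A"] > 1].copy() to avoid SettingWithCopyWarning.
# Under CoW the filtered result already behaves as an independent object,
# so the trailing .copy() is redundant.
df_filtered = df[df["A"] > 1]
df_filtered["B"] = 0

original_b = df["B"].tolist()          # the parent is left untouched
filtered_b = df_filtered["B"].tolist()  # only the filtered frame changed
```

Existing code that keeps the `.copy()` stays correct under CoW; it just does work that is no longer needed, which is why the thread treats identifying such calls as optional help for users.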
Yes, sorry, I started to refresh the expired token but there were some problems and I couldn't finish until now. I see an unrelated error when uncompressing the artifact file on the server, but I'm not sure if it's the specific file I was testing. I re-ran the docs build here; when it finishes you can retry the preview and hopefully it works, otherwise I'll have a look tomorrow. For other PRs, note that the preview is generated in the docs build, the
/preview
No preview found for PR #51463. Did the docs build complete?
Co-authored-by: Irv Lustig <irv@princeton.com>
IMHO, this is ready for a vote.
Started the vote at #55511!
Looks like the vote passed in #55511, I think we're good to set this as approved?
This PR adds the Google Doc about the Copy-on-Write proposal that came out of the discussion in #36195 as a PDEP.
It predates the PDEPs, but basically back then I wrote what would now be a PDEP, and since this is still actively being developed, I think it would be useful to have an "official" reference to the proposal, instead of pointing to my Google Doc.
The first commit is just a basic conversion of the Google Doc to Markdown. I will do a first follow-up commit updating some outdated parts (e.g. referencing the POC implementation).
I am fine with still getting review of the text / details here, or we can also take it as it is.
And then the question is also how to mark the status of the PDEP. Can we assume it is already accepted / under implementation? (It went through the somewhat vague pre-PDEP decision process of some kind of consensus / nobody loudly objecting anymore, and it's now available and documented as opt-in.) But I am also happy to let it pass through the PDEP process of approving this officially.
cc @pandas-dev/pandas-core @pandas-dev/pandas-triage