This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

Copy on write for views #10

Open
wesm opened this issue Sep 1, 2016 · 6 comments

Comments

@wesm
Owner

wesm commented Sep 1, 2016

I will work on a full document for this to get the conversation started, but in the meantime this issue can be the placeholder for our discussion about copy-on-write (COW).

@shoyer

shoyer commented Sep 1, 2016

As nice as copy-on-write would be, it's not strictly necessary in pandas 2.0 because we can choose our own consistent rules for copying once we divorce our storage from NumPy.

For example, we could say:

  1. Any indexing operation on columns uses views.
  2. Any indexing operation on rows makes a copy (all indexing operations on Series make a copy).

Given that we plan to ditch the BlockManager anyway, we would get (1) basically for free.
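The two rules above can be sketched with a toy column-oriented frame. This is a minimal illustration under stated assumptions: `ToyFrame` and its methods are hypothetical names invented here (not pandas API), and plain Python lists stand in for column storage. Column selection hands back the same list objects (a view), while row selection slices, which always allocates new lists (a copy).

```python
# Hypothetical toy frame illustrating the two proposed copy/view rules.
# Not pandas API; columns are plain Python lists keyed by name.
class ToyFrame:
    def __init__(self, columns):
        self._columns = columns  # dict[str, list]

    def select_columns(self, names):
        # Rule 1: column selection is a view; the new frame references
        # the very same column lists as the original.
        return ToyFrame({name: self._columns[name] for name in names})

    def select_rows(self, start, stop):
        # Rule 2: row selection always copies; list slicing allocates new lists.
        return ToyFrame(
            {name: col[start:stop] for name, col in self._columns.items()}
        )

    def set_value(self, name, row, value):
        self._columns[name][row] = value

    def get_value(self, name, row):
        return self._columns[name][row]


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
cols = df.select_columns(["a"])
rows = df.select_rows(0, 2)

cols.set_value("a", 0, 100)  # writes through to df (view)
rows.set_value("b", 0, -1)   # does not affect df (copy)

print(df.get_value("a", 0))  # 100
print(df.get_value("b", 0))  # 4
```

The point of the design is predictability: a user only has to know which axis they indexed on to know whether a later write can leak back into the parent.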

I'm sure there are a few use cases for view based slicing of DataFrame rows, but these are quite niche in comparison to selecting columns, and in my opinion, the unpredictability it introduces into the data model is not worth the trouble.

Copy-on-write for column views (and eventually, maybe row slicing) would still be nice for making pandas more intuitive, but it could possibly wait until a later 2.1 or 3.0 release (supposing we're doing semantic versioning).

@wesm
Owner Author

wesm commented Sep 1, 2016

I agree COW isn't strictly necessary for the 1 -> 2 transition. I think it's worth keeping in mind during the development process, as there are a number of things we can do that would make adding it later easier or more difficult. Step 1 is keeping track of parent-child relationships in a lightweight way; to start with, we can permit mutation in accordance with current behavior.
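One lightweight way to track parent-child relationships is with weak references, so the bookkeeping never keeps an object alive on its own. A minimal sketch, assuming a hypothetical `TrackedFrame` class whose `derive()` stands in for any view-producing operation (none of these names are pandas API): each child holds a `weakref.ref` to its parent, and each parent holds a `weakref.WeakSet` of its children. Mutation is still allowed exactly as today; the tracking is only recorded so COW could be layered on later.

```python
import gc
import weakref


# Hypothetical frame that records parent/child links without owning them.
class TrackedFrame:
    def __init__(self, data, parent=None):
        self.data = data
        # Weak links only: tracking must not extend any object's lifetime.
        self._parent = weakref.ref(parent) if parent is not None else None
        self._children = weakref.WeakSet()
        if parent is not None:
            parent._children.add(self)

    def derive(self):
        # Hypothetical view-producing operation: shares self.data, no copy.
        return TrackedFrame(self.data, parent=self)

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

    def has_live_children(self):
        return len(self._children) > 0

    def __setitem__(self, key, value):
        # Mutation is permitted as it is today; a future COW step could
        # consult has_live_children() / parent here before writing.
        self.data[key] = value


parent = TrackedFrame({"a": [1, 2, 3]})
child = parent.derive()
print(child.parent is parent)      # True
print(parent.has_live_children())  # True

child["a"] = [0]                   # allowed, and visible through parent
print(parent.data["a"])            # [0]

del child
gc.collect()                       # explicit collection for non-refcounting runtimes
print(parent.has_live_children())  # False
```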

@wesm
Owner Author

wesm commented Sep 5, 2016

See discussion in pandas-dev/pandas#11500

@nickeubank

I've expressed my views on COW in pretty extensive detail elsewhere (#10954), so I'll save everyone the trouble of repeating them all here, but in short: any behavior that's consistent and easy to understand is fine by me!

Have we abandoned trying to get this in before v1.0?

@wesm
Owner Author

wesm commented Sep 6, 2016

It's probably not too likely, since it would be an API change, and it would take a little time to fully understand the impact. If anyone has other thoughts on this (separate from the behavior of C-O-W), please chime in.

@wesm
Owner Author

wesm commented Sep 19, 2016

A notable benefit of copy-on-write is that operations like reset_index become zero-copy operations.
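To illustrate why `reset_index` could become zero-copy: under copy-on-write, the result can simply reuse every existing column buffer plus the old index as a new column, deferring any copy until someone writes. A minimal sketch under stated assumptions: `CowFrame`, its `owned` bookkeeping, and `set_value` are hypothetical names invented for illustration, and a Python `range` plays the role of a default integer index that stores no per-row data.

```python
# Hypothetical toy frame with per-column copy-on-write; not pandas API.
class CowFrame:
    def __init__(self, columns, index, owned=None):
        self.columns = columns  # dict[str, list]
        self.index = index      # list of labels, or a range
        # Names of columns this frame may mutate in place without copying.
        self._owned = set(columns) if owned is None else set(owned)

    def reset_index(self, name="index"):
        # Zero-copy: the old index becomes a column and every existing
        # column list is reused; the new index is a lazy range object.
        new_columns = {name: self.index, **self.columns}
        # Both frames now share buffers, so neither may write in place.
        self._owned.clear()
        return CowFrame(new_columns, range(len(self.index)), owned=())

    def set_value(self, col, row, value):
        if col not in self._owned:
            # Copy on first write: detach this frame's column from any sharers.
            self.columns[col] = list(self.columns[col])
            self._owned.add(col)
        self.columns[col][row] = value


df = CowFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
out = df.reset_index()
print(out.columns["x"] is df.columns["x"])  # True: shared, nothing copied
print(out.columns["index"] is df.index)     # True

out.set_value("x", 0, -1)                   # first write copies "x" for out only
print(df.columns["x"], out.columns["x"])    # [10, 20, 30] [-1, 20, 30]
```

The ownership rule here is deliberately conservative: after `reset_index`, the original frame also gives up ownership, so it pays for one copy on its next write even if the other side has already detached. Avoiding that extra copy would require knowing how many live frames still share a buffer.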


4 participants