Copy on write for views #10

wesm · 2016-09-01T13:32:54Z

I will work on a full document for this to get the conversation started, but this can be the placeholder for our discussion about COW.

shoyer · 2016-09-01T16:09:14Z

As nice as copy-on-write would be, it's not strictly necessary in pandas 2.0 because we can choose our own consistent rules for copying once we divorce our storage from NumPy.

For example, we could say:

1. Any indexing operation on columns uses views.
2. Any indexing operation on rows makes a copy (all indexing operations on Series make a copy).

Given that we plan to ditch the BlockManager anyways, we would get (1) basically for free.

I'm sure there are a few use cases for view based slicing of DataFrame rows, but these are quite niche in comparison to selecting columns, and in my opinion, the unpredictability it introduces into the data model is not worth the trouble.
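To make the two proposed rules concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: `ToyFrame` and its methods are hypothetical names, each column is modelled as an independent Python list, and nothing here reflects how pandas stores data.

```python
from typing import Dict, List, Sequence


class ToyFrame:
    """Hypothetical column store: each column is an independent Python list."""

    def __init__(self, columns: Dict[str, List[float]]):
        self._columns = columns

    def select_columns(self, names: Sequence[str]) -> "ToyFrame":
        # Rule 1: column indexing returns a view.
        # The new frame shares the same list objects; nothing is copied.
        return ToyFrame({name: self._columns[name] for name in names})

    def select_rows(self, rows: slice) -> "ToyFrame":
        # Rule 2: row indexing always copies.
        # Slicing a list builds a new list for every column.
        return ToyFrame(
            {name: values[rows] for name, values in self._columns.items()}
        )

    def set_value(self, name: str, row: int, value: float) -> None:
        # Mutates the underlying list in place.
        self._columns[name][row] = value

    def column(self, name: str) -> List[float]:
        # Return a snapshot so callers cannot mutate storage by accident.
        return list(self._columns[name])


frame = ToyFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

col_view = frame.select_columns(["a"])
col_view.set_value("a", 0, 99.0)  # visible through the parent frame

row_copy = frame.select_rows(slice(0, 2))
row_copy.set_value("b", 0, -1.0)  # the parent frame is unaffected
```

Under these rules, a write through a column selection always reaches the parent and a write through a row selection never does, so the outcome depends only on the kind of indexing and not on the memory layout.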

Copy on write for column views (and eventually, maybe row slicing) would still be nice in making pandas more intuitive, but could possibly wait until a later 2.1 or 3.0 release (supposing we're doing semantic versioning).

wesm · 2016-09-01T17:14:47Z

I agree COW isn't a strict necessity for the 1 -> 2 transition. I think it's worth keeping in mind during the development process, as there are a number of things we can do to make adding it later easier or more difficult. Step 1 is keeping track of parent-child relationships in a lightweight way, and we can permit mutation to start in accordance with current behavior.
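As a rough sketch of what lightweight tracking could look like, the toy below records which objects share a buffer and copies only on the first write to shared data. The names `_Buffer` and `CowColumn` are hypothetical, and the design is an assumption: instead of explicit parent and child pointers, each buffer keeps a weak set of the columns that currently use it, so a dropped view stops counting as a sharer once it is garbage collected.

```python
import weakref
from typing import List


class _Buffer:
    """Shared storage plus a weak set of the columns currently using it."""

    def __init__(self, values: List[float]):
        self.values = values
        self.users = weakref.WeakSet()


class CowColumn:
    """Hypothetical column that copies its buffer lazily, on first write."""

    def __init__(self, values: List[float]):
        self._attach(_Buffer(list(values)))

    def _attach(self, buffer: _Buffer) -> None:
        self._buffer = buffer
        buffer.users.add(self)

    def view(self) -> "CowColumn":
        # A view shares the parent's buffer; no data is copied here.
        child = CowColumn.__new__(CowColumn)
        child._attach(self._buffer)
        return child

    def __setitem__(self, index: int, value: float) -> None:
        if len(self._buffer.users) > 1:
            # Another column still sees this buffer: copy before mutating.
            self._buffer.users.discard(self)
            self._attach(_Buffer(list(self._buffer.values)))
        self._buffer.values[index] = value

    def to_list(self) -> List[float]:
        return list(self._buffer.values)

    def shares_memory_with(self, other: "CowColumn") -> bool:
        return self._buffer is other._buffer


parent = CowColumn([1.0, 2.0, 3.0])
child = parent.view()
shared_before_write = child.shares_memory_with(parent)

child[0] = 10.0   # first write to shared data triggers the copy
parent[1] = 20.0  # parent is now the sole user, so this writes in place
```

Creating a view costs nothing beyond bookkeeping, and a column that is the only remaining user of its buffer mutates it directly, which matches the idea of permitting mutation while still tracking relationships.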

wesm · 2016-09-05T14:15:44Z

See discussion in pandas-dev/pandas#11500

nickeubank · 2016-09-06T18:03:04Z

I've expressed my views on COW in pretty extensive detail elsewhere (#10954), so I'll save everyone the trouble of repeating them all here, but in short: any behavior that's consistent and easy to understand is fine by me!

Have we abandoned trying to get this in before v1.0?

wesm · 2016-09-06T20:45:57Z

It's probably not too likely, since it would be an API change whose impact would take some time to fully understand. If anyone has other thoughts on this (separate from the behavior of C-O-W), please chime in.

wesm · 2016-09-19T20:57:37Z

A notable benefit of copy-on-write is that operations like reset_index become zero-copy operations.

shoyer mentioned this issue Sep 6, 2016

Copy on write using weakrefs (part 2) pandas-dev/pandas#12036

Closed

jreback added performance API compat labels Sep 30, 2016

nickeubank mentioned this issue Oct 20, 2016

SettingWithCopy dependence on reference count pandas-dev/pandas#14150

Closed

nickeubank mentioned this issue Jun 28, 2021

Link to plan to fix view vs copy in pandas? nickeubank/practicaldatascience_class#26

Open

shwina mentioned this issue Feb 2, 2022

Support Python 3.9 wheels and bump to v0.2.4 NVIDIA/NVTX#49

Merged
