implement copy on write #597

Open
gdementen opened this issue Mar 1, 2018 · 1 comment

gdementen commented Mar 1, 2018

In Python, b = a does not copy the content of a: both names refer to the same array, so modifying b also modifies a.

>>> from larray import ndtest
>>> a = ndtest(3)
>>> b = a          # b is another name for the same array, not a copy
>>> b['a1'] = 0
>>> a['a1']
0

The solution is to use b = a.copy(), but if users call it everywhere, even when it is not strictly necessary (and determining this is not always obvious), it consumes memory and CPU needlessly.

To eliminate this problem, we could tell users to always use .copy() and make .copy() itself cheap, which would not be too hard: it would only flag the resulting array as "must_copy_on_write", without actually copying the data right away. Then if (and only if) the user later modifies the copy (using setitem), an actual copy is made and the array is flagged as must_copy_on_write=False before the setitem is done.
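The mechanism described above could be sketched roughly as follows. This is only an illustration built on a plain NumPy array: the LazyCopyArray class is hypothetical and is not larray's API. One assumption goes beyond the proposal: the source array is flagged too, since otherwise a later write to the source would show up in the not-yet-materialised copy.

```python
import numpy as np


class LazyCopyArray:
    """Hypothetical wrapper illustrating copy-on-write; not larray's API."""

    def __init__(self, data, must_copy_on_write=False):
        self._data = data
        self.must_copy_on_write = must_copy_on_write

    def copy(self):
        # No data is copied here: the new object shares our buffer.
        # Assumption beyond the proposal: the source is flagged as well,
        # so that writing to it does not leak into the lazy copy.
        self.must_copy_on_write = True
        return LazyCopyArray(self._data, must_copy_on_write=True)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if self.must_copy_on_write:
            # First write since the lazy copy: detach from the shared buffer.
            self._data = self._data.copy()
            self.must_copy_on_write = False
        self._data[key] = value


a = LazyCopyArray(np.arange(3))
b = a.copy()                         # cheap: the buffer is shared
shared_before = b._data is a._data   # still the same buffer at this point
b[1] = 0                             # triggers the real copy, then writes
```

In this sketch a stays flagged after b has detached, so its next write makes one copy that is not strictly needed. Tracking how many arrays share a buffer would probably avoid that, at the cost of more bookkeeping.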

@alixdamman alixdamman added this to the nice_to_have milestone Mar 7, 2018
@gdementen gdementen removed this from the nice_to_have milestone Aug 1, 2019
gdementen commented

pandas has had copy-on-write for a while now. It is currently optional but will become the default in pandas 3.0.

I bumped the issue priority because a subset sometimes sharing its "parent" data is a not-too-rare source of hard-to-find bugs:

>>> from larray import ndtest
>>> a = ndtest(3)
>>> b = a[:'a1']   # b is a view on a's data
>>> b['a1'] = 0
>>> a['a1']
0

So this issue has an impact on performance, but it is first and foremost about correctness. We cannot do much about the b = a case above, but we can fix the subset issue.
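For the subset case, one possible shape of the fix is sketched below, again on top of plain NumPy and with integer positions standing in for labels such as 'a1'. The CowArray class and its shares_data flag are hypothetical names chosen for illustration, not an existing larray interface.

```python
import numpy as np


class CowArray:
    """Hypothetical wrapper; names are illustrative, not larray's API."""

    def __init__(self, data, shares_data=False):
        self._data = data
        self._shares_data = shares_data

    def __getitem__(self, key):
        result = self._data[key]
        if isinstance(result, np.ndarray):
            # Basic slicing returns a NumPy view on the parent's buffer:
            # wrap it and remember that it does not own its data.
            return CowArray(result, shares_data=True)
        return result

    def __setitem__(self, key, value):
        if self._shares_data:
            # Detach from the parent before the first write.
            self._data = self._data.copy()
            self._shares_data = False
        self._data[key] = value


a = CowArray(np.arange(3))
b = a[:2]   # subset, still a view on a's buffer at this point
b[1] = 0    # b gets its own buffer here; a is left untouched
```

This only protects the parent from writes made through the subset. Protecting the subset from later writes to the parent would also require the parent to know that its buffer is shared, which this sketch does not attempt.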

PS: I am unsure whether pandas plans to make an explicit .copy() call copy-on-write too, but it is probably a good idea to align our behaviour with what pandas does.
