implement copy on write #597
Comments
pandas has implemented copy-on-write for a while now. It is currently optional but will be the default in pandas 3.0. I bumped the issue priority because a subset sometimes sharing its "parent" data is a not-too-rare source of hard-to-find bugs:

>>> a = ndtest(3)
>>> b = a[:'a1']
>>> b['a1'] = 0
>>> a['a1']
0

So this issue has an impact on performance but it is first and foremost about correctness. We cannot do much about the

PS: I am unsure whether pandas plans to make an explicit .copy() call copy-on-write too, and it is probably a good idea that we align our behaviour with what pandas does.
In Python, b = a does not copy the content of a: both names refer to the same array.
The solution is to use b = a.copy(), but using it everywhere, even when not strictly necessary (and determining this is not always obvious), would consume memory and CPU needlessly.
To eliminate this problem, it would not be too hard to tell users to always use .copy() but have .copy() only flag the resulting array as "must_copy_on_write", without actually copying the data right away. Then if (and only if) the user later modifies the copy (using setitem), an actual copy is made and the array is flagged as must_copy_on_write=False before the setitem is done.
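
The flagging scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions: LazyArray and shares_data_with are hypothetical names, a plain Python list stands in for the real data buffer, and flagging the original array in copy() is an addition that the proposal does not mention (without it, a later write to the original would show through the lazy copy).

```python
class LazyArray:
    """Toy 1-D array (hypothetical, not the library's API) with a lazy copy()."""

    def __init__(self, data):
        self._data = list(data)
        self.must_copy_on_write = False

    def copy(self):
        # Share the buffer instead of duplicating it right away.
        new = LazyArray.__new__(LazyArray)
        new._data = self._data
        new.must_copy_on_write = True
        # Assumption beyond the proposal: flag the original too, otherwise
        # a later write to it would be visible through the lazy copy.
        self.must_copy_on_write = True
        return new

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if self.must_copy_on_write:
            # First write since copy(): make the real copy now,
            # then clear the flag before doing the assignment.
            self._data = list(self._data)
            self.must_copy_on_write = False
        self._data[key] = value

    def shares_data_with(self, other):
        # Helper for the demonstration only.
        return self._data is other._data


a = LazyArray([1, 2, 3])
b = a.copy()                                  # no data copied yet
shared_before_write = b.shares_data_with(a)
b[1] = 0                                      # triggers the actual copy
shared_after_write = b.shares_data_with(a)
```

With this approach a .copy() that is never written to costs almost nothing, and the first setitem pays for the real copy. One trade-off of this simple flag is that the original may make one unnecessary copy on its own first write, since it cannot tell whether any lazy copies still share its buffer.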