Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pass a copy of RecordArray's internal fields in HL API #1650

Merged
merged 6 commits into from
Aug 31, 2022

Conversation

Saransh-cpp
Copy link
Member

Fixes #1648

The tests pass locally by running -

└─$ python3 -m pytest -vv -rs tests/v2/

@Saransh-cpp Saransh-cpp force-pushed the copy-parameters-and-fields branch from 164866c to 24e687d Compare August 30, 2022 18:32
@codecov
Copy link

codecov bot commented Aug 30, 2022

Codecov Report

Merging #1650 (4696cbc) into main (9e17f29) will increase coverage by 0.46%.
The diff coverage is 68.67%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_v2/_connect/cuda/__init__.py 0.00% <0.00%> (ø)
src/awkward/_v2/_connect/numexpr.py 88.40% <0.00%> (ø)
src/awkward/_v2/_connect/pyarrow.py 88.46% <0.00%> (ø)
...awkward/_v2/_connect/rdataframe/from_rdataframe.py 0.00% <0.00%> (ø)
src/awkward/_v2/_lookup.py 98.68% <ø> (+1.17%) ⬆️
src/awkward/_v2/numba.py 93.47% <0.00%> (ø)
src/awkward/_v2/operations/ak_from_avro_file.py 66.66% <0.00%> (ø)
src/awkward/_v2/operations/ak_local_index.py 100.00% <ø> (ø)
src/awkward/_v2/operations/ak_max.py 87.09% <ø> (ø)
src/awkward/_v2/operations/ak_min.py 87.09% <ø> (ø)
... and 63 more

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! This solves it.

@jpivarski
Copy link
Member

@agoose77, I think we should leave fields as a list (following up on #1648 (comment)) and merge this, taking it as the first step in establishing a policy that high-level functions (ak.*, any public methods on ak.Array/ak.Record, including __getitem__, ufuncs) ensure that the data they return are not mutable references inside the Awkward Array objects. However, low-level interfaces (e.g. Content methods) do not make such a guarantee.

Do you agree?

@jpivarski
Copy link
Member

Either way, @all-contributors please add @Saransh-cpp for code

@allcontributors
Copy link
Contributor

@jpivarski

I've put up a pull request to add @Saransh-cpp! 🎉

@@ -27,4 +29,4 @@ def fields(array):

def _impl(array):
layout = ak._v2.operations.to_layout(array, allow_record=True, allow_other=False)
return layout.fields
return copy.copy(layout.fields)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest using .copy rather than copy.copy here, because we know that we have a list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

@jpivarski jpivarski Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot that—I had been (wrongly) thinking that it could be a list or None, but no: leaf-nodes (NumpyForm and EmptyForm) return an empty list instead of None.

@property
def fields(self):
return []

@property
def fields(self):
return []

Well, the same is true of parameters: we know that we have a dict:

@property
def parameters(self):
if self._parameters is None:
self._parameters = {}
return self._parameters

And a dict also has a .copy(). I'll put that in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a .copy() of a dict would be a shallow copy w.r.t the values. Is there any chance that we want a deep copy? Are there any uses of parameters that involve setting a mutable object e.g. dict, list etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(So would copy.copy: also shallow.)

In principle, parameters can be any JSON-able thing. In practice, they've always been strings. The way it looks like things are going, they'll always be strings. If we have some non-string lists and dicts, then in principle we'd have to protect them with deepcopy.

>>> parameters = {
...     "__record__": "SomeLongName",
...     "__categorical__": False,
...     "units": "cm",
...     "__doc__": "This is some thing. Wow."
... }
>>> def copy1(what, how_many):
...     for _ in range(how_many):
...         tmp = what.copy()
... 
>>> def copy2(what, how_many):
...     for _ in range(how_many):
...         tmp = copy.copy(what)
... 
>>> def copy3(what, how_many):
...     for _ in range(how_many):
...         tmp = copy.deepcopy(what)
... 
>>> def copy4(what, how_many):
...     for _ in range(how_many):
...         tmp = json.loads(json.dumps(what))
... 
>>> start = time.time(); copy1(parameters, 1000000); time.time() - start
0.05509829521179199
>>> start = time.time(); copy2(parameters, 1000000); time.time() - start
0.18173909187316895
>>> start = time.time(); copy3(parameters, 1000000); time.time() - start
2.2748003005981445
>>> start = time.time(); copy4(parameters, 1000000); time.time() - start
3.071228504180908

I don't think the unlikely possibility of weird types justifies the expense. Or wait, maybe

>>> def copy5(what, how_many):
...     for _ in range(how_many):
...         if all(isinstance(x, (str, bool, numbers.Integral, numbers.Real)) for x in what.values()):
...             tmp = what.copy()
...         else:
...             tmp = copy.deepcopy(what)
... 
>>> start = time.time(); copy5(parameters, 1000000); time.time() - start
0.5682976245880127

is only a factor-of-10 cost, which goes into the expensive branch on weird cases (none of ours).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I misread this as copy.deepcopy.

@Saransh-cpp Saransh-cpp force-pushed the copy-parameters-and-fields branch from 24e687d to d88e76d Compare August 31, 2022 09:52
@Saransh-cpp Saransh-cpp force-pushed the copy-parameters-and-fields branch from d88e76d to 39756d1 Compare August 31, 2022 09:53
@jpivarski jpivarski enabled auto-merge (squash) August 31, 2022 16:38
@jpivarski
Copy link
Member

(Working through the web editor...)

@jpivarski jpivarski merged commit 9901b44 into scikit-hep:main Aug 31, 2022
@Saransh-cpp Saransh-cpp deleted the copy-parameters-and-fields branch August 31, 2022 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.fields (v2) passes a RecordArray's internal fields by reference
3 participants