fix: pass a copy of RecordArray's internal fields in HL API #1650
164866c to 24e687d
Perfect! This solves it.
@agoose77, I think we should leave Do you agree?
Either way, @all-contributors please add @Saransh-cpp for code
I've put up a pull request to add @Saransh-cpp! 🎉
@@ -27,4 +29,4 @@ def fields(array):


 def _impl(array):
     layout = ak._v2.operations.to_layout(array, allow_record=True, allow_other=False)
-    return layout.fields
+    return copy.copy(layout.fields)
I'd suggest using .copy rather than copy.copy here, because we know that we have a list.
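To make the motivation for the change concrete, here is a minimal sketch of the aliasing problem that returning a copy avoids. `RecordLayout`, `fields_aliased`, and `fields_copied` are hypothetical stand-ins written for illustration; they are not the Awkward Array classes or functions touched by this PR.

```python
# Hypothetical stand-in for a layout object; not the Awkward Array class.
class RecordLayout:
    def __init__(self, fields):
        self._fields = list(fields)

    @property
    def fields(self):
        # Returns the internal list itself, without copying.
        return self._fields


def fields_aliased(layout):
    # Hands the caller the layout's own list.
    return layout.fields


def fields_copied(layout):
    # Hands the caller a new list holding the same items.
    return layout.fields.copy()


layout = RecordLayout(["x", "y"])
leaked = fields_aliased(layout)
leaked.append("z")        # also changes the layout's internal state
print(layout.fields)      # ['x', 'y', 'z']

layout = RecordLayout(["x", "y"])
safe = fields_copied(layout)
safe.append("z")          # only the caller's copy changes
print(layout.fields)      # ['x', 'y']
```

Because the value is known to be a list, `list.copy()` is enough here; `copy.copy` would give the same shallow copy after an extra dispatch step.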
Done!
Oh, I forgot that. I had been (wrongly) thinking that it could be a list or None, but no: leaf nodes (NumpyForm and EmptyForm) return an empty list instead of None.
awkward/src/awkward/_v2/forms/numpyform.py
Lines 198 to 200 in 1a0858e
    @property
    def fields(self):
        return []
awkward/src/awkward/_v2/forms/emptyform.py
Lines 89 to 91 in 1a0858e
    @property
    def fields(self):
        return []
Well, the same is true of parameters: we know that we have a dict:
awkward/src/awkward/_v2/forms/form.py
Lines 259 to 263 in 1a0858e
    @property
    def parameters(self):
        if self._parameters is None:
            self._parameters = {}
        return self._parameters
And a dict also has a .copy(). I'll put that in.
Taking a .copy() of a dict would be a shallow copy w.r.t. the values. Is there any chance that we want a deep copy? Are there any uses of parameters that involve setting a mutable object, e.g. a dict or a list?
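The shallow-copy concern can be illustrated with a short sketch. The `"axes"` key and its list value are hypothetical, chosen only to show what happens when a parameter value is mutable; nothing in this thread suggests such parameters exist in practice.

```python
import copy

# "axes" is a hypothetical parameter with a mutable (list) value.
parameters = {"__record__": "SomeLongName", "axes": ["x", "y"]}

shallow = parameters.copy()        # new dict, same value objects
deep = copy.deepcopy(parameters)   # new dict, recursively copied values

# Mutating a nested value through the shallow copy reaches the original,
# because both dicts refer to the same list object.
shallow["axes"].append("z")
print(parameters["axes"])   # ['x', 'y', 'z']
print(deep["axes"])         # ['x', 'y']

# Rebinding a key on the shallow copy does not affect the original.
shallow["__record__"] = "Other"
print(parameters["__record__"])   # SomeLongName
```

With string-only values the distinction disappears, since strings are immutable and cannot be changed in place.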
(So would copy.copy: also shallow.)
In principle, parameters can be any JSON-able thing. In practice, they've always been strings, and the way things are going, it looks like they always will be. If we had any lists or dicts as values, then in principle we'd have to protect them with deepcopy.
>>> import copy, json, numbers, time
>>> parameters = {
...     "__record__": "SomeLongName",
...     "__categorical__": False,
...     "units": "cm",
...     "__doc__": "This is some thing. Wow."
... }
>>> def copy1(what, how_many):
...     for _ in range(how_many):
...         tmp = what.copy()
...
>>> def copy2(what, how_many):
...     for _ in range(how_many):
...         tmp = copy.copy(what)
...
>>> def copy3(what, how_many):
...     for _ in range(how_many):
...         tmp = copy.deepcopy(what)
...
>>> def copy4(what, how_many):
...     for _ in range(how_many):
...         tmp = json.loads(json.dumps(what))
...
>>> start = time.time(); copy1(parameters, 1000000); time.time() - start
0.05509829521179199
>>> start = time.time(); copy2(parameters, 1000000); time.time() - start
0.18173909187316895
>>> start = time.time(); copy3(parameters, 1000000); time.time() - start
2.2748003005981445
>>> start = time.time(); copy4(parameters, 1000000); time.time() - start
3.071228504180908
I don't think the unlikely possibility of weird types justifies the expense. Or wait, maybe
>>> def copy5(what, how_many):
...     for _ in range(how_many):
...         if all(isinstance(x, (str, bool, numbers.Integral, numbers.Real)) for x in what.values()):
...             tmp = what.copy()
...         else:
...             tmp = copy.deepcopy(what)
...
>>> start = time.time(); copy5(parameters, 1000000); time.time() - start
0.5682976245880127
is only a factor-of-10 cost relative to a plain .copy(), and it takes the expensive deepcopy branch only in weird cases (none of ours).
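The type-checked approach above could be packaged as a small helper, sketched here under assumptions. The name `copy_parameters` and the `timeit` harness are illustrative additions and are not part of this PR; the type tuple is taken from the session above (`bool` and `numbers.Integral` are already covered by `numbers.Real`, so they are redundant but harmless).

```python
import copy
import numbers
import timeit

# Value types treated as immutable scalars; same tuple as in the session above.
SCALAR_TYPES = (str, bool, numbers.Integral, numbers.Real)


def copy_parameters(parameters):
    """Shallow-copy when every value is a scalar, otherwise deep-copy."""
    if all(isinstance(value, SCALAR_TYPES) for value in parameters.values()):
        return parameters.copy()
    return copy.deepcopy(parameters)


parameters = {
    "__record__": "SomeLongName",
    "__categorical__": False,
    "units": "cm",
    "__doc__": "This is some thing. Wow.",
}

# Rough timing comparison; absolute numbers depend on machine and Python version.
n = 10_000
print("dict.copy      ", timeit.timeit(lambda: parameters.copy(), number=n))
print("copy_parameters", timeit.timeit(lambda: copy_parameters(parameters), number=n))
print("copy.deepcopy  ", timeit.timeit(lambda: copy.deepcopy(parameters), number=n))
```

The check costs one pass over the values on every call, which is the trade-off being weighed in this comment: a fixed overhead in the common case in exchange for safety if a mutable value ever appears.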
Oops, I misread this as copy.deepcopy.
24e687d to d88e76d
d88e76d to 39756d1
(Working through the web editor...)
Fixes #1648
The tests pass locally by running -