Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement recursively_apply for Record #1401

Merged
merged 5 commits into from
Apr 8, 2022
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Apr 6, 2022

Fixes #1400

As per the discussion below, Record.recursively_apply is a top-level function that just forwards the _recursively_apply call to the content.

@codecov
Copy link

codecov bot commented Apr 6, 2022

Codecov Report

Merging #1401 (dce33c4) into main (a8c37b9) will increase coverage by 0.34%.
The diff coverage is 73.74%.

Impacted Files Coverage Δ
src/awkward/_v2/_broadcasting.py 93.71% <ø> (ø)
src/awkward/_v2/_connect/cling.py 0.00% <ø> (ø)
src/awkward/_v2/_connect/cuda/__init__.py 0.00% <0.00%> (ø)
src/awkward/_v2/contents/listarray.py 90.80% <ø> (+2.62%) ⬆️
src/awkward/_v2/operations/structure/ak_mask.py 95.23% <ø> (ø)
src/awkward/_v2/highlevel.py 74.31% <65.90%> (+1.19%) ⬆️
src/awkward/_v2/operations/convert/ak_to_layout.py 81.81% <75.00%> (-2.06%) ⬇️
src/awkward/_v2/_connect/numpy.py 62.24% <80.00%> (+0.46%) ⬆️
src/awkward/_v2/record.py 80.48% <80.00%> (+0.82%) ⬆️
src/awkward/_v2/_util.py 83.33% <86.40%> (+4.12%) ⬆️
... and 14 more

@agoose77 agoose77 changed the title Test: add failing test for 1400 Implement recursively_apply for Record Apr 6, 2022
@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 6, 2022

@ioanaif I've tagged you here because I think this is your area of expertise?

Here's an initial implementation of recursively_apply for ak._v2.Record. I think that we want to support the recursively_apply interface, because (a) it's useful, and (b) it's currently supported (via ak._util) in v1.

However, whilst 3c79afd nearly passes the simple test that I've added, existing code does not (in many places) expect a non-Content argument in apply. So, if you are happy that we want to implement this function for Record, I think we'd also need to address how to fix these other functions.

On this note, @jpivarski my main concern is what to do about the is_OptionType etc. flags. Right now, users can pass a Record object to a high level function that does not expect a non-Content, and it will throw an Exception if it attempts to lookup these flags. Some apply functions check isinstance(..., Content), but I haven't looked into whether that is to handle some other non-content case besides Record or not.

Rambling about types

As we've touched on before, ak._v2.Record and ak._v2.Content are a bit strange because they are really specialised implementations of a more abstract interface, but we never formally define this via inheritance / protocols (in the same way that we don't for Form and Content).

It's funny, I am in general not a fan of strict inheritance hierarchies, but I hadn't realised how much nicer structural subtyping is when there are formal mechanisms to check them (e.g. mypy). I think you already raised the goal of implementing mypy support at a future date, so that's something to look forward to working on.

My first instinct here is that Record already partially quacks like a Content, so we should also implement these is_ flags. It would then require us to decide whether to add an is_RecordItem (terrible name), or leave this type omitted.

@jpivarski
Copy link
Member

This is my first impression, so take it with that in mind, but I don't think we want Record to look too much like Content. The thing about the Content hierarchy is that any Content subclass can be a content or one of the contents of any other—not so with Record. (Specifically, I mean ak._v2.contents.Content and ak._v2.record.Record.) A Record can only ever be at the root of a tree, and the array that it contains must be a RecordArray.

As such, the value of getting Record into a recursively_apply call is limited: it can avoid a special case in the first step, before descent, but that's it. The value of the Content subclasses having a recursively_apply call is much stronger: it's the only way to address the case of Content subclass "XYZArray" at some deep node in the hierarchy. A Record's recursively_apply saves one special case, XYZArray's recursively_apply saves infinitely many.

It would save that one special case. Implementations of ak.* functions that can take either an array or a record get a little simpler by not having to check isinstance(layout, ak._v2.record.Record) and doing something with that. Are there a lot of such ak.* functions? They can be identified by the fact that they pass allow_record=True when they internally call to_layout. If they're rare (I think they might be rare), then having more ceremony to call them out might actually be a benefit.

Streamlining is not always a good idea. If we minimize the differences between Content and Record enough, Records might start getting accidentally passed through as Contents in places where we don't want them to be. The symptom of that would be exceptions getting raised downstream of where the actual mistake happens, and those can be hard to trace back.

I can say that some of the connection between low-level Record and Content is historical: in C++, some functions could return arrays or scalars, such as getitem. To pass the type-checker, these functions had to return something like std::variant<numbers, None, Record, Content>, but the C++ was C++11, not 17. (Also, what's "numbers"?) So scalars returned from C++ had to be implemented as Content subclasses even when they logically aren't: numbers are NumpyArray with zero dimensions, a case that has to be forbidden as input to many functions, None is a Content subclass with almost all methods unimplemented, and Record is a subclass of Content, but only on the C++ side. pybind11 makes it out to be not a subclass of Content on the Python side. So, some of the method names being the same were because of this undesired constraint.


It takes so long to write a response that sometimes I'll have a change of heart while writing it. It might be alright for Record to have a recursively_apply (to save that one case, mentioned above), but its recursively_apply may need to be implemented in an entirely different way. I don't think the action function should ever match it. If action is going to specially check for the Record case, that case could just as easily be checked outside of action and recursively_apply.

The main benefit of Record's recursively_apply is to get through that one layer, which is always on the root of the layout tree, into the part that the action actually cares about. This recursively_apply could probably be written as a short-circuit to passing the action down to the Record's array.

@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 6, 2022

but I don't think we want Record to look too much like Content.

Ha, this was my first thought too. Then I flip-flopped after looking at v1.

As such, the value of getting Record into a recursively_apply call is limited ...

Agreed. It mainly replaces the burden of checking for isinstance(layout, Record) with the need to support Record and Content in the apply function...

Implementations of ak.* functions that can take either an array or a record get a little simpler by not having to check

as you put in the this paragraph!

It might be alright for Record to have a recursively_apply ... I don't think the action function should ever match it.

I considered this solution earlier, but I wasn't sure whether this conditional skip would be a bad thing. On balance, I think this might be the best solution - the to_layout caller determines whether they wish to support Records or not, and the apply function is guaranteed only to receive a Content.

It takes so long to write a response that sometimes I'll have a change of heart while writing it.

I'm glad it's not just me. Sometimes I'll write the same answer three or four times before removing half the text in favour of reading the replies!

agoose77 added 2 commits April 6, 2022 23:26
This makes it clear that the `apply` is only ever given a `Content` type.
@agoose77 agoose77 marked this pull request as ready for review April 6, 2022 22:52
@agoose77 agoose77 requested a review from jpivarski April 8, 2022 11:53
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I just wanted to tweak it a bit, and I'll use GitHub's suggested change to do that. If it passes the test with the change, then it will be merged. (I'll turn on the auto-merge.)

src/awkward/_v2/record.py Outdated Show resolved Hide resolved
@jpivarski jpivarski enabled auto-merge (squash) April 8, 2022 15:42
@jpivarski jpivarski merged commit e718a8d into main Apr 8, 2022
@jpivarski jpivarski deleted the fix-1400-with_name-record branch April 8, 2022 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

with_name fails for ak._v2.Record
2 participants