Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: hide Content recursion entry points in ak._do submodule. #1972

Merged
merged 32 commits into from
Dec 8, 2022

Conversation

jpivarski
Copy link
Member

@jpivarski jpivarski commented Dec 7, 2022

This is a proposed solution to #1969. While we have to hide recursively_apply somehow before the 2.0.0 release, there's nothing final about doing it this way. Since this is a hidden submodule, we can rename it or even change back to methods with a naming convention.

This is for any function X that needs to be accessible throughout the Awkward codebase but hidden from outside the codebase, the first of which is recursively_apply. Here's an example program flow:

  1. Some function in ak.operations calls ak._do.recursively_apply(layout, *args). They can do this because hidden submodules (like _broadcasting, _slicing, _connect, ...) can be called from anywhere in the Awkward codebase, but not by outsiders.
  2. We declare that functions in ak._do are allowed to call Content and Record private methods. (Maybe it should have been called ak._sudo.) ak._do.recursively_apply decides whether layout is a Content or a Record and calls its _recursively_apply private method.
  3. Content._recursively_apply calls Content._recursively_apply because we have another rule that says that Subclass1._methodA is allowed to call Subclass2._methodB if Subclass1 == Subclass2 or _methodA == _methodB.

I'm making up rules left and right, but rules are supposed to help us organize things, not constrain us, and these rules only apply within the Awkward codebase. For communication with the outside world, we have to limit ourselves to only the community-established conventions, such as "leading underscore means don't call it from outside."

(Also note that the program flow sketched above does not go out of Content back into Content, which would be a little confusing. It goes from ak.* to ak._do to Content.)

I'll add other entry points to this PR as separate commits.


📚 The documentation for this PR will be available at https://awkward-array.readthedocs.io/en/jpivarski-hide-recursion-entry-points/ once Read the Docs has finished building 🔨

@codecov
Copy link

codecov bot commented Dec 7, 2022

Codecov Report

Merging #1972 (26b51ae) into main (5b754e3) will increase coverage by 0.08%.
The diff coverage is 89.75%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/operations/ak_all.py 96.15% <ø> (ø)
src/awkward/operations/ak_any.py 96.15% <ø> (ø)
src/awkward/operations/ak_argcartesian.py 78.94% <ø> (ø)
src/awkward/operations/ak_count.py 96.00% <ø> (ø)
src/awkward/operations/ak_count_nonzero.py 96.15% <ø> (ø)
src/awkward/operations/ak_nan_to_none.py 23.52% <0.00%> (ø)
src/awkward/operations/ak_prod.py 87.09% <ø> (ø)
src/awkward/operations/ak_sum.py 90.32% <ø> (ø)
src/awkward/operations/ak_transform.py 91.30% <ø> (ø)
src/awkward/record.py 82.96% <ø> (+2.44%) ⬆️
... and 57 more

@jpivarski jpivarski self-assigned this Dec 7, 2022
@jpivarski jpivarski added the pr-next-release Required for the next release label Dec 7, 2022
@jpivarski
Copy link
Member Author

jpivarski commented Dec 8, 2022

At this point, Content has an API that will not be hard to maintain:

L2: 📂
L3: 🔒

Form handling: 📂

  • form_cls
  • form
  • form_with_key

Type properties: 📂

  • is_numpy
  • is_unknown
  • is_list
  • is_regular
  • is_option
  • is_indexed
  • is_record
  • is_union

Extended type properties: 📂

  • is_tuple
  • is_leaf
  • is_identity_like

Aspects: 📂

  • length
  • backend
  • parameters
  • parameter
  • with_parameter
  • fields
  • nbytes

Constructors: 📂

  • __init__
  • copy
  • simplified

Python interface: 📂

  • __len__
  • __iter__
  • __getitem__
  • __copy__
  • __deepcopy__
  • __array_ufunc__, __array_function__, __array__ (to prevent them)

Reducers: 🔧

  • argmin 💥
  • argmax 💥
  • count 💥
  • count_nonzero 💥
  • sum 💥
  • prod 💥
  • any 💥
  • all 💥
  • min 💥
  • max 💥

Sorting: 🔧

  • argsort
  • sort

Packing: 🔧

  • packed (a property...) → to_packed()

Conversion:

  • to_arrow
  • to_backend
  • to_json
  • to_list, tolist
  • to_numpy
  • maybe_to_array_maybe_to_array

Form properties: 📂

  • branch_depth
  • dimension_optiontype
  • purelist_depth
  • purelist_isregular
  • purelist_parameter
  • minmax_depth

TypeTracer access: 📂

  • forget_length
  • typetracerto_typetracer

Equality:

  • layout_equalequals

Validity:

  • validity_error 🔧
  • validity_error_parameters_validity_error_parameters

Option-type API: only applies to layouts for which is_option 📂

  • mask_as_bool
  • project
  • to_BitMaskedArray
  • to_ByteMaskedArray
  • to_IndexedOptionArray64

List-type API: only applies to layouts for which is_list 📂

  • starts
  • stops
  • to_ListOffsetArray64
  • to_RegularArray

Also, RegularArray has 📂

  • maybe_to_NumpyArray → implement for NumpyArray
  • offsets

IndexedArray API: 📂

  • mask_as_bool
  • project
  • to_IndexedOptionArray64

UnionArray API: 📂

  • content
  • nested_tags_index
  • project
  • regular_index

RecordArray API: 📂

  • as_tupleto_tuple
  • content
  • field_to_index
  • has_field
  • index_to_field

NumpyArray API: 📂

  • dtype
  • shape, inner_shape
  • strides
  • contiguousto_contiguous
  • is_contiguous
  • ptr 🔒
  • raw 🔒
  • to_RegularArray

EmptyArray API: 📂

  • to_NumpyArray

But I have to fix UnionArray.merging_strategy, which is accidentally not hidden.

These lists were generated using

import awkward as ak

layouts = [
    x for x in [getattr(ak.contents, x) for x in dir(ak.contents)]
    if isinstance(x, type) and issubclass(x, ak.contents.Content)
]

Content_api = sorted([x for x in dir(ak.contents.Content) if not x.startswith("_")])

for x in Content_api:
    print(x)

for x in sorted(layouts, key=lambda x: x.__name__):
    print(x.__name__, sorted([y for y in dir(x) if not y.startswith("_") and not y in Content_api]))

and a lot of editing.

@jpivarski
Copy link
Member Author

UnionArray.merge_strategyUnionArray._merge_strategy was good to fix because it's an internal method and only the underscored version gets called. By missing the underscore, UnionArray was getting Content._merge_strategy, which is a little different, but apparently not in a way that the tests noticed.


I have checked the entire public API of Awkward Array, and I think what we have, at the end of this PR, is something we can support. We can add things at any time, and we can change the mechanism by which hidden things are hidden (such as refactoring this ak._do to something else; that's not off the table). But the API we're exposing now is not something that we'd want to take away from.

Most of what I removed from the Content API are helper functions for implementing ak.* functions. There are still helper functions that start with underscores, but that's fine.

The merging system especially had to be hidden, since that was full of historical choices. (For example, when the merge of two Contents was expanded to mergemany for N Contents, the original merge API was left in so that I wouldn't have to change a lot of tests. Also, these functions can only be called under certain conditions: if you call mergemany when mergeable is False, you'll probably get a segfault. That's why these things need to be hidden!)

@jpivarski jpivarski requested review from agoose77 and ioanaif December 8, 2022 03:16
@agoose77
Copy link
Collaborator

agoose77 commented Dec 8, 2022

Rename suggestions:
contiguous()to_contiguous()
layout_equal()is_equal()?
as_tuple() -> to_tuple()?

@agoose77
Copy link
Collaborator

agoose77 commented Dec 8, 2022

All L3 are underscored, but added to a set of safe rules.

  • Same-name class can call any method of same name class?
  • System of method names with same prefix can call within system across across types?
  • L3 cannot be descriptors (properties)
  • Define set of "public" L3 attributes
  • All above are read-only.

@agoose77
Copy link
Collaborator

agoose77 commented Dec 8, 2022

ak._do is motivated by the loss of a Python private-public distinction in the Content namespace, so we use namespace for the entry points. This is not exactly the case in this PR, but we will do this down the road!

@agoose77 agoose77 merged commit 1d462e5 into main Dec 8, 2022
@agoose77 agoose77 deleted the jpivarski/hide-recursion-entry-points branch December 8, 2022 18:30
@jpivarski jpivarski removed the pr-next-release Required for the next release label Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants