Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ refactoring: Implemented the type tracer for Awkward-Dask. #1110

Merged
merged 28 commits into from
Oct 12, 2021

Conversation

jpivarski
Copy link
Member

No description provided.

@jpivarski jpivarski marked this pull request as draft October 5, 2021 20:32
@jpivarski
Copy link
Member Author

This is what the latest commit does. Given a concrete array,

>>> typetracer = ak._typetracer.TypeTracer.instance()
>>> concrete = ak._v2.contents.NumpyArray(np.arange(2*3*5).reshape(2, 3, 5) * 0.1)
>>> concrete
<NumpyArray dtype='float64' shape='(2, 3, 5)'>
    [[[0.  0.1 0.2 0.3 0.4]
      [0.5 0.6 0.7 0.8 0.9]
     ...
      [2.  2.1 2.2 2.3 2.4]
      [2.5 2.6 2.7 2.8 2.9]]]
</NumpyArray>

we can now make abstract arrays that know their dtype and as much shape information as possible. The shape is in an Interval class that has string representations like "2...2" (possible values are 2 through 2 inclusive, syntax is like Ruby). It successfully passes through NumpyArray._getitem_at.

>>> abstract = ak._v2.contents.NumpyArray(concrete.to(typetracer))
>>> abstract
<NumpyArray dtype='float64' shape='(2...2, 3, 5)'>[?? ... ??]</NumpyArray>
>>> abstract[0]
<NumpyArray dtype='float64' shape='(3...3, 5)'>[?? ... ??]</NumpyArray>
>>> abstract[0][0]
<NumpyArray dtype='float64' len='5...5'>[?? ... ??]</NumpyArray>
>>> abstract[0][0][0]
0.0

The scalar output (0.0) is a completely made-up value; it can be configured with fill. Zero is a relatively safe choice because the most common reason for extracting a number from an Index is to determine the length of the next Index to allocate, and zero is the most extreme length. (Hopefully, all of the instances where a new offsets Index is created has the array value be the length before adding one.)

@jpivarski jpivarski marked this pull request as ready for review October 11, 2021 20:58
@jpivarski
Copy link
Member Author

@ioanaif @ianna @stormiestsin I'll be merging this soon: it has a lot of little diffs throughout the tests to verify that every v2 test we've ever written preserves type information when passed through with TypeTracerArrays (arrays with no concrete values, going through the code just to see how the Form changes). There have also been a few changes to the code to make it work, and I noticed a few tests hadn't been enabled (commented out or still testing v1, rather than v2). I've turned the latter into @pytest.mark.skip so that we can see them and get back to them when merging and simplify are ready.

Let me know if there are any issues with merging this PR (i.e. you're working on something that would have a substantial diff with this branch). Thanks!

@sw15h
Copy link
Collaborator

sw15h commented Oct 12, 2021

Hi Jim, I went through the tests that I wrote earlier.

For the validityerror tests, (please correct me here if I am wrong) the content of the array is essentially a typetracer, so it should be fine for all the validityerror tests till now, since none of the implemented validityerror tests access the data.

There is one case, where checking for parameters calls a is_unique on the array buffer. That accesses data. I put a TODO in the validityerror_parameters, and it checks for unique data when the __array__ is categorical. Will this pass the array.typetracer.validityerror() test, when we write that test?

Everything else in the tests look great! I am not working on anything at the moment, so you can merge it if others don't have an issue.

@jpivarski
Copy link
Member Author

It's not exactly the content that's a TypeTracerArray object, but each NumpyArray's data and each Index's data (what would normally be a NumPy or CuPy array). The validityerror method, as all methods should, only iterate over data in the arrays (which doesn't exist for TypeTracerArray) in kernels (which are no-ops for TypeTracerArray).

All of the implementations looked fine, but some of the tests hadn't been running. I got a few of them running, adding code where necessary, but others have been left as skipped tests.

@jpivarski jpivarski changed the title Starting to implement the type tracer for Awkward-Dask. Implemented the type tracer for Awkward-Dask. Oct 12, 2021
@jpivarski jpivarski merged commit 7ce1bb9 into main Oct 12, 2021
@jpivarski jpivarski deleted the jpivarski/type-tracer-1 branch October 12, 2021 12:40
@jpivarski jpivarski changed the title Implemented the type tracer for Awkward-Dask. C++ refactoring: Implemented the type tracer for Awkward-Dask. Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants