Support for dask-awkward arrays #301

Closed
masonproffitt opened this issue Dec 14, 2022 · 7 comments · Fixed by #429
Labels: feature (New feature or request)

@masonproffitt

Describe the potential feature

Support for vectors and vector operations using arrays from dask-awkward. I assume this is the correct repository for this issue, since by parallel construction there would need to be something like vector.dask.

Motivation

This is of course important for many HEP uses of uproot.dask. Among other places, this is needed for adoption of Uproot v5 and Awkward Array v2 by func-adl-uproot and therefore the Uproot ServiceX transformer.

Possible Implementation

No response

@masonproffitt masonproffitt added the feature New feature or request label Dec 14, 2022
@jpivarski
Member

This would follow on #284, and stands to benefit from the dask-awkward testing that is being done at scikit-hep/coffea#736.

A first step would be to just try it (if you haven't already). This isn't a new feature to implement so much as something we need to test, fixing any issues that come up.

@masonproffitt
Author

masonproffitt commented Dec 16, 2022

Okay, this does actually seem to work using dak.with_name(). The vector properties/functions don't show up in __dir__(), but they do run when called. vector.awk doesn't work with dask-awkward objects, though, so it could still be nice to have a vector.dask or something like I suggested above.

@matthewfeickert
Member

@Saransh-cpp Is this issue still relevant? Or does modern Vector work with dask-awkward just fine and can this be closed? (I assume so, given the ongoing work to move Coffea to using Vector.)

@Saransh-cpp
Member

Hi @matthewfeickert!

I am not very familiar with Dask, but the following works:

In [1]: import vector

In [2]: import dask_awkward as dak

In [3]: import awkward as ak

In [4]: vec = vector.Array(
   ...:     [
   ...:         [{"x": 1, "y": 1.1, "z": 0.1}, {"x": 2, "y": 2.2, "z": 0.2}],
   ...:         [],
   ...:         [{"x": 3, "y": 3.3, "z": 0.3}],
   ...:         [
   ...:             {"x": 4, "y": 4.4, "z": 0.4},
   ...:             {"x": 5, "y": 5.5, "z": 0.5},
   ...:             {"x": 6, "y": 6.6, "z": 0.6},
   ...:         ],
   ...:     ]
   ...: )

In [5]: dak.from_awkward(vec, npartitions=4)
Out[5]: dask.awkward<from-awkward, npartitions=4>

In [6]: dak.from_awkward(vec, npartitions=4).compute()
Out[6]: <VectorArray3D [[{x: 1, y: 1.1, ...}, {...}], ...] type='4 * var * Vector3D...'>

In [7]: dak.from_awkward(vec, npartitions=4).x
Out[7]: dask.awkward<x, npartitions=4>

In [8]: dak.from_awkward(vec, npartitions=4).x.compute()
Out[8]: <Array [[1, 2], [], [3], [4, 5, 6]] type='4 * var * int64'>

> vector.awk doesn't work with dask-awkward objects

I think @masonproffitt wants a new vector.dask constructor because the following errors out:

In [9]: vec_ak = ak.Array(
   ...:     [
   ...:         [{"x": 1, "y": 1.1, "z": 0.1}, {"x": 2, "y": 2.2, "z": 0.2}],
   ...:         [],
   ...:         [{"x": 3, "y": 3.3, "z": 0.3}],
   ...:         [
   ...:             {"x": 4, "y": 4.4, "z": 0.4},
   ...:             {"x": 5, "y": 5.5, "z": 0.5},
   ...:             {"x": 6, "y": 6.6, "z": 0.6},
   ...:         ],
   ...:     ]
   ...: )

In [10]: vector.Array(dak.from_awkward(vec_ak, npartitions=4))
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-10-03f9663c0e95> in <cell line: 0>()
----> 1 vector.Array(dak.from_awkward(vec_ak, npartitions=4))

~/Code/HEP/vector/src/vector/backends/awkward_constructors.py in Array(*args, **kwargs)
    312     import vector.backends.awkward
    313 
--> 314     akarray = awkward.Array(*args, **kwargs)
    315     array_type = akarray.type
    316 

/opt/homebrew/lib/python3.11/site-packages/awkward/highlevel.py in __init__(self, data, behavior, with_name, check_valid, backend, attrs)
    308 
    309         else:
--> 310             layout = ak.operations.to_layout(
    311                 data, allow_record=False, regulararray=False, primitive_policy="error"
    312             )

/opt/homebrew/lib/python3.11/site-packages/awkward/_dispatch.py in dispatch(*args, **kwargs)
     54                         # This may later be used to signal that another overload should be used.
     55                         if result is NotImplemented:
---> 56                             raise NotImplementedError
     57                         else:
     58                             return result

NotImplementedError: 

In [11]: vector.Array(dak.from_awkward(vec_ak, npartitions=4).compute())
Out[11]: <VectorArray3D [[{x: 1, y: 1.1, ...}, {...}], ...] type='4 * var * Vector3D...'>

Is vector.dask still required? If yes, I can start looking into dask and dask-awkward and try implementing the new constructor.

@masonproffitt
Author

This issue is about having a simple function call that goes from a plain dask-awkward array to something that can be used with vector methods. I don't have a preference on whether that's vector.Array or vector.awk or vector.dask, but something like one of those should work with dask-awkward objects.

@jpivarski
Member

Arguably, we only have this error because we have too many constructors. vector.Array(x) doesn't work when x is a dask_awkward.Array rather than an awkward.Array, but what is vector.Array supposed to do? Construct an awkward.Array with vector behavior and a parameter name like "Vector2D" using ak.with_name?

It's possible to set these two things (the parameter name and the behavior) manually:

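import awkward as ak
import dask_awkward as dak
import vector.backends.awkward  # provides vector.backends.awkward.behavior

x = dak.from_awkward(ak.Array([{"x": 1, "y": 2}, {"x": 1.1, "y": 2.2}]), npartitions=1)

# attach the record name and the vector behavior by hand
a = ak.with_name(x, "Momentum2D", behavior=vector.backends.awkward.behavior)

b = a.phi
b.compute()   # works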

The first argument of ak.with_name is a dask_awkward.Array, which it recognizes and wraps appropriately.

The ak.with_name call is long-winded and not something we want to write every time, but leaving vector.Array broken and introducing a new vector.dask would only exacerbate the problem of having too many ways to do this. It wouldn't help the people who try vector.Array without knowing that there's a vector.dask to try instead.

So I guess the thing to do is to fix the path that leads to this NotImplementedError in vector.Array, so that it does the right thing (ak.with_name) with dask-awkward arrays.

@Saransh-cpp
Member

Thanks for the explanation, @masonproffitt and @jpivarski! I'll take this up.
