Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add mapfilter decorator #1204

Closed
wants to merge 2 commits into from

Conversation

pfackeldey
Copy link
Contributor

@pfackeldey pfackeldey commented Nov 7, 2024

This PR adds a new decorator for embarrassingly parallel computations to be collapsed into a single node in the high-level dask graph.

This is basically a curried decorator version of dask_awkward.map_partitions. Thus, this decorator comes with the same caveats as dask_awkward.map_partitions regarding the mapped function:

  • The mapped function must use pure eager awkward inside (no delayed operations).
  • The mapped function must return a single argument, i.e. an awkward array.
  • The mapped function must be emberassingly parallel.

In addition, the decorator accepts two more arguments to ease the pain of writing if ak.backend(array) == "typetracer": ... conditions. The arguments are needs and out_like.

  • needs takes a dictionary that maps arguments of the function signature (which are awkward arrays) to an utterable of slices that should be touched respectively.
  • out_like skips the function execution during typetracing pass and "mocks" the output array as whatever out_like is.

An example is given with this untraceable function (it's untraceable because of the numpy conversion):

from coffea.processor import mapfilter

@partial(
  mapfilter,
  needs={"muons": ["pt"]},
  out_like=ak.Array([0.0]),
)
def untraceable_fun(muons):
  # a non-traceable computation for ak.typetracer
  # which needs "pt" column from muons and returns a 1-element array
  return ak.Array([np.sum(ak.to_numpy(muons.pt[0:1]))])

Here, muons.pt will be forcefully touched during typetracing and the output is mocked as ak.Array([0.0]).

In general, these two arguments are not needed as typetracing should work in most cases. Only in cases where it doesn't they might come in handy.

@pfackeldey pfackeldey marked this pull request as draft November 7, 2024 19:46
@pfackeldey
Copy link
Contributor Author

I put this PR in draft mode for now, as I plan to add documentation with examples in addition.

@pfackeldey
Copy link
Contributor Author

I'm closing this PR as I'm moving this feature directly into dask-awkward

@pfackeldey pfackeldey closed this Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant