Issue with accessing children with NanoEvents after .compute() #1200

Open
rkansal47 opened this issue Nov 5, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@rkansal47
Contributor

Describe the bug

If I load an object as a dask_awkward array, compute it, and then try to access its children, I get the error:

File ~/mambaforge/envs/python311/lib/python3.11/site-packages/coffea/nanoevents/methods/nanoaod.py:139, in GenParticle.children(self)
    133 @dask_property
    134 def children(self):
    135     """
    136     Accessor to direct children of this particle (not grandchildren). Includes particles
    137     with the same PDG ID as this particle.
    138     """
--> 139     return self._events().GenPart._apply_global_index(self.childrenIdxG)

File ~/mambaforge/envs/python311/lib/python3.11/site-packages/coffea/nanoevents/methods/base.py:270, in NanoCollection._events(self)
    268 if "@original_array" in self.attrs:
    269     return self.attrs["@original_array"]
--> 270 return self.attrs["@events_factory"].events()

KeyError: '@events_factory'

with coffea 2024.10.0 and awkward 2.6.9.

To Reproduce
MRE:

from coffea import nanoevents

# Open the file lazily: events_delayed is a dask_awkward array.
events_delayed = nanoevents.NanoEventsFactory.from_root(
    {
        "root://cmseos.fnal.gov///store/user/lpcpfnano/cmantill/v2_3/2017/HH/GluGluToHHTobbVV_node_cHHH0_TuneCP5_13TeV-powheg-pythia8/GluGluToHHTobbVV_node_cHHH0/220808_163755/0000/nano_mc2017_1-1.root": "Events"
    },
    schemaclass=nanoevents.NanoAODSchema,
    delayed=True,
).events()

# Select last-copy Higgs bosons (PDG ID 25) from the hard process.
gen = events_delayed.GenPart
higgs = gen[gen.hasFlags(["fromHardProcess", "isLastCopy"]) * (gen.pdgId == 25)]

# Accessing .children on the computed array raises KeyError: '@events_factory'.
print(higgs.compute().children)

rkansal47 added the bug label Nov 5, 2024
@lgray
Collaborator

lgray commented Nov 5, 2024

The issue here is that the self-reference to the original events is lost when you compute (because you can't recursively compute a dask array!). So this isn't actually a bug but rather a consequence of how dask works.
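
To make the mechanism concrete, here is a toy model in plain Python. It is only a sketch: ToyCollection and ToyFactory are hypothetical names and not coffea's classes. The lookup order in _events() mirrors the traceback above, and the behaviour of compute() is an assumption based on the description that the back-reference is lost.

```python
# Toy model only: hypothetical classes, not coffea's implementation.
class ToyFactory:
    def events(self):
        return "all events"


class ToyCollection:
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs

    def _events(self):
        # Same lookup order as shown in the traceback in this issue.
        if "@original_array" in self.attrs:
            return self.attrs["@original_array"]
        return self.attrs["@events_factory"].events()

    def compute(self):
        # Assumption: materializing keeps the data but drops the back-reference.
        return ToyCollection(self.name, {})


lazy = ToyCollection("higgs", {"@events_factory": ToyFactory()})
print(lazy._events())  # the lookup succeeds while the back-reference exists

eager = lazy.compute()
try:
    eager._events()
except KeyError as err:
    print("KeyError:", err)  # same kind of failure as in the report
```

Anything that needs to look back at the full event record, such as children, has to be requested before the back-reference goes away.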

I would suggest instead using delayed=False and doing your data exploration with completely eager nanoevents on that single file. Alternatively, you can try higgs.children.compute() and then work with that array (which you will not be able to recurse through).

Or if you know what you're looking for you can work with the dask array itself and then only compute what you want.
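
The options above could be sketched roughly as follows. This is a minimal sketch rather than a tested recipe: the helper name higgs_children and its fileset parameter are made up for illustration, and it assumes the delayed keyword of NanoEventsFactory.from_root behaves as in the coffea version quoted in this issue.

```python
def higgs_children(fileset, delayed):
    """Return the direct children of last-copy hard-process Higgs bosons.

    `fileset` is a hypothetical argument: a dict mapping a ROOT file path
    to the tree name, as in the reproducer above.
    """
    # Imported inside the function so the sketch can be read without coffea.
    from coffea import nanoevents

    events = nanoevents.NanoEventsFactory.from_root(
        fileset,
        schemaclass=nanoevents.NanoAODSchema,
        delayed=delayed,
    ).events()

    gen = events.GenPart
    higgs = gen[gen.hasFlags(["fromHardProcess", "isLastCopy"]) & (gen.pdgId == 25)]

    # Ask for the children while the link to the full event record still
    # exists, and only then materialize the result if the array is lazy.
    children = higgs.children
    return children.compute() if delayed else children
```

With delayed=False everything is eager from the start, which suits exploring a single file. With delayed=True only the children are computed, and the returned array cannot be recursed through any further.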

@rkansal47
Contributor Author

I see. For context, I was trying to compute earlier than needed as a workaround for #1199, i.e., trying to flatten higgs.children.compute().children instead of higgs.children.children, which gives an error.
