This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

MultiIndex pandas dataframe from uproot.iterate #263

Closed
afrankenthal opened this issue Mar 27, 2019 · 8 comments · Fixed by #264

Comments

@afrankenthal

Hi! First of all thank you very much for this awesome package!

I have a question regarding MultiIndex pandas dataframes and uproot.iterate. When I open a ROOT file via uproot.open, I am able to select branches which contain JaggedArrays with the same dimensionality, and make them into a pandas dataframe with MultiIndex. For example:

import uproot
mytree = uproot.open(myfile)["mytree"]
mytree.pandas.df(['muonPt', 'muonEta', 'muonPhi'])

Depending on the event, I can have (say) 0, 1, or 2 muons, so I can have accordingly 0, 1, or 2 subentries, and the resulting pandas dataframe reflects that.
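To illustrate the (entry, subentry) layout described above, here is a minimal pure-Python sketch. It is not uproot's implementation: the function name `flatten_jagged` and the sample values are hypothetical, and it assumes each branch is a list of per-event lists whose lengths agree event by event.

```python
def flatten_jagged(branches):
    """Flatten equal-shaped jagged branches into (entry, subentry) rows.

    `branches` maps a branch name to a list of per-event lists.
    Returns (index, columns): index is a list of (entry, subentry) tuples,
    columns maps each branch name to its flat list of values.
    """
    names = list(branches)
    n_events = len(branches[names[0]])
    index = []
    columns = {name: [] for name in names}
    for entry in range(n_events):
        # Every branch must have the same multiplicity in a given event.
        counts = {len(branches[name][entry]) for name in names}
        if len(counts) != 1:
            raise ValueError(f"branches disagree on multiplicity in entry {entry}")
        for subentry in range(counts.pop()):
            index.append((entry, subentry))
            for name in names:
                columns[name].append(branches[name][entry][subentry])
    return index, columns


# Hypothetical data: three events with 2, 0, and 1 muons.
sample = {
    "muonPt": [[31.2, 12.5], [], [45.0]],
    "muonEta": [[0.4, -1.1], [], [2.0]],
}
index, columns = flatten_jagged(sample)
print(index)              # [(0, 0), (0, 1), (2, 0)]
print(columns["muonPt"])  # [31.2, 12.5, 45.0]
```

An event with no muons contributes no rows at all, so entry 1 is absent from the index. If pandas is available, the tuples could be wrapped with `pandas.MultiIndex.from_tuples(index, names=["entry", "subentry"])`.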

But I would like to process several files with the same structure using uproot.iterate. I haven't found a way to make the pandas dataframe with MultiIndex by selecting the right branches from the iterate, e.g.:

import pandas as pd
listofdataframes = []
for arrays in uproot.iterate(listoffiles, "mytree", ['muonPt', 'muonEta', 'muonPhi'], outputtype=pd.DataFrame, executor=executor):
    listofdataframes.append(arrays)
pd.concat(listofdataframes)

Without "flatten=True" in the iterate command above, the dataframes come out containing JaggedArrays, and I'm not sure how to turn those into a MultiIndex structure. If I do include "flatten=True", however, I get an error about incompatible dimensionalities:

ValueError: Shape of passed values is (1, 1382), indices imply (1, 1466)

(I think this is because of the variable number of muons per entry). Is there a way to get the same behavior from uproot.iterate on many files, as I would from tree.pandas.df() on a single file?

Thank you!
Andre

@jpivarski
Member

This is a bug—you're using iterate the way it's supposed to be used. In fact, tree.pandas.df is just an alias to tree.arrays with some different options, and tree.iterate shares code paths with tree.arrays. Something minor must be mismatched—I'll look into it.

@jpivarski
Member

I fixed this in PR #264, where I found and fixed more issues than the one you found.

I was wrong when I thought it might be a minor mismatch: so many (good) updates have gone into DataFrame handling, tested in tree.arrays, that the DataFrame handling in tree.iterate was out of date. Then uproot.iterate (which works by simply calling tree.iterate on each tree) also had some out of date assumptions.
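The delegation described above, where a multi-file iterator simply calls a per-tree iterator on each tree in turn, can be sketched in plain Python. This is an illustrative toy, not uproot's code: `FakeTree`, `iterate_tree`, and `iterate_files` are hypothetical names, the "trees" are in-memory lists, and the running global start index is an assumption added here so that chunks from different trees keep distinct entry numbers.

```python
class FakeTree:
    """Hypothetical stand-in for a tree: holds per-event values in memory."""

    def __init__(self, events):
        self.events = events

    def iterate_tree(self, chunk_size):
        # Yield consecutive chunks of at most `chunk_size` events.
        for start in range(0, len(self.events), chunk_size):
            yield self.events[start:start + chunk_size]


def iterate_files(trees, chunk_size):
    """Yield (global_start, chunk) by delegating to each tree's iterator."""
    global_start = 0
    for tree in trees:
        for chunk in tree.iterate_tree(chunk_size):
            yield global_start, chunk
            global_start += len(chunk)


trees = [FakeTree([1, 2, 3]), FakeTree([4, 5])]
chunks = list(iterate_files(trees, chunk_size=2))
print(chunks)  # [(0, [1, 2]), (2, [3]), (3, [4, 5])]
```

Because the outer loop only forwards what the inner iterator yields, any output handling that is updated in one code path but not the other can make the two disagree, which is the kind of drift this comment describes.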

See that PR for updates. This will be a new version of uproot when it's done.

@jpivarski
Member

The fix is in master, but Travis is having issues and it won't get pushed to PyPI until that gets resolved. If you need this fix, git clone it or use pip's install-from-git feature.

@afrankenthal
Author

Hello, thank you for the incredibly speedy response! I will try to set up a new uproot install using pip's git install feature (I'm currently using conda install which only pulls from binaries, I believe). Or else I'll just wait for the Travis issues to go away.

@jpivarski
Member

Sure. :) Based on a Google talk, I'm trying to encourage a "live at head" lifestyle, but that only works if each update to head is small (and the updates are therefore frequent).

I just checked in on Travis again, and they're apparently having serious issues. Only a few jobs have started, and those that need to install dependencies from conda time out at 10 minutes. I guess it won't happen today.

The normal order is that Travis does the continuous integration, and if that's successful, I tag a release. Travis then runs again, this time deploying to PyPI at the end of its tests. The new version on PyPI notifies the conda package maintainer, and he presses the button to deploy to conda. We're stuck at step one.

@afrankenthal
Author

That makes a lot of sense! Actually, if this Google talk is available publicly, it would be awesome to watch it, if you can share the link here! :)

@jpivarski
Member

I thought it was at the last ROOT Workshop, but I can't find anything that looks like it. Even if I did manage to find slides, it actually wasn't what the speaker intended to talk about: he thought he was referencing a discipline we were familiar with, but it ended up being the most interesting thing in his talk. Apparently that phrase, "live at head," is the common way of describing it.

@afrankenthal
Author

Very interesting! Someone needs to make this phrase into a t-shirt...
