Duplicate scalar columns (or custom index) in Pandas DF with flatten=True #179

beojan · 2018-10-31T06:13:05Z

In my case, my tree contains runNumber and eventNumber columns that I would like to use as an index, but these columns are NaN for subentry != 0.

jpivarski · 2018-10-31T12:31:19Z

My thinking on this was that anyone could use Pandas's fillna in the forward direction. Alternatively, I could call that function just before returning the DataFrame, but leaving the NaNs in place gives the user more information.
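For readers following along, here is a minimal sketch of the forward fill being described. It is not uproot's code: the DataFrame is hand-built to imitate the flattened layout, and the `jet_pt` column and all values are made up for illustration. `DataFrame.ffill()` is the current spelling of `fillna` with a forward method.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a flattened DataFrame (illustrative values only):
# scalar columns are set at subentry 0 and NaN for every other subentry.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {
        "runNumber": [1.0, np.nan, np.nan, 1.0, np.nan],
        "eventNumber": [100.0, np.nan, np.nan, 101.0, np.nan],
        "jet_pt": [50.0, 40.0, 30.0, 60.0, 20.0],  # hypothetical jagged column
    },
    index=index,
)

# Forward fill: each NaN takes the last valid value above it in its column.
filled = df.ffill()
print(filled)
```

Note that this fills every column, which only works when no jagged column has gaps of its own.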

beojan · 2018-10-31T12:33:47Z

That would cause issues if you have multiple jagged-array columns with different lengths. I was suggesting duplicating only the scalar columns.

jpivarski · 2018-10-31T12:45:20Z

That is doable. I'll use fillna per column because in the arrays function, I know which columns are scalar. It does lose information, but that information is available in the original TTree object as the branch.interpretation (asdtype vs asjagged).
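A rough sketch of the per-column approach, for illustration only: it is not the code that went into uproot. The `scalar_columns` list is a hypothetical stand-in for whatever the arrays function knows about branch interpretations, and `jet_pt` and `muon_pt` are invented jagged columns of different lengths.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {
        "runNumber": [1.0, np.nan, np.nan, 1.0, np.nan],
        "jet_pt": [50.0, 40.0, 30.0, 60.0, 20.0],        # 3 jets, then 2 jets
        "muon_pt": [25.0, np.nan, np.nan, 35.0, 15.0],   # 1 muon, then 2 muons
    },
    index=index,
)

# Assumption: the caller knows which columns are scalar
# (in uproot this would come from the branch interpretations).
scalar_columns = ["runNumber"]

# Filling the whole frame invents muons that do not exist.
whole_frame = df.ffill()

# Filling only the scalar columns leaves the jagged gaps as NaN.
per_column = df.copy()
per_column[scalar_columns] = per_column[scalar_columns].ffill()
print(per_column)
```

In `whole_frame`, the single muon of entry 0 is repeated on all three rows; in `per_column`, those rows keep NaN in `muon_pt` while `runNumber` is duplicated down.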

jpivarski · 2018-11-01T13:33:50Z

In uproot 3.2.9, scalar columns get duplicated down, but jagged columns of different lengths do not.

beojan · 2018-11-02T15:43:08Z

Turns out there's a problem. The integer columns have turned into floats.

jpivarski · 2018-11-02T17:45:15Z

That's something that Pandas does when it consolidates Numpy arrays internally. I don't know how to control it: I add columns to the DataFrame and it sometimes converts them. Do you know the mechanism behind that? It seems like something they really want to be hidden/transparent.

beojan · 2018-11-02T19:06:17Z

It's probably because you used NaN which is only available with floats.
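A small illustration of that effect, using a made-up Series rather than anything from uproot: as soon as a NumPy-backed integer column has to hold a missing value, Pandas upcasts it to float64.

```python
import numpy as np
import pandas as pd

# An integer column with no missing values keeps its dtype.
run_number = pd.Series([1, 2], index=[0, 1], dtype=np.int64)
print(run_number.dtype)  # int64

# Reindexing onto a larger index introduces NaN for the new label,
# and NaN can only be stored in a floating-point array.
expanded = run_number.reindex([0, 1, 2])
print(expanded.dtype)  # float64
```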

jpivarski · 2018-11-02T20:41:20Z

That makes sense. However, I didn't put NaN in myself: that's what Pandas does when you merge a dataset into one with a larger index, namely the one with nonzero subentries. That's intrinsic to the process. I suppose I could afterward determine if any fillna'ed scalar columns used to be integers and change them back...
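One way this "change them back" idea could look, as a sketch under stated assumptions rather than what uproot actually does: record the scalar columns' dtypes before they are expanded onto the larger index, then cast back after the forward fill. All names and values here are hypothetical, and the `reindex` call merely imitates the merge described above.

```python
import numpy as np
import pandas as pd

# Scalar columns before flattening: one row per entry, integer dtypes.
scalars = pd.DataFrame(
    {
        "runNumber": np.array([1, 1], dtype=np.int64),
        "eventNumber": np.array([100, 101], dtype=np.int64),
    },
    index=pd.Index([0, 1], name="entry"),
)
original_dtypes = scalars.dtypes.to_dict()  # remember dtypes before NaN appears

# Place each scalar row at subentry 0 of its entry.
scalars_at_zero = scalars.copy()
scalars_at_zero.index = pd.MultiIndex.from_arrays(
    [scalars.index, np.zeros(len(scalars), dtype=np.int64)],
    names=["entry", "subentry"],
)

# Expanding onto the larger (entry, subentry) index introduces NaN,
# which turns the integer columns into float64.
full_index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], names=["entry", "subentry"]
)
expanded = scalars_at_zero.reindex(full_index)

# Forward-fill, then cast each column back to the dtype it had originally.
restored = expanded.ffill().astype(original_dtypes)
print(restored.dtypes)
```

The cast raises an error if any NaN survives the fill, so this only works when every entry has a subentry 0 row to fill from.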

jpivarski closed this as completed Nov 1, 2018
