Efficiently writing many histograms to a root file #1242
The ideal would be a serialization that writes the shared metadata (axis ranges, numbers of bins, labels, etc.) only once and the bin contents of all the histograms as a single, large multidimensional array. I've been trying to get developers interested in this idea for a while (most recently here).

What Uproot does is take the boost-histogram, convert it into a ROOT histogram class (TH1, TH2, or TH3), which has a lot more metadata than boost-histogram itself, and save that. We've had some discussions of native boost-histogram serialization; there's a draft schema and (I think) an implementation for HDF5, but (as far as I know) no serialization for ROOT yet. Boost-histograms can also be written with boost-serialization.

Uproot's algorithm for determining what ROOT type to write on assignment considers any dict or DataFrame to be a TTree. It's technically possible for a ROOT TTree to contain histograms instead of event data, but that's an odd (and inefficient) thing to do, and Uproot doesn't do it.

There isn't a faster way to write large numbers of histograms to a ROOT file with Uproot, so you have a few options:
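To make the single-array idea mentioned above concrete, here is a rough sketch of the layout. It uses a NumPy `.npz` container instead of a ROOT file, so it shows the data layout only and is not something Uproot does for you. It assumes all histograms share one binning. The Poisson-distributed contents, the array names, and the in-memory buffer are stand-ins for illustration.

```python
import io

import numpy as np

rng = np.random.default_rng(42)

# Shared metadata, stored once for all histograms (assumed common binning).
edges = np.linspace(0.0, 1.0, 21)  # 20 bins
n_hist = 1000

# Stand-in bin contents: one row per histogram, one column per bin.
values = rng.poisson(5.0, size=(n_hist, len(edges) - 1)).astype(float)
# For unweighted counts, the variance of each bin equals its content.
variances = values.copy()

# Write everything in one call; a file path works in place of the buffer.
buffer = io.BytesIO()
np.savez_compressed(buffer, edges=edges, values=values, variances=variances)

# Read it back and pick out a single histogram by row index.
buffer.seek(0)
loaded = np.load(buffer)
hist_42 = loaded["values"][42]
```

The cost of writing no longer scales with the number of named objects in the file, since there are only three arrays regardless of how many histograms they hold.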
Hey Experts!
So I have a quick question about how best to write many histograms (boost-histogram objects) to a ROOT file using uproot v5.3.7.
Looking at the documentation [1], it says that one way to write a boost-histogram to a ROOT file is simply by doing e.g.:
and this seems to work completely fine.
However, in my case I need to save loads of histograms to the ROOT file (tens of thousands or more!), and this method seems to slow down considerably as more histograms are saved to the file.
For example:
tqdm shows that the saving loop at the end slows down considerably even for this small example, going from roughly 300 iterations a second down to roughly 90 by the end. For my actual case this is even worse because I have even more histograms to save, and it almost completely grinds to a halt halfway through!
Alternatively, I thought it might be possible to save all of these histograms in one go by putting them in a dictionary and saving them as a tree, following the example in [1]. But I believe this is only possible if the data are NumPy/Awkward arrays or pandas DataFrames. So I end up getting the error:
If anyone could help me fix this issue it would be greatly appreciated!
I should add that I am using boost-histograms because that way I can include the associated bin errors as well, which I wasn't sure how to do when saving them as NumPy histograms. If there is a way to do that, then I guess I could use the final example above, where I save everything as a dictionary?
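On the bin-error question, one common convention (the same one boost-histogram's `Weight` storage follows) is to keep the sum of squared weights per bin as the variance. With plain NumPy that is a second `np.histogram` call. This is a sketch with made-up data; the variable names and binning are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)               # made-up sample
w = rng.uniform(0.5, 1.5, size=x.size)  # made-up per-event weights
edges = np.linspace(-4.0, 4.0, 41)      # 40 bins

# Bin contents: sum of weights per bin.
sumw, _ = np.histogram(x, bins=edges, weights=w)
# Bin variances: sum of squared weights per bin.
sumw2, _ = np.histogram(x, bins=edges, weights=w**2)
# Bin errors are the square roots of the variances.
errors = np.sqrt(sumw2)
```

The `sumw` and `sumw2` arrays are ordinary NumPy arrays, so they can travel alongside each other in whatever array-based container is used for the bin contents.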
Thanks!