Releases: scikit-hep/uproot3
3.4.12
3.4.11
3.4.10
Requests for array(...) or arrays(...) through HTTP and XRootD now start asynchronous downloads of all the basket data before starting to read, decompress, and interpret. This keeps the network busy prefetching while the CPU is preoccupied, hiding latency. The HTTP preloader is implemented with concurrent.futures.ThreadPoolExecutor (not selected by default in Python 2, as that would require a non-standard library dependency), and the XRootD preloader is implemented with a pyxrootd callback. The threads parameter is a number of threads for HTTP and a boolean for XRootD: yes-parallelize or no-don't, because we don't control how many threads pyxrootd uses. (PR #242)
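The prefetching pattern described above can be sketched in a few lines. This is only an illustration of the idea, not uproot's actual implementation: fetch_basket and interpret are hypothetical stand-ins for a remote range request and for decompression plus interpretation, and the byte ranges are made-up values.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_basket(byte_range):
    # Hypothetical stand-in for a remote range request; returns placeholder bytes.
    start, stop = byte_range
    return bytes(stop - start)

def interpret(raw):
    # Hypothetical stand-in for decompressing and interpreting one basket.
    return len(raw)

def read_with_prefetch(byte_ranges, threads=4):
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # Submit every download up front, before processing any basket.
        futures = [executor.submit(fetch_basket, r) for r in byte_ranges]
        # Process baskets in order; the remaining downloads keep running
        # in the background while each result is being interpreted.
        return [interpret(f.result()) for f in futures]

basket_sizes = read_with_prefetch([(0, 100), (100, 250), (250, 300)])
```

Submitting all requests before consuming any result is what lets network transfer overlap with CPU work.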
Binder now uses JupyterLab, rather than Jupyter Notebook. (PR #244)
3.4.9
3.4.8
3.4.7
3.4.6
3.4.5
3.4.4
Faster TTree.pandas.df(flatten=True) provided by PR #223.
One capability that was lost was reading branches with different jagged structure into the same DataFrame with flatten=True. For instance, a TTree containing different numbers of electrons and muons can't be simultaneously flattened. The old code managed to do this with an outer join on DataFrames. We no longer do this in the TTree.pandas.df code; instead, we broadcast JaggedArrays, which is not just faster, it's also more correct. Does it make sense to put the first electron and the first muon in the same row, then the second electron and the second muon in another row, where the two sets have different sizes in each event? (The shorter of the two then has to be padded with NaN.) This joint row-membership doesn't correspond to any property the second electron and second muon share.
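To make the row-pairing problem concrete, here is a small pure-Python sketch of what position-by-position pairing does to a single event. The values are invented for illustration, and this is not the code that the old or new TTree.pandas.df uses.

```python
from itertools import zip_longest

# Made-up values for one event: three electrons and two muons.
electron_pt = [41.0, 27.5, 12.1]
muon_pt = [55.3, 8.9]

# Pair the i-th electron with the i-th muon, padding the shorter list with NaN.
rows = list(zip_longest(electron_pt, muon_pt, fillvalue=float("nan")))

for subentry, (e_pt, m_pt) in enumerate(rows):
    print(subentry, e_pt, m_pt)
```

The third row holds an electron next to a NaN muon, and the first two rows pair particles that are related only by their position in each list.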
Now there's a ValueError warning you if you try to do this. You can encounter this error rather easily by not specifying branches—implicitly saying you want all branches from a TTree, which may contain incompatible branches. Remember that you can use glob patterns to ask for all branches satisfying a name pattern.
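As a rough illustration of how a glob pattern narrows a request to branches of one kind, the sketch below filters a list of names with Python's standard fnmatch module. The branch names are hypothetical, and uproot's own pattern matching may differ in its details.

```python
from fnmatch import fnmatchcase

# Hypothetical branch names, made up for illustration.
branch_names = ["Muon_pt", "Muon_eta", "Electron_pt", "Electron_eta", "run"]

def select(names, pattern):
    # Keep only the names that match a shell-style glob pattern (case-sensitive).
    return [name for name in names if fnmatchcase(name, pattern)]

muon_branches = select(branch_names, "Muon_*")
electron_branches = select(branch_names, "Electron_*")
```

Each selection contains only branches of one particle type, which is the kind of grouping that can share a jagged structure.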
If you really do want to mix different cardinalities in the same DataFrame, you can explicitly do an outer join in Pandas:
muons = tree.pandas.df("Muon_*", flatten=True)
electrons = tree.pandas.df("Electron_*", flatten=True)
muons.join(electrons, how="outer")
You can also choose to not flatten the DataFrame, which puts no constraints on the structure of the contents (but is less useful if you have a lot of jagged data).