Extremely large memory consumption when reading from an arbitrary location of a root file #319
Comments
I reproduced your bug and found a cure, though this definitely goes under the "One Weird Trick" category. Do this: tree._recover() before attempting
The long explanation: when you write a ROOT file and close it without calling
At first, uproot didn't support this, but I've since added code to read embedded baskets the first time you try to read anything from a branch. You also have many branches, and each one was trying to recover its baskets in between reading the data you were actually interested in. It's not an excessive amount of data, but the order was bad: it couldn't let go of previously read branches while each branch went and recovered its baskets, so the garbage collector couldn't do its job.
By recovering all branches up-front (which doesn't take very long or very much memory), it doesn't have this problem when it goes on to read the data you're interested in (which also doesn't take very long or very much memory). That's why this feels like "One Weird Trick": you get the same amount of work done, but doing it in a different order makes the difference between a few MB in under a second and crashing your computer with 64 GB.
I'm also putting in a fix for all methods that read multiple branches, such as
See PR #320.
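The workaround described above can be sketched as a small helper that recovers every branch first and only then reads the requested entry range. This is a minimal sketch, not code from the thread: `read_chunk`, the file path, the tree name, and the branch names are all hypothetical, and it assumes the uproot 3 interface (`tree.arrays` with `entrystart`/`entrystop`, plus the private `tree._recover()` method mentioned in the comment).

```python
def read_chunk(tree, branches, entrystart, entrystop):
    """Read entries [entrystart, entrystop) of `branches` from an uproot 3 TTree.

    `read_chunk` is a hypothetical helper, not part of uproot.
    """
    # Recover embedded baskets for all branches up front, so that recovery
    # is not interleaved with reading the data of interest.
    # _recover() is a private method, so guard against it being absent
    # in other uproot versions.
    recover = getattr(tree, "_recover", None)
    if callable(recover):
        recover()

    # Only now read the requested entry range.
    return tree.arrays(branches, entrystart=entrystart, entrystop=entrystop)


# Usage sketch (path, tree name, and branch names are placeholders):
#
#   import uproot  # uproot 3.x
#   tree = uproot.open("data.root")["tree_name"]
#   arrays = read_chunk(tree, ["branch_a", "branch_b"], 1000000, 1001000)
```

Because `_recover()` is private, relying on it is a stopgap; once a release containing the fix from PR #320 is installed, the explicit call should no longer be needed.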
Thank you for the fast fix and the detailed explanation! I confirm that after calling
Fix memory issue (#319) by recovering all interesting branches before reading any.
Hello,
When trying to read a chunk of data from an arbitrary location in a large ROOT file, uproot consumes all available RAM (> 64 GB) and eventually crashes. If the same file is converted to HDF5 (using root_pandas.read_root and pandas.DataFrame.to_hdf) and the same chunk of data is then read with pandas.read_hdf, the read is fast and consumes less than 1 GB of RAM.
Is this related to some intrinsic limitation of the ROOT file format, or is there a way to overcome this problem?
Here is minimal code to reproduce the issue:
System information: uproot version 3.9.0, Python version 3.6.7, OS CERN CentOS 7.
The data file can be found here: https://cernbox.cern.ch/index.php/s/QOUBLxRUXpek7tz (or /eos/home-k/kandroso/share-tmp/data.root, which requires lxplus access).