This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Open a custom ROOT object #168

Closed
LuTse opened this issue Oct 19, 2018 · 5 comments

@LuTse

LuTse commented Oct 19, 2018

I am trying to open a ROOT file produced by MARS (the software used to analyze data from the MAGIC telescopes) and then load a leaf into an array:

import uproot
f = uproot.open('./20171123_M1_05068934.001_C_PSRJ2032-W0.40+051.root')
t = f['Events']
samples = t['MRawEvtData.fHiGainFadcSamples'].array()
samples

This produces the following error:

IndexError                                Traceback (most recent call last)
~/.local/anaconda3/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/.local/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    393                             if callable(meth):
    394                                 return meth(obj, self, cycle)
--> 395             return _default_pprint(obj, self, cycle)
    396         finally:
    397             self.end_group()

~/.local/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    508     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    509         # A user-provided repr. Find newlines and replace them with p.break_()
--> 510         _repr_pprint(obj, p, cycle)
    511         return
    512     p.begin_group(1, '<')

~/.local/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

~/.local/anaconda3/lib/python3.6/site-packages/awkward/array/base.py in __repr__(self)
     50 
     51     def __repr__(self):
---> 52         return "<{0} {1} at {2:012x}>".format(self.__class__.__name__, str(self), id(self))
     53 
     54     def _try_tolist(self, x):

~/.local/anaconda3/lib/python3.6/site-packages/awkward/array/base.py in __str__(self)
     47             return "[{0}]".format(" ".join(awkward.util.array_str(x) for x in self))
     48         else:
---> 49             return "[{0} ... {1}]".format(" ".join(awkward.util.array_str(x) for x in self[:3]), " ".join(awkward.util.array_str(x) for x in self[-3:]))
     50 
     51     def __repr__(self):

~/.local/anaconda3/lib/python3.6/site-packages/awkward/array/base.py in <genexpr>(.0)
     47             return "[{0}]".format(" ".join(awkward.util.array_str(x) for x in self))
     48         else:
---> 49             return "[{0} ... {1}]".format(" ".join(awkward.util.array_str(x) for x in self[:3]), " ".join(awkward.util.array_str(x) for x in self[-3:]))
     50 
     51     def __repr__(self):

~/.local/anaconda3/lib/python3.6/site-packages/awkward/array/objects.py in __iter__(self)
    165     def __iter__(self):
    166         for x in self._content:
--> 167             yield self.generator(x, *self._args, **self._kwargs)
    168 
    169     def __getitem__(self, where):

~/.local/anaconda3/lib/python3.6/site-packages/uproot/interp/objects.py in __call__(self, bytes)
    273             source = uproot.source.source.Source(bytes)
    274             cursor = uproot.source.cursor.Cursor(0)
--> 275             return self.cls.read(source, cursor, self.context, None)
    276         def __repr__(self):
    277             if isinstance(self.cls, type):

~/.local/anaconda3/lib/python3.6/site-packages/uproot/rootio.py in read(cls, source, cursor, context, parent)
    839             context = context.copy()
    840         out = cls.__new__(cls)
--> 841         out = cls._readinto(out, source, cursor, context, parent)
    842         out._postprocess(source, cursor, context, parent)
    843         return out

~/.local/anaconda3/lib/python3.6/site-packages/uproot/rootio.py in _readinto(cls, self, source, cursor, context, parent)

~/.local/anaconda3/lib/python3.6/site-packages/uproot/source/cursor.py in array(self, source, length, dtype)
     83         start = self.index
     84         stop = self.index = start + length*dtype.itemsize
---> 85         return source.data(start, stop, dtype)
     86 
     87     def string(self, source):

~/.local/anaconda3/lib/python3.6/site-packages/uproot/source/source.py in data(self, start, stop, dtype)
     60 
     61         if stop > len(self._source):
---> 62             raise IndexError("indexes {0}:{1} are beyond the end of data source of length {2}".format(start, stop, len(self._source)))
     63 
     64         if dtype is None:

IndexError: indexes 27:108427 are beyond the end of data source of length 108426

There should only be 2000 entries in this file for that leaf.

Do you have any idea what the issue is? Thanks in advance for your time!

Link to the file:
https://nextcloud.e5.physik.tu-dortmund.de/index.php/s/zoiCwmmzWCxE9w3

@jpivarski
Member

Thanks for the sample file; I'll take a look as soon as I can (likely Monday).

jpivarski added a commit that referenced this issue Oct 22, 2018
@jpivarski
Member

I looked into it. The serializations of MArrayB and MArrayS do not include what I have affectionately called the "speed bump byte," which distinguishes between an empty array and a non-existent array (a distinction I ignore). These two classes contain char* and short* fields, for which this distinction can be relevant. However, unlike every other serialization of a pointer to an array of primitive types that I have encountered, your file is missing the "speed bump byte."
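To make the speed bump concrete, here is a toy reader for a length-prefixed array of big-endian int16 values (ROOT serializes big-endian). The layout is a simplified assumption for illustration, not ROOT's exact on-disk format: a 4-byte count, then an optional one-byte marker, then the payload. `read_short_array` is a hypothetical helper, not part of uproot. Reading a buffer that lacks the marker as if it had one starts one byte too late and runs off the end, much like the IndexError above.

```python
import struct


def read_short_array(data, offset=0, speedbump=True):
    """Hypothetical toy reader, assuming this simplified layout:
    4-byte big-endian count N, then (optionally) one speed bump byte,
    then N big-endian int16 values. Returns (values, new_offset)."""
    (n,) = struct.unpack_from(">i", data, offset)
    offset += 4
    if speedbump:
        offset += 1  # skip the marker byte that precedes the payload
    stop = offset + 2 * n  # int16 items are 2 bytes each
    if stop > len(data):
        raise IndexError(
            "indexes {0}:{1} are beyond the end of data source of length {2}".format(
                offset, stop, len(data)))
    values = list(struct.unpack_from(">{0}h".format(n), data, offset))
    return values, stop


# A buffer written WITHOUT a speed bump byte: count 3, then three int16 values.
payload = struct.pack(">i3h", 3, 10, 20, 30)

print(read_short_array(payload, speedbump=False))  # ([10, 20, 30], 10)
try:
    read_short_array(payload, speedbump=True)
except IndexError as err:
    print(err)  # indexes 5:11 are beyond the end of data source of length 10
```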

I haven't been able to find any metadata in your file that says this byte should be missing; in fact, your file even has the same TStreamerBasicPointer version as another file that does have the speed bumps. I can't change the logic universally because that would break all of those other cases, so the least I could do was offer a switch: you can explicitly turn off the speed bump byte, which lets you read your file.

When PR #170 makes it through continuous integration and this fix is pushed to PyPI as uproot 3.2.6, you'll be able to do this:

branch = t["MRawEvtData.fHiGainFadcSamples"]
samples = branch.array(branch.interpretation.speedbump(False))

That is, you're creating a new interpretation from the default branch.interpretation by turning off the speedbump (default is True). Then you're passing this interpretation into the branch.array method to read the branch with the new interpretation.
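If you read several branches and are not sure which ones lack the byte, a wrapper can try the default first and fall back to `speedbump(False)`. This is a hypothetical helper, not part of uproot; it assumes only the two calls described above. Note from the traceback that the objects are deserialized lazily (the IndexError surfaced while printing), so the sketch iterates over the result to force deserialization inside the `try`. A caveat: a blanket fallback could hide unrelated IndexErrors, so it is best used on files you already suspect of missing the byte.

```python
def read_objects(branch):
    """Hypothetical helper: deserialize every object in `branch`, retrying
    with the speed bump byte switched off if the default read overruns.

    Assumes the uproot 3.2.6+ calls described above:
    branch.array(interpretation) and branch.interpretation.speedbump(False).
    """
    try:
        # Iterating forces deserialization; in the traceback above the
        # IndexError was raised lazily, while the array was being printed.
        return list(branch.array())
    except IndexError:
        # An overrun at the end of the buffer is the symptom of a missing
        # speed bump byte, so retry with it turned off.
        return list(branch.array(branch.interpretation.speedbump(False)))
```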

The hint was that it's trying to read off the end of your data source by a single byte. It's starting one byte too far in.
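The "one byte too far" diagnosis can be read straight off the error message: the read was asked to stop at byte 108427 of a 108426-byte source. Since the read length comes from the object's own count, an overrun of exactly one byte points at a start offset that is one byte too late. A small sketch; the `overrun` helper is hypothetical, and its regular expression assumes the exact message format shown in the traceback above.

```python
import re

MESSAGE_RE = re.compile(
    r"indexes (\d+):(\d+) are beyond the end of data source of length (\d+)")


def overrun(message):
    """Return how many bytes past the end of the source a failed read went."""
    m = MESSAGE_RE.search(message)
    if m is None:
        raise ValueError("unrecognized message: " + message)
    _start, stop, length = (int(g) for g in m.groups())
    return stop - length


msg = "indexes 27:108427 are beyond the end of data source of length 108426"
print(overrun(msg))  # 1
```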

jpivarski added a commit that referenced this issue Oct 22, 2018
addresses issue #168: provide a way to skip speedbump bytes
@jpivarski
Member

Oh! And once you've read these objects in, you'll have to do

samples[i]._fArray     # for some index i

to get at the array data. uproot doesn't know anything about the MArrayB or MArrayS types, so it creates Python classes with field names like _fN and _fArray. If you want to get fancy and give these types Pythonic methods (e.g. samples[i].something()), you can follow what THnSparse does in uproot-methods. However, I'd guess that once you get the array, you've got everything you need.
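Once the objects are read, a natural next step is to collect every `_fArray` payload. A minimal sketch, assuming only that each object exposes `_fArray` as described above; `to_jagged` is a hypothetical helper, and `SimpleNamespace` objects stand in for real deserialized MArrayB/MArrayS instances. It packs the per-event arrays into one flat list plus offsets, the same content-plus-offsets layout that jagged arrays use.

```python
from types import SimpleNamespace


def to_jagged(objects):
    """Flatten each object's `_fArray` into one list plus offsets, so that
    event i is content[offsets[i]:offsets[i + 1]]."""
    content, offsets = [], [0]
    for obj in objects:
        content.extend(obj._fArray)
        offsets.append(len(content))
    return content, offsets


# Stand-ins for deserialized objects (real ones would come from the branch).
objs = [SimpleNamespace(_fN=2, _fArray=[1, 2]),
        SimpleNamespace(_fN=0, _fArray=[]),
        SimpleNamespace(_fN=3, _fArray=[3, 4, 5])]
content, offsets = to_jagged(objs)
print(content[offsets[2]:offsets[3]])  # [3, 4, 5]
```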

@LuTse
Author

LuTse commented Oct 29, 2018

Amazing, this is actually working!
It does not produce the output I expected (the arrays should have been a lot shorter), but this is something I'll look into myself.
Thanks a lot!!

@jpivarski
Member

The MArrayB and MArrayS objects have fields named _fN that specify (to uproot) the length of the arrays. I didn't notice any inconsistency with that.
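To check the lengths yourself, a quick report can compare each stored `_fN` against the actual length of `_fArray` and summarize the range of lengths, which is also a fast way to see whether they are longer than expected. This is a hypothetical sketch assuming only the `_fN` and `_fArray` fields named above; `SimpleNamespace` objects stand in for the real ones.

```python
from types import SimpleNamespace


def length_report(objects):
    """Summarize per-object array lengths and flag any object whose stored
    count `_fN` disagrees with the actual length of `_fArray`."""
    lengths = [obj._fN for obj in objects]
    mismatched = [i for i, obj in enumerate(objects) if obj._fN != len(obj._fArray)]
    return {
        "entries": len(lengths),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "mismatched": mismatched,
    }


# Stand-ins for deserialized objects (real ones would come from the branch).
objs = [SimpleNamespace(_fN=2, _fArray=[7, 8]),
        SimpleNamespace(_fN=3, _fArray=[1, 2])]
print(length_report(objs))
# {'entries': 2, 'min_length': 2, 'max_length': 3, 'mismatched': [1]}
```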

Oh! It could be that these are private members of the C++ class, and the class's public interface might present the data differently (or hide some of it), especially if you normally get the data through a C++ method. Unfortunately, method code isn't stored in the ROOT file, so uproot can't use it in its interpretation.

I'm glad it's working for you!
