Preparing for reference behavior while using header-only awkward #2838
-
In the array structure itself, a [...]

The lookup function (and the action to be taken if the referent is not there) can be encapsulated in a behavior, as it is for Coffea.

Doing a row-wise read in C++ is not as bad as doing it in Python. At least there's that.

I was going to say that I was surprised that something as new as the ILC would be based on a custom row-wise format, but I see that the paper goes back to 2003. That's a long development time. Will the same technology be used when the ILC is actually built? CMS switched frameworks twice in its early development (from something before my time, to ORCA, which I saw the very end of, to CMSSW).
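To illustrate the "lookup function plus action when the referent is not there" idea, here is a minimal sketch in plain Python. It does not use awkward or Coffea. The function name `dereference`, the `strict` flag, and the convention that a negative or out-of-range index means "no referent" are all assumptions made for this example (awkward's `IndexedOptionArray` similarly treats negative index values as missing).

```python
def dereference(index, targets, strict=False):
    """Resolve integer references into `targets`.

    Hypothetical helper: a negative or out-of-range index is treated as
    "no referent". With strict=False the result holds None at that
    position; with strict=True a LookupError is raised instead.
    """
    out = []
    for i in index:
        if i < 0 or i >= len(targets):
            if strict:
                raise LookupError(f"reference {i} has no referent")
            out.append(None)
        else:
            out.append(targets[i])
    return out


hits = ["hit_a", "hit_b"]
resolved = dereference([1, -1, 0], hits)
```

In a real behavior, the same policy choice (return a missing value or raise) would live in the method or property that performs the lookup.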
-
Howdy everyone! 🤠 I have greatly appreciated using `awkward` in my HEP research since it so neatly follows the structure of our data. I am now working on a project to try to load data stored in LCIO into `ak.Array` in-memory, using the header-only version of awkward to access the LCIO C++ API and construct the array.

The basic demo for using awkward with pybind11 has been very helpful and I have gotten pretty far, but now I am stuck at (what I believe to be) the largest hurdle: representing "references" in the data (what LCIO calls "relations"). In general, these "references" come in two types:
1. [...] `EVENT::Track` object contains references to the `EVENT::TrackerHit`s which make up the track).
2. [...] `EVENT::LCRelation`). These are often helpful for associating extra information that is not always necessary (e.g. data about the kinks in any constructed tracks can be related to the tracks via this object).

I'm curious if awkward experts have a good idea on how to design this reference behavior. I know I'll need to implement some awkward behavior so that "de-referencing" can happen in an automatic way (coffea already has "de-referencing" for the PhysLite schema, so I'm not too worried about writing that side of things), but what about the actual data construction side? Should I just have a "reference" be a `Numpy` array of `int`s that I can apply as indices? Is `IndexedOptionArray` what I'm looking for?

As a side note, it appears to me that the LCIO file format (well, technically the `slcio` file format, since it's LCIO written with SIO; confusing, I know) is completely row-wise, i.e. there is no chunking of columns near each other in the file (like ROOT does with TBasket), so I will need to construct the array one event at a time. This will be relatively slow, but I think I'll implement some caching-to-parquet protocol if it is really painful. In addition, this means I am focusing on a single pass that reads all requested data into memory (i.e. I'm not going to attempt lazy loading at this time).

Hopefully this project will be promoted to a "Show and Tell" project in a few weeks 😉 but we will see if I am able to get it into a good enough form to display.
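To make the "references as integer indices" question concrete, here is a minimal sketch in plain Python of a single row-wise pass that accumulates columnar buffers. It uses no awkward or LCIO calls. The toy `events` input, the buffer names, and `hits_of_track` are hypothetical, and it assumes each track stores event-local positions into that event's hit list. The offsets-plus-flat-content shape is similar in spirit to awkward's `ListOffsetArray` layout.

```python
# Hypothetical stand-in for the row-wise reader: each event is
# (hits, tracks), and each track is a list of positions into `hits`.
events = [
    ([10.0, 11.0, 12.0], [[0, 2], [1]]),
    ([20.0], []),
    ([30.0, 31.0], [[1, 0]]),
]

hit_content = []     # all hits, flattened across events
hit_offsets = [0]    # event i owns hit_content[hit_offsets[i]:hit_offsets[i+1]]
track_offsets = [0]  # event i owns tracks track_offsets[i]..track_offsets[i+1]
ref_content = []     # all track->hit references (event-local hit indices)
ref_offsets = [0]    # track t owns ref_content[ref_offsets[t]:ref_offsets[t+1]]

# Single pass, one event at a time, appending to the flat buffers.
for hits, tracks in events:
    hit_content.extend(hits)
    hit_offsets.append(len(hit_content))
    for hit_refs in tracks:
        ref_content.extend(hit_refs)
        ref_offsets.append(len(ref_content))
    track_offsets.append(len(ref_offsets) - 1)


def hits_of_track(event, track):
    """Dereference: the hits referenced by `track` (event-local) in `event`."""
    first_hit = hit_offsets[event]
    t = track_offsets[event] + track
    refs = ref_content[ref_offsets[t]:ref_offsets[t + 1]]
    return [hit_content[first_hit + r] for r in refs]
```

With this toy input, `hits_of_track(2, 0)` returns `[31.0, 30.0]`. Storing event-local indices keeps each event self-contained; storing global indices into `hit_content` would be an equally valid choice and would remove the `first_hit` shift at lookup time.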