Davidl root reader t directory #579

faustus123 · 2024-04-10T03:29:10Z

BEGINRELEASENOTES
Add ROOTReader::openTDirectory() to allow external management of TDirectory (e.g. TMemFile) to support streaming.
ENDRELEASENOTES

This is motivated by issue #565 . It makes it possible to write a stream source that sends data packets in the form of serialized TMemFile objects using ROOT's built-in serialization/deserialization functionality. Each TMemFile may contain a small number of events in the "events" tree and the complete metadata trees allowing the other standard ROOTReader methods to work. I have a working test case that reads events from an existing PODIO ROOT file and sends them this way via zmq messages. This couples with the ePIC recon code and works so there is a proof-of-principle that this works as needed.

A couple of notes on what was done:

A TTree *m_metaTree member was added to take the place of the TChain in most of the calls for which a TChain is not actually needed. When a TChain is actually used, this just points to the TChain.
The TChain was changed from being kept in a unique_ptr to being a direct member of the class. I suspect the unique_ptr trick was used because TChain has its copy and assignment constructors unavailable as private members of the TChain class and that causes some issues with storing these in std::map. I ran into other complications with trying to implement m_metaTree to support both cases (an externally supplied TTree or an internal TChain) I resolved it by explicitly deleting the copy and assignment constructors for ROOTReader which avoids the problem of ownership of the TChain. Removing the unique_ptr mechanism was then possible which simplified things in the class. There may have been other reasons for using unique_ptr which I don't know about.
A new test was added read_frame_root_tdirectory which is almost a complete copy of read_frame_root_multiple. I can't claim to fully understand the testing configuration, but hopefully this is in line with what you would like.

…Directory

tmadlener

Thanks a lot for this. It's a slightly different approach than I would have attempted for this, but I think it's also quite elegant as it re-uses quite a bit of existing machinery. Also summoning @hegner for a few more high level comments.

One question: How do you build the TMemFiles on the writing side? (Or plan to build)

My main concern for this approach is that this ties the streaming of Frames rather tightly to ROOTs TTree backend. I would have tried to serialize on the FrameData level, which should be possible without any dependency on a specific backend. However, since you have a proof-of-principle already for the full chain, your approach is potentially a bit easier to implement (at least on the podio side), since a lot of the heavy lifting is done else where. You probably have a better feeling for how involved the overall approach is.

Another somewhat smaller concern is performance. When I first implemented Frame I/O I didn't pay too much attention to the things that happen "only once", e.g. when opening a file, or when first setting up a category for reading the first entry. Hence, I crammed quite a bit of (potentially) unnecessary work into that since the overhead will be distributed in any case. I am not entirely sure how things look if this "overhead" happens repeatedly for just a few events. In case all of the events that you stream have the same metadata, and the same categories there might be a way to cache a bit more of this information between calls to openTDirectory in case performance becomes an issue. Until that is the case, I would prefer the current version, since has fewer assumptions and is more readable.

I can't claim to fully understand the testing configuration, but hopefully this is in line with what you would like.

Yeah, it's not the simplest setup, but I haven't yet found the time to clean it up a bit better, now that the legacy EventStore is gone which drove some of it's implementation. I think overall the current approach is OK, I would maybe try to reduce some code duplication, but we can also do that at the end.

tmadlener · 2024-04-10T07:50:04Z

src/ROOTReader.cc

@@ -189,54 +197,87 @@ std::vector<std::string> getAvailableCategories(TChain* metaChain) {
  return brNames;
 }

+/// @brief Read version and data model from the m_metaTree 


The docstring needs to be in the header file in our doxygen configuration.

tmadlener · 2024-04-10T07:57:37Z

tests/root_io/read_frame_root_tdirectory.cpp

+
+#include "TFile.h"
+
+int read_frames(podio::ROOTReader& reader) {


This (on a quick first glance) seems to more or less be a copy of the templated version in read_test.h. Maybe that can be adapted to split off a version with an already initialized reader (initialized == file / TDirectory opened).

faustus123 · 2024-04-10T12:58:20Z

One question: How do you build the TMemFiles on the writing side? (Or plan to build)

You can see a pair of programs that do the serialization/deserialization and zmq messaging here. These are independent of the PODIO library. They are specific to the ROOT file format of the ePIC simulated data files in that it assumes 4 TTrees with the event data being kept in "events".

https://github.com/JeffersonLab/SRO-RTDP/tree/davidl_podio2tcp/src/utilities/cpp/podio2tcp

How this is implemented on the ePIC side is via a JANA JEventSource plugin. A rough prototype of that can be seen here. It mostly copies the one already in EICrecon that uses ROOTReader::openFiles(). This will need a lot of cleanup, but the core functionality is there.

https://github.com/JeffersonLab/SRO-RTDP/blob/davidl_podio2tcp/src/utilities/cpp/podiostream/podiostreamSource.cc

My main concern for this approach is that this ties the streaming of Frames rather tightly to ROOTs TTree backend. I would have tried to serialize on the FrameData level, which should be possible without any dependency on a specific backend. However, since you have a proof-of-principle already for the full chain, your approach is potentially a bit easier to implement (at least on the podio side), since a lot of the heavy lifting is done else where. You probably have a better feeling for how involved the overall approach is.

I agree 100% that a better implementation will be to do this at the Frame level. This approach was quicker for me to implement and didn't require my digging too deep into the PODIO source. It will serve my short term goals until someone more familiar with PODIO can do it the right way. I also think the openTDirectory() still adds some flexibility that may be useful in some other cases.

Another somewhat smaller concern is performance. When I first implemented Frame I/O I didn't pay too much attention to the things that happen "only once", e.g. when opening a file, or when first setting up a category for reading the first entry. Hence, I crammed quite a bit of (potentially) unnecessary work into that since the overhead will be distributed in any case. I am not entirely sure how things look if this "overhead" happens repeatedly for just a few events. In case all of the events that you stream have the same metadata, and the same categories there might be a way to cache a bit more of this information between calls to openTDirectory in case performance becomes an issue. Until that is the case, I would prefer the current version, since has fewer assumptions and is more readable.

Yes, performance is not great, but it is currently obscured by the horribly slow ePIC reconstruction code. When reading from a file, it takes 2-5 seconds to reconstruct an event. Preliminary testing with pre-formed send buffers yields sending takes only about 0.02 seconds/event.

When it comes to reading in the meta-data, one knob we have to turn is how many events are sent at once. I can just as easily send 100 events or more at a time which does reduce that overhead. I'll tune that as needed when I start scaling up for more performance testing.

tmadlener · 2024-04-10T14:04:21Z

Thanks for the links. After a quick look and IIUC the main use case here is to read from stream in Jana2, but the source of the stream would then be outside of Jana2? I am mainly asking, because I was wondering whether you also planned to implement (or actually need) the creation of the TMemFile somewhere in memory. Currenty, it seems that everything starts from an existing podio file on disk rather.

faustus123 · 2024-04-10T14:18:16Z

Thanks for the links. After a quick look and IIUC the main use case here is to read from stream in Jana2, but the source of the stream would then be outside of Jana2? I am mainly asking, because I was wondering whether you also planned to implement (or actually need) the creation of the TMemFile somewhere in memory. Currenty, it seems that everything starts from an existing podio file on disk rather.

Correct, there is no need for a JANA2 process on the sending side.

I do not anticipate using TMemFile in any production streaming system. It is just the (relatively) easy way to serialize ROOT objects which is the form the simulated ePIC data happens to be in. My expectation is that the DAQ WG will decide on some custom format for the stream data driven by the customized nature of the electronics. We may deal with that either in a specific JANA2 JEventSource or by some future transformer code that converts it into a PODIO stream (whatever that design turns out to be).

It is true that at the moment and for the next couple of years we will likely only have simulated data and it will be in the form of podio files on disk. Hence the need to generate stream(s) from that so the SRO infrastructure can start development.

faustus123 · 2024-06-10T15:53:50Z

I haven't heard anything on this for 2 months. Are there any plans to merge it or any hold-ups preventing it?

hegner · 2024-06-11T19:30:46Z

@faustus123 - apologies for having it dormant for so long. We've been doing some refactoring which we just managed to finish. If you can update your PR I can deal with it this week.

faustus123 · 2024-06-13T21:15:07Z

Uggh. I just tried to rebase to bring my fork up to date and a lot of changes have happened in the ROOTReader files in the meantime. It is going to take a while to try and resolve this. It would probably be more efficient if I do this after you have finished with whatever work you are doing there. Please let me know when you are finished with the current round of work and I will take a look at it then.

tmadlener · 2024-06-14T08:08:42Z

The changes that we want to go in first are the ones in #625. Looking at the changes in this PR, I am wondering whether you cherry-picking might be easier than rebasing?

faustus123 added 5 commits March 27, 2024 09:12

First draft of using in-memory trees.

3b14f3f

Merge remote-tracking branch 'origin/master' into davidl_RootReader_T…

62e0c8f

…Directory

Clean up the code/comments a bit.

7bf12c8

Add tests for ROOTReader::openTDirectory

67e8159

Merge remote-tracking branch 'origin/master' into davidl_RootReader_T…

2bd84e5

…Directory

tmadlener reviewed Apr 10, 2024

View reviewed changes

hegner self-requested a review June 11, 2024 19:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Davidl root reader t directory #579

Davidl root reader t directory #579

faustus123 commented Apr 10, 2024

tmadlener left a comment

tmadlener Apr 10, 2024

tmadlener Apr 10, 2024

faustus123 commented Apr 10, 2024

tmadlener commented Apr 10, 2024

faustus123 commented Apr 10, 2024

faustus123 commented Jun 10, 2024

hegner commented Jun 11, 2024

faustus123 commented Jun 13, 2024

tmadlener commented Jun 14, 2024


		#include "TFile.h"

		int read_frames(podio::ROOTReader& reader) {

Davidl root reader t directory #579

Are you sure you want to change the base?

Davidl root reader t directory #579

Conversation

faustus123 commented Apr 10, 2024

tmadlener left a comment

Choose a reason for hiding this comment

tmadlener Apr 10, 2024

Choose a reason for hiding this comment

tmadlener Apr 10, 2024

Choose a reason for hiding this comment

faustus123 commented Apr 10, 2024

tmadlener commented Apr 10, 2024

faustus123 commented Apr 10, 2024

faustus123 commented Jun 10, 2024

hegner commented Jun 11, 2024

faustus123 commented Jun 13, 2024

tmadlener commented Jun 14, 2024