Frame serialization/deserialization #565

Open
faustus123 opened this issue Feb 23, 2024 · 12 comments

@faustus123

  • OS version: all
  • Compiler version: all
  • PODIO version: all
  • Goal:

I would like to read frames (events) from a PODIO file and stream them over a network to a remote process. I would then like that remote process to process each frame in the same way it would if the frame had been read from a local file using the existing API. This would need to honor any associations.

Note that I'm not really interested in solutions like just letting xrootd handle the transfer since I need some control over the stream(s), buffer headers, and networking details.

@hegner
Collaborator

hegner commented Feb 23, 2024

To understand how low-level you want to go here: could you write down some pseudo-code showing which parts you would do yourself and which parts you'd expect PODIO to do? As for the atomic piece of the streaming, would it be an entire frame or single collections? And do you expect both sides to use the same language?

@tmadlener
Collaborator

From a purely technical point of view, as long as you have a FrameDataT that effectively implements the same functionality as EmptyFrameData, everything should work as expected when you construct a Frame from it.

struct EmptyFrameData {
  podio::CollectionIDTable getIDTable() const {
    return {};
  }

  std::optional<podio::CollectionReadBuffers> getCollectionBuffers(const std::string&) {
    return std::nullopt;
  }

  /** Get the still available, i.e. yet unpacked, collections from the raw data
   */
  std::vector<std::string> getAvailableCollections() const {
    return {};
  }

  /** Get the parameters that are stored in the raw data
   */
  std::unique_ptr<podio::GenericParameters> getParameters() {
    return std::make_unique<podio::GenericParameters>();
  }
};

I think something like this would be the thing that goes over the network, as you have effectively full control of what you put in there from a content and also technical perspective. The main thing that could make this a bit more complicated is the fact that the collection ID table and the IDs that are in the buffers need to be consistent. When starting from a file this should not really be a problem, I think.

@faustus123
Author

faustus123 commented Feb 27, 2024

Here is some pseudo code along the lines of what I was thinking of:

//----------------------------------------------------------
// For the sender side
podio::ROOTFrameReader m_reader;
m_reader.openFile( GetResourceName() );

for( unsigned i=0; i < m_reader.getEntries("events"); i++){
	auto frame_data = m_reader.readEntry("events", i);
	auto frame = std::make_unique<podio::Frame>(std::move(frame_data));

	// Serialize() is the requested functionality, not an existing PODIO call
	std::vector<uint8_t> buff;
	frame->Serialize( buff );

	// send buffer to remote
}

//----------------------------------------------------------
// For the receiver side

while(is_connected){

	// ReadBufferFromSocket() is user code; Deserialize() is the requested functionality
	std::vector<uint8_t> buff = ReadBufferFromSocket();
	auto frame = podio::Frame::Deserialize( buff );

	// Do something with collections in frame
}

I also could see needing a couple of calls to handle the non-event data that would be done at the beginning of each program.

As for the podio::CollectionReadBuffers class, I guess I'd have to look into how to serialize those as individual objects. It looks like the opening of a rabbit hole that I was hoping to avoid. Perhaps with a little more guidance I could look into it.

@tmadlener
Collaborator

This looks quite sensible. As I mentioned before, I am not sure I would put the de-/serialize functionality on the Frame or whether I would create some new FrameData type (or extend the existing ROOTFrameData) that has that functionality. It could avoid some up-front work that happens when constructing a Frame from the frame data.

Is reading from a file the main use case here, or do you envisage also having some algorithm create / populate a frame and then send that off somewhere? If it's mainly the former I would probably go for a solution involving the FrameData as we probably have easier access to some "buffer like" data. However, if the latter is also a use case then the Frame would be the more natural point to tack the functionality on, I think.

The CollectionReadBuffers are something between a useful abstraction and a bit of a hack at the moment, tbh ;) Effectively they are a void* to the data buffers and some std::functions that do the actual work, where we generate most of them via our templates to inject type information back into the system. In principle it would be possible to add a

std::function<void(podio::CollectionReadBuffers const&, std::vector<uint8_t>&)> serialize;

as a member and then populate that with the correct implementation for each type at code generation time. However, if we really need to make them part of the "public" parts of podio, we should probably think about whether that is the best way to go about it.
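To make the idea of a type-erased serialize hook concrete, here is a minimal, self-contained sketch. It does not use podio: `ReadBuffersSketch`, `HitData`, and `makeBuffers` are hypothetical stand-ins, and the byte layout (an 8-byte element count followed by the raw elements in host byte order) is an assumption chosen only for illustration. It shows how a template instantiated per type could inject the type information into a `std::function` so the caller only ever sees a `void*` and a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for podio::CollectionReadBuffers: an untyped pointer
// to the data plus a type-erased hook that knows the real element type.
struct ReadBuffersSketch {
  void* data{nullptr};
  std::function<void(const ReadBuffersSketch&, std::vector<std::uint8_t>&)> serialize;
};

// Hypothetical plain-old-data payload standing in for a generated data type.
struct HitData {
  std::int32_t cellID;
  float energy;
};

// What generated code could do once per type: bake T into the hook.
// Assumed layout: 8-byte element count, then the raw element bytes.
template <typename T>
ReadBuffersSketch makeBuffers(std::vector<T>& elements) {
  static_assert(std::is_trivially_copyable_v<T>, "sketch only handles trivially copyable payloads");
  ReadBuffersSketch buffers;
  buffers.data = &elements;
  buffers.serialize = [](const ReadBuffersSketch& self, std::vector<std::uint8_t>& out) {
    // Recover the concrete type that was erased behind the void*.
    const auto& vec = *static_cast<const std::vector<T>*>(self.data);
    const std::uint64_t count = vec.size();
    const std::size_t offset = out.size();
    out.resize(offset + sizeof(count) + vec.size() * sizeof(T));
    std::memcpy(out.data() + offset, &count, sizeof(count));
    if (!vec.empty()) {
      std::memcpy(out.data() + offset + sizeof(count), vec.data(), vec.size() * sizeof(T));
    }
  };
  return buffers;
}
```

A caller would build the buffers once with `makeBuffers(hits)` and later call `buffers.serialize(buffers, out)` without knowing that the payload is `HitData`. Real collections also carry references and vector members, which this sketch deliberately ignores.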

Do you already have some library that does the de-/serialization? In case you haven't, I think the SIO backend that we have solves quite a few things already, and we might be able to use that to create a new set of readers and writers that effectively write to / read from a socket and otherwise simply use functionality that is already present.

@faustus123
Author

My short term goal is to read an ePIC simulated data file, split the data into multiple streams, and recombine them in a specialized JEventSource. This will allow the standard ePIC analysis to be run with data straight from the stream. The first step will be a single stream, but multiple streams will hopefully soon follow.

My longer term goals include dynamically filling a frame in memory and then serializing it. I can't say for certain though how far upstream PODIO will go in ePIC since AFAIK it has not been seriously discussed. For the purposes of streaming system development though, it will be a very useful tool.
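As a rough illustration of the short-term goal of splitting data into multiple streams and recombining them, the sketch below collects the pieces of each event as they arrive and releases the event once every stream has delivered its part. Everything here is an assumption made for illustration: the thread does not say how the data would be split, and `StreamPart`, `EventAssembler`, and the event-number tag are hypothetical names, not part of PODIO or JANA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One piece of an event as it arrives from a single stream (hypothetical).
struct StreamPart {
  std::uint64_t eventNumber;  // assumed tag shared by all parts of one event
  std::size_t streamIndex;    // which stream delivered this part
  std::string payload;        // opaque serialized bytes
};

// Collects parts until every stream has contributed to an event.
class EventAssembler {
public:
  explicit EventAssembler(std::size_t nStreams) : m_nStreams(nStreams) {}

  // Store a part. Returns the payloads ordered by stream index once the
  // event is complete, and std::nullopt while parts are still missing.
  std::optional<std::vector<std::string>> add(StreamPart part) {
    assert(part.streamIndex < m_nStreams);
    const auto eventNumber = part.eventNumber;
    auto& slots = m_pending[eventNumber];
    slots.resize(m_nStreams);
    slots[part.streamIndex] = std::move(part.payload);

    for (const auto& slot : slots) {
      if (!slot) {
        return std::nullopt;
      }
    }
    std::vector<std::string> complete;
    complete.reserve(m_nStreams);
    for (auto& slot : slots) {
      complete.push_back(std::move(*slot));
    }
    m_pending.erase(eventNumber);
    return complete;
  }

  // Number of events still waiting for at least one part.
  std::size_t pendingEvents() const { return m_pending.size(); }

private:
  std::size_t m_nStreams;
  std::map<std::uint64_t, std::vector<std::optional<std::string>>> m_pending;
};
```

An event source built on something like this would hand the completed set of payloads to whatever deserialization step ends up being provided. Parts may arrive in any order across streams, so incomplete events simply stay pending.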

@tmadlener
Collaborator

Just for my understanding and clarification: You do not actually care how we do the de-/serialization, right? This would include the actual type of the buffer. So does it have to be a vector<uint8_t>, or could we also use a vector<char> or something else that resembles "a collection of bytes" as long as we know how to interpret them?

@faustus123
Author

Correct. I would just need a buffer reference and its size so I could pass it to a generic write command. The data type can be anything that represents a collection of bytes.
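For context, a "buffer reference and its size" is all that a simple length-prefixed wire format needs. The sketch below is one possible way to wrap an opaque byte buffer for a stream socket and to recover it on the other side. The function names and the 4-byte big-endian length header are assumptions for illustration only; nothing here is PODIO API, and the actual socket calls are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Prepend a 4-byte big-endian length so the receiver knows where the
// serialized frame ends. Accepts any byte-like buffer via pointer + size.
std::vector<char> addLengthPrefix(const void* data, std::size_t size) {
  assert(size <= 0xFFFFFFFFu);
  const auto length = static_cast<std::uint32_t>(size);
  std::vector<char> message(4 + size);
  for (int i = 0; i < 4; ++i) {
    message[i] = static_cast<char>((length >> (8 * (3 - i))) & 0xFF);
  }
  if (size > 0) {
    std::memcpy(message.data() + 4, data, size);
  }
  return message;
}

// Try to extract one complete payload from the front of the received bytes.
// Returns false (leaving `stream` untouched) if more bytes are needed.
bool popMessage(std::vector<char>& stream, std::vector<char>& payload) {
  if (stream.size() < 4) {
    return false;
  }
  std::uint32_t length = 0;
  for (int i = 0; i < 4; ++i) {
    length = (length << 8) | static_cast<unsigned char>(stream[i]);
  }
  if (stream.size() - 4 < length) {
    return false;
  }
  const auto end = stream.begin() + 4 + static_cast<std::ptrdiff_t>(length);
  payload.assign(stream.begin() + 4, end);
  stream.erase(stream.begin(), end);
  return true;
}
```

The sender would pass the result of `addLengthPrefix(buff.data(), buff.size())` to its write call. The receiver appends whatever the socket delivers to `stream` and calls `popMessage` until it returns false, which copes with reads that end in the middle of a frame.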

@hegner
Collaborator

hegner commented Feb 28, 2024

OK. And the required granularity for you would be on the frame level only for the time being?
We could provide you with something to play with relatively quickly once we have finished a few other outstanding issues. For obvious reasons, we'd put things into an experimental namespace for the time being.

@faustus123
Author

Yes, frame level would be good for now. Collection level may be useful later, but it brings the complication of how to handle associations so I'd rather push that headache down the road until we have a clearer motivation for it.

Understood on the namespace.

@hegner hegner self-assigned this Feb 28, 2024
@faustus123
Author

Just checking on the progress here. I have an LDRD milestone that would benefit from this, but will implement a different, temporary hack there if the timeline is going to be more than ~1 week. Not pressuring anyone, just figuring out my best course of action.

@faustus123
Author

FYI: I did get some sudden inspiration over the weekend and have implemented what I think may be a solution. It did require some modification of the RootReader class: it essentially adds an openTDirectory method as an alternative to openFiles. Unfortunately, I ran into an issue building our primary recon program in order to test it. I think that is just a versioning issue on our end. I'll work on it more when I can find time later in the week and will report back then.

@hegner
Collaborator

hegner commented Mar 25, 2024

Thanks. On my side there are Easter vacations scheduled so no progress in the next week.
Your idea sounds interesting, but in the long run we want something not depending on ROOT there.
