namedtuple or dataclasses to implement a general format for data 'packets' #66
Comments
My current thought on this is to use msgpack and always have "name", "type", and "data" fields. That essentially gives you a minimal type system that is cross-platform, since basically every language supports msgpack. The ZMQ message would then have just two parts: one for routing and a msgpack byte array. This does add some overhead, but msgpack is pretty efficient, and it would mean that someone could write a message sink in a language other than Python as long as zmq and msgpack are available there (which is true of basically all mainstream languages). For data frames, I think Arrow or Feather is the best-supported format for interoperability, so you could go pandas -> arrow -> serialize -> arrow -> DataFrames.jl.
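A minimal sketch of the two-part message described above, not an implementation from this project. The helper names (`pack_envelope`, `unpack_envelope`, `make_frames`) and the example routing id are hypothetical. The json fallback is an assumption added only so the sketch still runs where msgpack is not installed.

```python
import json

try:
    import msgpack  # third-party package: pip install msgpack
except ImportError:  # assumption: fall back to json so the sketch still runs
    msgpack = None


def pack_envelope(name, type_, data):
    """Serialize a {"name", "type", "data"} envelope to bytes."""
    envelope = {"name": name, "type": type_, "data": data}
    if msgpack is not None:
        return msgpack.packb(envelope, use_bin_type=True)
    return json.dumps(envelope).encode("utf-8")


def unpack_envelope(payload):
    """Inverse of pack_envelope: bytes back to a dict."""
    if msgpack is not None:
        return msgpack.unpackb(payload, raw=False)
    return json.loads(payload.decode("utf-8"))


def make_frames(routing_id, name, type_, data):
    """Build the two-part message: a routing frame, then the serialized envelope."""
    return [routing_id, pack_envelope(name, type_, data)]


# Hypothetical example values
frames = make_frames(b"sink-1", "temperature", "float", 21.5)
received = unpack_envelope(frames[1])
```

With pyzmq, a list of byte frames like this could be passed to `socket.send_multipart(frames)`. A sink in another language would only need to split off the first frame and decode the second.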
Yes yes, at the level of serialization we're currently using something similar, though with json rather than msgpack. Checking msgpack out now, it does look very attractive. I need to add this to the docs because some of it isn't explicitly described there, but messages are currently structured like this: top level zmq "frames": where a message can be routed through multiple recipients (multiple The serialized message (which is abstracted by the
as well as an optional What I'm thinking is some recursive abstraction for "packets" of data (that would then provide some means of serialization/deserialization for transport): Problem:
Requirements
I know other libraries must have something like this (pretty sure Brian2 does), so I'll look around for inspiration rather than trying to think it through from scratch.
I did something like this here: https://github.com/CohenLabPrinceton/pvp/blob/master/pvp/common/values.py
Stubbing out an idea:
In many cases data needs to move around the system in 'packets' -- e.g. a frame from a camera needs both the image frame and the timestamp, a task stage will return a prespecified set of data, and so on.
We want to have some predictable way of
a) specifying what fields we expect to have in a given 'packet'
b) providing a uniform way of accessing fields of data -- e.g. for transformations, we don't want to have to write a thousand subclasses to handle whether the data is a pandas dataframe or a numpy array...
c) handling common transformations like unit conversions, serialization, and compression operations
Dataclasses seem like the natural builtin way of doing this. We may also want to provide some interface for declaring them on the fly, closer to the syntax of namedtuple.
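A rough sketch of how that could look with only the standard library, not a settled design. The `Packet` base class, its `to_bytes`/`from_bytes` methods, and the `Frame` and `Reward` packets are all hypothetical names. json plus zlib stands in for whatever serialization and compression would actually be chosen, and unit conversion is left out.

```python
import json
import zlib
from dataclasses import asdict, dataclass, make_dataclass


@dataclass
class Packet:
    """Hypothetical base class giving every packet the same serialization API."""

    def to_bytes(self) -> bytes:
        # asdict() turns the declared fields into a plain dict
        return zlib.compress(json.dumps(asdict(self)).encode("utf-8"))

    @classmethod
    def from_bytes(cls, payload: bytes):
        # Only handles flat packets; nested packets would need extra logic
        return cls(**json.loads(zlib.decompress(payload).decode("utf-8")))


@dataclass
class Frame(Packet):
    """a) The expected fields are declared up front."""
    timestamp: float
    pixels: list


# Declared on the fly, in a style close to namedtuple
Reward = make_dataclass(
    "Reward", [("port", str), ("volume_ul", float)], bases=(Packet,)
)

frame = Frame(timestamp=1.5, pixels=[[0, 255], [255, 0]])
restored = Frame.from_bytes(frame.to_bytes())
reward = Reward(port="L", volume_ul=2.5)
```

Both kinds of packet expose their fields as attributes (`frame.timestamp`, `reward.port`) and share the same transport methods, which is roughly what requirements a) through c) ask for. Array-like data such as numpy arrays would need a converter before json could handle it.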