Support sending a `DataCell`'s size (& other metadata) over the wire #1760
teh-cmc added the 🏹 arrow (concerning arrow) and 📉 performance (Optimization, memory use, etc) labels on Apr 4, 2023
teh-cmc changed the title from "Support sending a `DataCell`'s size over the wire" to "Support sending a `DataCell`'s size (& other metadata) over the wire" on Apr 4, 2023
That would also serialize the cell sizes to disk when saving the store to an rrd file, meaning that reloading it later on would be much faster.
The size computation now happens on the client side no matter what (we need the value for the `size_bytes` trigger of the batching system), so not sending it over the wire is a literal waste of compute resources.
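To make the point above concrete, here is a minimal sketch in plain Rust (standard library only). Everything in it is hypothetical and for illustration: `SizedCell`, `Batcher`, and the 8-byte header layout are assumptions, not Rerun's actual types or wire format. The client computes the size once, reuses it for a size-based flush trigger, and ships it along with the payload so that the receiver never has to recompute it.

```rust
/// Hypothetical stand-in for a cell: a payload plus its precomputed size.
/// This is NOT Rerun's `DataCell`; it only illustrates the idea.
#[derive(Debug, Clone, PartialEq)]
pub struct SizedCell {
    pub payload: Vec<u8>,
    /// Size in bytes, computed once on the client.
    pub size_bytes: u64,
}

impl SizedCell {
    /// Client side: compute the size once. `len()` stands in for the
    /// (much more expensive) real size computation.
    pub fn new(payload: Vec<u8>) -> Self {
        let size_bytes = payload.len() as u64;
        SizedCell { payload, size_bytes }
    }

    /// Assumed wire layout: 8-byte little-endian size, then the payload.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + self.payload.len());
        out.extend_from_slice(&self.size_bytes.to_le_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    /// Receiver side: read the size from the header instead of recomputing it.
    /// Returns `None` if the input is too short to contain the header.
    pub fn decode(bytes: &[u8]) -> Option<SizedCell> {
        if bytes.len() < 8 {
            return None;
        }
        let mut header = [0u8; 8];
        header.copy_from_slice(&bytes[..8]);
        Some(SizedCell {
            payload: bytes[8..].to_vec(),
            size_bytes: u64::from_le_bytes(header),
        })
    }
}

/// Hypothetical batcher with a size-based flush trigger.
pub struct Batcher {
    threshold_bytes: u64,
    pending: Vec<SizedCell>,
    pending_bytes: u64,
}

impl Batcher {
    pub fn new(threshold_bytes: u64) -> Self {
        Batcher { threshold_bytes, pending: Vec::new(), pending_bytes: 0 }
    }

    /// Adds a cell; returns the whole batch once the accumulated size
    /// reaches the threshold, reusing each cell's precomputed size.
    pub fn push(&mut self, cell: SizedCell) -> Option<Vec<SizedCell>> {
        self.pending_bytes += cell.size_bytes;
        self.pending.push(cell);
        if self.pending_bytes >= self.threshold_bytes {
            self.pending_bytes = 0;
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        }
    }
}
```

In this sketch the size is computed exactly once, in `SizedCell::new`; both the batcher and the decoder only read the stored value.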
teh-cmc added a commit that referenced this issue on May 31, 2024
A `TransportChunk` is a `Chunk` that is ready for transport and/or storage. It is very cheap to go from a `Chunk` to a `TransportChunk` and vice-versa.

A `TransportChunk` maps 1:1 to a native Arrow `RecordBatch`. It has a stable ABI, and can be cheaply sent across process boundaries. `arrow2` has no `RecordBatch` type; we will get one once we migrate to `arrow-rs`.

A `TransportChunk` is self-describing: it contains all the data _and_ metadata needed to index it into storage.

We rely heavily on chunk-level and field-level metadata to communicate Rerun-specific semantics over the wire, e.g. whether some columns are already properly sorted.

The Arrow metadata system is fairly limited (it's all untyped strings), but for now that seems good enough. It will be trivial to switch to something else later, if need be.

- Fixes #1760
- Fixes #1692
- Fixes #3360
- Fixes #1696

---

Part of a PR series to implement our new chunk-based data model on the client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
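As a rough illustration of what "untyped strings" means in practice, the sketch below round-trips two pieces of chunk-level information through a string-to-string map, which is the general shape of Arrow schema metadata. It uses only the Rust standard library rather than `arrow2` or `arrow-rs`, and the `ChunkInfo` type and both key names are made up for this example; they are not the keys Rerun actually uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical metadata keys, invented for this example.
pub const KEY_IS_SORTED: &str = "example.is_sorted";
pub const KEY_HEAP_SIZE_BYTES: &str = "example.heap_size_bytes";

/// Hypothetical typed view of the chunk-level metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkInfo {
    pub is_sorted: bool,
    pub heap_size_bytes: u64,
}

impl ChunkInfo {
    /// Sender side: flatten the typed values into untyped strings.
    pub fn to_metadata(&self) -> BTreeMap<String, String> {
        let mut md = BTreeMap::new();
        md.insert(KEY_IS_SORTED.to_string(), self.is_sorted.to_string());
        md.insert(
            KEY_HEAP_SIZE_BYTES.to_string(),
            self.heap_size_bytes.to_string(),
        );
        md
    }

    /// Receiver side: parse the strings back into typed values.
    /// Returns `None` if a key is missing or a value fails to parse,
    /// which is the price of a stringly-typed metadata channel.
    pub fn from_metadata(md: &BTreeMap<String, String>) -> Option<ChunkInfo> {
        let is_sorted = md.get(KEY_IS_SORTED)?.parse::<bool>().ok()?;
        let heap_size_bytes = md.get(KEY_HEAP_SIZE_BYTES)?.parse::<u64>().ok()?;
        Some(ChunkInfo { is_sorted, heap_size_bytes })
    }
}
```

Because every value is a string, validation has to happen at parse time on the receiving end; a typed metadata format would move those checks into the schema instead.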
This would allow us to compute the size of `DataCell`s (a very costly operation) on the clients and therefore: