Eagerly serialize components upon Archetype & ComponentBatch serialization in Rust and C++
#7245
Labels: 🌊 C++ API (C/C++ API specific), 💬 discussion, 🪵 Log & send APIs (Affects the user-facing API for all languages), 🦀 Rust API (Rust logging API)
Comments
Wumpf added the 💬 discussion, 🦀 Rust API, 🌊 C++ API, and 🪵 Log & send APIs labels on Aug 20, 2024
teh-cmc added a commit that referenced this issue on Aug 23, 2024:
It doesn't make any sense for a `ComponentBatch` to have any say in what the final `ArrowField` should look like. An `ArrowField` is a `Chunk`/`RecordBatch`/`Schema`-level concern that only makes sense during IO/transport/FFI/storage/etc, and which requires external context that a single `ComponentBatch` on its own has no idea of.
Part of a lot of clean up I want to do while we head towards #7245 and #3741.
This was referenced Aug 23, 2024
As we'll soon introduce tagged components and simple multi-datatype components, it gets harder and harder to represent Archetypes (and concrete ComponentBatches) as collections of concrete types.
Let's take the example of a generalized `rotation` component/archetype field which may be represented by various datatypes: we can no longer store concrete types on an archetype and have to type-erase them right away instead. Note that this brings C++ and Rust much closer to the Python SDK in this regard.
This fits very well with our desire to get rid of concrete component types in the SDK languages, which today almost always take the form of `struct ComponentType(pub datatypes::TheDataType)` together with a myriad of constructors, trait impls, and utilities, i.e. a lot of forwarding code.
Eager serialization allows us to implement component semantics on archetypes instead, with concrete construction methods. E.g. `with_quaternion` and `with_axis_angle` would both populate the multi-datatype `rotation` component, which gets tagged appropriately.
When logging raw component batches/columns this would become more explicit, as you're expected to supply a datatype array/collection together with the appropriate component tag (which will still be provided by the SDK, but in registry fashion rather than as a `class`/`struct` per component). This follows the exact same mechanism by which an archetype constructs its internal `ComponentBatches`.
A drawback of this approach is that most accesses to archetype fields require deserialization back into the source datatypes, which can be cumbersome in some cases. However, this is what we expect to do when a user reads back data from the store, so it may soon become commonplace anyway.
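To make the idea above concrete, here is a minimal Rust sketch of an archetype that serializes eagerly. Everything in it is hypothetical and chosen for illustration only: `Transform`, `SerializedBatch`, `Datatype`, `rotation_values`, and the little-endian byte encoding are assumptions, not the SDK's actual API, and a real implementation would presumably hold Arrow arrays rather than raw bytes. It shows `with_quaternion` and `with_axis_angle` both populating one tagged `rotation` field, and the read-back path needing a deserialization step.

```rust
// Hypothetical sketch: none of these names are the SDK's actual API.

/// Which datatype a serialized batch holds (stand-in for an Arrow datatype).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Datatype {
    Quaternion,
    AxisAngle,
}

/// A type-erased, already-serialized component batch.
#[derive(Debug, Clone, PartialEq)]
struct SerializedBatch {
    /// Component tag, e.g. "rotation".
    tag: &'static str,
    datatype: Datatype,
    /// Little-endian f32 values; an assumption standing in for an Arrow array.
    bytes: Vec<u8>,
}

fn serialize_f32s(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn deserialize_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Archetype that stores only serialized batches, never concrete component types.
#[derive(Debug, Default)]
struct Transform {
    rotation: Option<SerializedBatch>,
}

impl Transform {
    /// Serializes the quaternion immediately and tags it as `rotation`.
    fn with_quaternion(mut self, xyzw: [f32; 4]) -> Self {
        self.rotation = Some(SerializedBatch {
            tag: "rotation",
            datatype: Datatype::Quaternion,
            bytes: serialize_f32s(&xyzw),
        });
        self
    }

    /// Serializes axis + angle immediately into the same `rotation` field.
    fn with_axis_angle(mut self, axis: [f32; 3], angle_rad: f32) -> Self {
        let values = [axis[0], axis[1], axis[2], angle_rad];
        self.rotation = Some(SerializedBatch {
            tag: "rotation",
            datatype: Datatype::AxisAngle,
            bytes: serialize_f32s(&values),
        });
        self
    }

    /// Reading the field back requires deserializing the stored bytes.
    fn rotation_values(&self) -> Option<(Datatype, Vec<f32>)> {
        self.rotation
            .as_ref()
            .map(|batch| (batch.datatype, deserialize_f32s(&batch.bytes)))
    }
}

fn main() {
    let a = Transform::default().with_quaternion([0.0, 0.0, 0.0, 1.0]);
    let b = Transform::default().with_axis_angle([0.0, 0.0, 1.0], 1.5);
    for t in [&a, &b] {
        if let Some(batch) = &t.rotation {
            println!("tag={} datatype={:?} bytes={}", batch.tag, batch.datatype, batch.bytes.len());
        }
        println!("{:?}", t.rotation_values());
    }
}
```

In this sketch the archetype owns its serialized bytes, so the values passed to the `with_*` methods can be dropped right after construction. The cost is the one named above: any access to `rotation` goes through `deserialize_f32s`.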
Another nice side effect is that the "ephemeral `rerun::Collection` hazard" goes away, as we'd no longer store pointers to user data, making the API a lot safer to use (`rerun::Collection` becomes a pure pass-through type, as it should be). See: `rerun::Collection` borrows data too eagerly, making it very easy to cause segfaults & read of invalid data #7081
This ticket is a meetup discussion outcome of @jleibs and @Wumpf with some additional input by @emilk.