Eagerly serialize components upon Archetype & ComponentBatch serialization in Rust and C++ #7245

Open
Wumpf opened this issue Aug 20, 2024 · 0 comments
Labels
🌊 C++ API C/C++ API specific 💬 discussion 🪵 Log & send APIs Affects the user-facing API for all languages 🦀 Rust API Rust logging API

Comments

@Wumpf
Member

Wumpf commented Aug 20, 2024

As we'll soon introduce tagged components and simple multi-datatype components, it gets harder and harder to represent Archetypes (and concrete ComponentBatches) as collections of concrete types.
Take the example of a generalized rotation component/archetype field which may be represented by various datatypes: we can no longer store concrete types on an archetype and have to type-erase them right away instead.
Note that this way C++ and Rust get much closer to the Python SDK in this regard.

This fits very well into our desire to get rid of concrete component types in the SDK languages, which today almost always take the form of struct ComponentType(pub datatypes::TheDataType) together with a myriad of constructors, trait impls and utilities, i.e. a lot of forwarding code.
Eager serialization lets us implement component semantics on archetypes instead, via concrete construction methods. E.g. with_quaternion and with_axis_angle would both populate the multi-datatype rotation component, which gets tagged appropriately.
When logging raw component batches/columns, this would become more explicit: you're expected to supply a datatype array/collection together with the appropriate component tag (which will still be provided by the SDK, but in registry fashion rather than as a class/struct per component). This is the exact same mechanism by which an archetype constructs its internal ComponentBatches.
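
To make the idea above more concrete, here is a rough Rust sketch of what eager serialization on an archetype could look like. Everything in it is hypothetical and not the actual Rerun API: `ComponentTag`, `SerializedBatch`, `serialize_batch`, the `Transform` archetype and the tag constants are invented for illustration, and a little-endian byte buffer stands in for the Arrow array a real implementation would presumably hold.

```rust
/// Hypothetical tag: names the archetype field and the datatype stored in a batch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ComponentTag {
    pub archetype: &'static str,
    pub field: &'static str,
    pub datatype: &'static str,
}

/// Hypothetical registry-style tags, instead of one struct per component.
impl ComponentTag {
    pub const ROTATION_QUATERNION: Self = Self {
        archetype: "Transform",
        field: "rotation",
        datatype: "Quaternion",
    };
    pub const ROTATION_AXIS_ANGLE: Self = Self {
        archetype: "Transform",
        field: "rotation",
        datatype: "RotationAxisAngle",
    };
}

/// Type-erased, already serialized batch.
/// Assumption: a plain byte buffer stands in for an Arrow array.
#[derive(Debug, Clone, PartialEq)]
pub struct SerializedBatch {
    pub tag: ComponentTag,
    pub bytes: Vec<u8>,
}

/// Raw logging path: the caller supplies datatype values plus a tag.
/// The values are copied and serialized immediately.
pub fn serialize_batch(tag: ComponentTag, values: &[f32]) -> SerializedBatch {
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    SerializedBatch { tag, bytes }
}

/// Hypothetical archetype that only stores serialized, tagged batches.
#[derive(Debug, Default)]
pub struct Transform {
    rotation: Option<SerializedBatch>,
}

impl Transform {
    pub fn new() -> Self {
        Self::default()
    }

    /// Populates the rotation field from a quaternion (x, y, z, w).
    pub fn with_quaternion(mut self, xyzw: [f32; 4]) -> Self {
        self.rotation = Some(serialize_batch(ComponentTag::ROTATION_QUATERNION, &xyzw));
        self
    }

    /// Populates the same rotation field from an axis and an angle in radians.
    pub fn with_axis_angle(mut self, axis: [f32; 3], angle_rad: f32) -> Self {
        let values = [axis[0], axis[1], axis[2], angle_rad];
        self.rotation = Some(serialize_batch(ComponentTag::ROTATION_AXIS_ANGLE, &values));
        self
    }

    pub fn rotation(&self) -> Option<&SerializedBatch> {
        self.rotation.as_ref()
    }
}
```

In this sketch the archetype methods and the raw path both go through `serialize_batch`, so there is no per-component wrapper struct to forward through. Since the batch owns its bytes, the archetype also never holds on to user data.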

A drawback of this approach is that most accesses to archetype fields require deserialization back into the source datatypes, which can be cumbersome in some cases. However, this is what we expect to do when a user reads back data from the store, so it may soon become commonplace anyway.
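
As a rough illustration of that drawback, reading a field back means decoding it according to its tag. The Rust sketch below is hypothetical throughout (`Rotation`, `RotationDatatype`, `SerializedRotation` and their methods are invented names, and a byte buffer again stands in for an Arrow array); it only shows the shape of the round trip, not how the SDK would implement it.

```rust
/// Source datatypes a user would work with (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Rotation {
    Quaternion([f32; 4]),
    AxisAngle { axis: [f32; 3], angle_rad: f32 },
}

/// Hypothetical tag telling us which datatype the bytes encode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RotationDatatype {
    Quaternion,
    AxisAngle,
}

/// What the archetype would store after eager serialization.
#[derive(Debug, Clone, PartialEq)]
pub struct SerializedRotation {
    pub datatype: RotationDatatype,
    pub bytes: Vec<u8>,
}

/// Decodes little-endian f32 values; fails on a truncated buffer.
fn decode_f32s(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

impl SerializedRotation {
    /// Eagerly serializes a source datatype into tagged bytes.
    pub fn from_rotation(rotation: &Rotation) -> Self {
        let (datatype, values) = match rotation {
            Rotation::Quaternion(q) => (RotationDatatype::Quaternion, q.to_vec()),
            Rotation::AxisAngle { axis, angle_rad } => (
                RotationDatatype::AxisAngle,
                vec![axis[0], axis[1], axis[2], *angle_rad],
            ),
        };
        let bytes = values.iter().flat_map(|v| v.to_le_bytes()).collect();
        Self { datatype, bytes }
    }

    /// Reading the field back requires dispatching on the tag and decoding.
    pub fn deserialize(&self) -> Option<Rotation> {
        let values = decode_f32s(&self.bytes)?;
        match (self.datatype, values.as_slice()) {
            (RotationDatatype::Quaternion, &[x, y, z, w]) => {
                Some(Rotation::Quaternion([x, y, z, w]))
            }
            (RotationDatatype::AxisAngle, &[x, y, z, angle_rad]) => Some(Rotation::AxisAngle {
                axis: [x, y, z],
                angle_rad,
            }),
            _ => None,
        }
    }
}
```

Note that access becomes fallible: every read has to handle a tag/payload mismatch, which is the same situation as reading data back from the store.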

Another nice side effect is that the "ephemeral rerun::Collection hazard" goes away, as we'd no longer store pointers to user data, making the API a lot safer to use. (rerun::Collection becomes a pure pass-through type, as it should be.)


This ticket is the outcome of a meetup discussion between @jleibs and @Wumpf, with some additional input from @emilk.

@Wumpf Wumpf added 💬 discussion 🦀 Rust API Rust logging API 🌊 C++ API C/C++ API specific 🪵 Log & send APIs Affects the user-facing API for all languages labels Aug 20, 2024
teh-cmc added a commit that referenced this issue Aug 23, 2024
Remove unused old traits.

Part of a lot of clean up I want to do while we head towards:
* #7245
* #3741
teh-cmc added a commit that referenced this issue Aug 23, 2024
It doesn't make any sense for a `ComponentBatch` to have any say in what
the final `ArrowField` should look like.

An `ArrowField` is a `Chunk`/`RecordBatch`/`Schema`-level concern that
only makes sense during IO/transport/FFI/storage/etc, and which requires
external context that a single `ComponentBatch` on its own has no idea
of.

---

Part of a lot of clean up I want to do while we head towards:
* #7245
* #3741