Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batching 3: DataRow & DataTable + no bundles outside of transport #1673

Merged
merged 1 commit into from
Mar 27, 2023

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Mar 22, 2023

In this PR:

  • Introduce DataRow & DataTable, which follow in the steps of DataCell.
  • Removes all traces of MsgBundle except for the last serialization/deserialization step (i.e. when converting from/to ArrowMsg).
  • The DataStore is still row-driven, although it'll automatically benefit from DataTable's contiguous memory as soon as we start sending more than one row.

Aside from data_row.rs & data_table.rs, it's all mostly grunt work, you can fly through. The interesting stuff starts (finally) in the next PR.

On top of #1636
Part of #1619

  • Self-review
  • No bench regression
  • No test regression
  • api_demo_rs looking good
  • api_demo_py looking good
  • TODO all the things that fit into Tracking issue: end-to-end batching #1619
  • open issues for everything else

@teh-cmc teh-cmc changed the base branch from main to cmc/datastore/better_component March 22, 2023 18:21
@teh-cmc teh-cmc force-pushed the cmc/datastore/better_component branch from de9aa19 to a0b726e Compare March 22, 2023 19:05
@teh-cmc teh-cmc added 🐍 Python API Python logging API 🏹 arrow concerning arrow 🦀 Rust API Rust logging API ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 22, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 3 times, most recently from 9fa8981 to eecff1a Compare March 22, 2023 21:03
@teh-cmc teh-cmc force-pushed the cmc/datastore/better_component branch from a0b726e to 7ff1630 Compare March 23, 2023 14:25
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 2 times, most recently from c0d98b3 to 5642c48 Compare March 23, 2023 14:32
@teh-cmc teh-cmc force-pushed the cmc/datastore/better_component branch 2 times, most recently from ab6f087 to a360d3f Compare March 23, 2023 20:05
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch from 5642c48 to 83d43d7 Compare March 23, 2023 20:38
@teh-cmc teh-cmc changed the title batching 3: DataRow & DataTable + ready to accept batches batching 3: DataRow & DataTable + no bundles outside of transport Mar 23, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/better_component branch from a360d3f to d6dede4 Compare March 24, 2023 08:45
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 2 times, most recently from e647f07 to 579372d Compare March 24, 2023 10:57
crates/re_arrow_store/src/store_polars.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_write.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_write.rs Outdated Show resolved Hide resolved
Comment on lines -44 to -55
// Instances
#[error(
"All components within a row must have the same number of instances as the \
cluster component, got {cluster_comp}={cluster_comp_nb_instances} vs. \
{key}={num_instances}"
)]
MismatchedInstances {
cluster_comp: ComponentName,
cluster_comp_nb_instances: u32,
key: ComponentName,
num_instances: u32,
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataRow already makes sure this is upheld.

crates/re_arrow_store/src/store_write.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/data_row.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/data_table.rs Outdated Show resolved Hide resolved
crates/re_log_types/src/data_table.rs Outdated Show resolved Hide resolved
Comment on lines -113 to -117
// TODO(cmc): Since we don't yet support mixing splatted data within instanced rows,
// we need to craft an array of `MsgId`s that matches the length of the other components.
this.cells.push(DataCell::from_native(
vec![msg_id; this.num_instances()].iter(),
));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now handled by DataRow

crates/re_sdk/src/msg_sender.rs Outdated Show resolved Hide resolved
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch 3 times, most recently from cedfddc to d33cd0f Compare March 24, 2023 13:43
@teh-cmc teh-cmc marked this pull request as ready for review March 24, 2023 13:43
@teh-cmc teh-cmc force-pushed the cmc/datastore/better_component branch 2 times, most recently from 555d7d6 to ed4ccae Compare March 27, 2023 08:13
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch from d33cd0f to d8d507c Compare March 27, 2023 08:18
Base automatically changed from cmc/datastore/better_component to main March 27, 2023 08:21
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch from d8d507c to 1fa6aa7 Compare March 27, 2023 08:22
@teh-cmc teh-cmc force-pushed the cmc/datastore/rows_and_tables branch from 1fa6aa7 to dc1c47f Compare March 27, 2023 08:23
@emilk emilk self-requested a review March 27, 2023 10:17
@@ -761,60 +761,3 @@ fn test_arrow() {
let tensors_out: Vec<Tensor> = TryIntoCollection::try_into_collection(array).unwrap();
assert_eq!(tensors_in, tensors_out);
}

#[test]
fn test_concat_and_slice() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need this test anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically just testing that arrow2 and arrow2-convert do the correct thing with unions, which is now tested upstream since we've upstreamed the fixes.

@teh-cmc teh-cmc merged commit 8a37e50 into main Mar 27, 2023
@teh-cmc teh-cmc deleted the cmc/datastore/rows_and_tables branch March 27, 2023 13:25
@teh-cmc teh-cmc removed 🐍 Python API Python logging API 🦀 Rust API Rust logging API labels Mar 27, 2023
@@ -35,6 +34,7 @@ use crate::{

// ---

// TODO: can probably make that one pub(crate) already
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this pass CI!?!? This made main red

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, missed it sorry 😞

@teh-cmc teh-cmc mentioned this pull request Apr 17, 2023
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants