batching 1: introduce DataCell & retire ComponentBundle #1634

Merged
merged 8 commits into from
Mar 27, 2023


@teh-cmc teh-cmc commented Mar 21, 2023

First step of #1619: Introduce DataCell and retire ComponentBundle.

DataCell is the leaf type of our data model, a uniform array of components: [C, C, C, ...]. No more, no less.
Users should prefer using it over raw arrow arrays as it tends to make code a lot simpler (and faster).
Behind the scenes, a DataCell is backed by an erased arrow array living on the heap, which is likely to point into a larger batch of contiguous memory that it shares with its peers.
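
To make the description above concrete, here is a minimal, std-only sketch of the idea: a cell is a typed window into a larger, shared, type-erased buffer. This is not the code from `data_cell.rs`; the real type is backed by an arrow array, and every name below (`ErasedSlice`, `from_batch`, `as_slice`, `num_instances`, the `"radius"` component name) is hypothetical and chosen only for illustration.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for an erased arrow array: a shared, type-erased
/// buffer plus the window of it that belongs to one cell.
#[derive(Clone)]
struct ErasedSlice {
    buffer: Arc<dyn Any + Send + Sync>, // the larger batch, shared with peers
    offset: usize,
    len: usize,
}

/// Toy `DataCell`: a uniform array of a single component type, `[C, C, C, ...]`.
#[derive(Clone)]
struct DataCell {
    component: &'static str, // hypothetical component name
    values: ErasedSlice,
}

impl DataCell {
    /// Carves a cell out of a shared batch without copying the values.
    fn from_batch<C: Send + Sync + 'static>(
        component: &'static str,
        batch: &Arc<Vec<C>>,
        offset: usize,
        len: usize,
    ) -> Self {
        assert!(offset + len <= batch.len(), "cell must lie within the batch");
        let typed: Arc<Vec<C>> = Arc::clone(batch);
        let buffer: Arc<dyn Any + Send + Sync> = typed; // erase the element type
        Self { component, values: ErasedSlice { buffer, offset, len } }
    }

    /// Name of the component stored in this cell.
    fn component_name(&self) -> &'static str {
        self.component
    }

    /// Number of component instances in the cell.
    fn num_instances(&self) -> usize {
        self.values.len
    }

    /// Recovers the typed view; returns `None` if `C` is not the stored type.
    fn as_slice<C: 'static>(&self) -> Option<&[C]> {
        let erased: &(dyn Any + Send + Sync) = &*self.values.buffer;
        let vec = erased.downcast_ref::<Vec<C>>()?;
        Some(&vec[self.values.offset..self.values.offset + self.values.len])
    }
}
```

Two cells built with `DataCell::from_batch("radius", &batch, 0, 2)` and `DataCell::from_batch("radius", &batch, 2, 3)` would share the same allocation; only the offsets differ, which is the "points into a larger batch" property in miniature.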

This PR merely introduces the DataCell type but doesn't quite use it everywhere: the store in particular still shuffles raw arrow arrays around for the most part (for now).

This is mostly grunt work (and a bunch of house cleaning I've wanted done for a while): aside from data_cell.rs there are no notable logic changes.

Nice surprise: we get roughly 5% to 25% performance improvement across all our benchmarks, which I assume is the result of DataCell eliminating a bunch of unneeded list wraps/unwraps all over the place.
Also discovered a major performance pitfall in TryIntoArrow.

Spawned a bunch of issues:


Part of #1619

  • Self-review
  • No bench regression
  • No test regression
  • api_demo_rs looking good
  • api_demo_py looking good
  • TODO all the things that fit into Tracking issue: end-to-end batching #1619
  • open issues for everything else

@teh-cmc teh-cmc changed the title datastore: introduce DataCell all: introduce DataCell Mar 21, 2023
@teh-cmc teh-cmc added enhancement New feature or request 🏹 arrow concerning arrow ⛃ re_datastore affects the datastore itself labels Mar 21, 2023
@emilk emilk marked this pull request as draft March 21, 2023 10:37
@teh-cmc teh-cmc changed the title all: introduce DataCell batching 1: introduce DataCell Mar 21, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/new_types branch 6 times, most recently from 70e7a65 to 60912c1 Compare March 23, 2023 15:44
@teh-cmc teh-cmc changed the title batching 1: introduce DataCell batching 1: introduce DataCell & retire ComponentBundle Mar 23, 2023
@teh-cmc teh-cmc force-pushed the cmc/datastore/new_types branch from 60912c1 to a5a42c9 Compare March 23, 2023 18:01
@teh-cmc teh-cmc marked this pull request as ready for review March 23, 2023 18:27
@teh-cmc teh-cmc force-pushed the cmc/datastore/new_types branch 3 times, most recently from acfe862 to f7f55c9 Compare March 23, 2023 19:15
@teh-cmc teh-cmc force-pushed the cmc/datastore/new_types branch from f7f55c9 to 8512de9 Compare March 23, 2023 19:51
teh-cmc commented Mar 23, 2023

Trouble ahead:

| Benchmark suite | Current: 8512de9 | Previous: 1802c20 | Ratio |
| --- | --- | --- | --- |
| datastore/insert/batch/rects/insert | 601675 ns/iter (± 2069) | 558040 ns/iter (± 2726) | 1.08 |
| datastore/latest_at/batch/rects/query | 1890 ns/iter (± 17) | 1869 ns/iter (± 5) | 1.01 |
| datastore/latest_at/missing_components/primary | 287 ns/iter (± 0) | 287 ns/iter (± 0) | 1 |
| datastore/latest_at/missing_components/secondaries | 440 ns/iter (± 0) | 456 ns/iter (± 0) | 0.96 |
| datastore/range/batch/rects/query | 154071 ns/iter (± 437) | 153037 ns/iter (± 237) | 1.01 |
| mono_points_arrow/generate_message_bundles | 31957497 ns/iter (± 801251) | 52357104 ns/iter (± 640637) | 0.61 |
| mono_points_arrow/generate_messages | 130995451 ns/iter (± 1129415) | 136205204 ns/iter (± 1313776) | 0.96 |
| mono_points_arrow/encode_log_msg | 152674306 ns/iter (± 1031782) | 171213456 ns/iter (± 1741778) | 0.89 |
| mono_points_arrow/encode_total | 319904451 ns/iter (± 7475615) | 362883755 ns/iter (± 2060012) | 0.88 |
| mono_points_arrow/decode_log_msg | 176332820 ns/iter (± 1853162) | 189752854 ns/iter (± 4013520) | 0.93 |
| mono_points_arrow/decode_message_bundles | 54788523 ns/iter (± 1009136) | 75254710 ns/iter (± 1140876) | 0.73 |
| mono_points_arrow/decode_total | 223955102 ns/iter (± 1700294) | 259499357 ns/iter (± 2333802) | 0.86 |
| batch_points_arrow/generate_message_bundles | 325961 ns/iter (± 4397) | 341982 ns/iter (± 553) | 0.95 |
| batch_points_arrow/generate_messages | 6767 ns/iter (± 106) | 6228 ns/iter (± 13) | 1.09 |
| batch_points_arrow/encode_log_msg | 348719 ns/iter (± 3366) | 372885 ns/iter (± 1494) | 0.94 |
| batch_points_arrow/encode_total | 705107 ns/iter (± 7630) | 746449 ns/iter (± 4228) | 0.94 |
| batch_points_arrow/decode_log_msg | 346301 ns/iter (± 1849) | 354317 ns/iter (± 1692) | 0.98 |
| batch_points_arrow/decode_message_bundles | 1560 ns/iter (± 18) | 2018 ns/iter (± 8) | 0.77 |
| batch_points_arrow/decode_total | 352278 ns/iter (± 2929) | 360005 ns/iter (± 2070) | 0.98 |
| arrow_mono_points/insert | 6291970452 ns/iter (± 18115496) | 7101183936 ns/iter (± 48000479) | 0.89 |
| arrow_mono_points/query | 1750297 ns/iter (± 16858) | 1847772 ns/iter (± 25156) | 0.95 |
| arrow_batch_points/insert | 2945631 ns/iter (± 30202) | 2738045 ns/iter (± 81908) | 1.08 |
| arrow_batch_points/query | 15935 ns/iter (± 220) | 16152 ns/iter (± 40) | 0.99 |
| arrow_batch_vecs/insert | 556514 ns/iter (± 7317) | 43144 ns/iter (± 226) | 12.90 |
| arrow_batch_vecs/query | 382755 ns/iter (± 4626) | 389549 ns/iter (± 596) | 0.98 |
| tuid/Tuid::random | 34 ns/iter (± 0) | 34 ns/iter (± 0) | 1 |
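
For reading the table: the Ratio column is consistent with Current divided by Previous, so values above 1 mean the current commit is slower. The sketch below recomputes the ratio for two rows and flags outliers. The `BenchRow` type, the `regressions` helper, and the 1.5 threshold are assumptions made for illustration; they are not part of the benchmark tooling used in this PR.

```rust
/// One row of the benchmark comparison, with timings in ns/iter.
struct BenchRow {
    name: &'static str,
    current_ns: f64,
    previous_ns: f64,
}

impl BenchRow {
    /// Ratio as shown in the table: current / previous (above 1.0 means slower).
    fn ratio(&self) -> f64 {
        self.current_ns / self.previous_ns
    }
}

/// Returns the names of all rows whose ratio exceeds `threshold`.
/// The threshold is a hypothetical knob, not a value taken from the PR.
fn regressions(rows: &[BenchRow], threshold: f64) -> Vec<&'static str> {
    rows.iter()
        .filter(|row| row.ratio() > threshold)
        .map(|row| row.name)
        .collect()
}
```

Applied to the rows above with a threshold of 1.5, only `arrow_batch_vecs/insert` (556514 ns against 43144 ns) would be flagged, which is the "trouble" this comment points at.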

teh-cmc commented Mar 23, 2023

Aaaand we're good:

| Benchmark suite | Current: 14f23b3 | Previous: 1802c20 | Ratio |
| --- | --- | --- | --- |
| datastore/insert/batch/rects/insert | 584558 ns/iter (± 1896) | 558040 ns/iter (± 2726) | 1.05 |
| datastore/latest_at/batch/rects/query | 1830 ns/iter (± 10) | 1869 ns/iter (± 5) | 0.98 |
| datastore/latest_at/missing_components/primary | 288 ns/iter (± 0) | 287 ns/iter (± 0) | 1.00 |
| datastore/latest_at/missing_components/secondaries | 443 ns/iter (± 0) | 456 ns/iter (± 0) | 0.97 |
| datastore/range/batch/rects/query | 152376 ns/iter (± 260) | 153037 ns/iter (± 237) | 1.00 |
| mono_points_arrow/generate_message_bundles | 33976334 ns/iter (± 1653914) | 52357104 ns/iter (± 640637) | 0.65 |
| mono_points_arrow/generate_messages | 130309746 ns/iter (± 1458602) | 136205204 ns/iter (± 1313776) | 0.96 |
| mono_points_arrow/encode_log_msg | 166831936 ns/iter (± 1648600) | 171213456 ns/iter (± 1741778) | 0.97 |
| mono_points_arrow/encode_total | 333659366 ns/iter (± 2495635) | 362883755 ns/iter (± 2060012) | 0.92 |
| mono_points_arrow/decode_log_msg | 187351664 ns/iter (± 1649335) | 189752854 ns/iter (± 4013520) | 0.99 |
| mono_points_arrow/decode_message_bundles | 60453431 ns/iter (± 1676753) | 75254710 ns/iter (± 1140876) | 0.80 |
| mono_points_arrow/decode_total | 244841469 ns/iter (± 2074305) | 259499357 ns/iter (± 2333802) | 0.94 |
| batch_points_arrow/generate_message_bundles | 332330 ns/iter (± 933) | 341982 ns/iter (± 553) | 0.97 |
| batch_points_arrow/generate_messages | 5990 ns/iter (± 9) | 6228 ns/iter (± 13) | 0.96 |
| batch_points_arrow/encode_log_msg | 355456 ns/iter (± 2223) | 372885 ns/iter (± 1494) | 0.95 |
| batch_points_arrow/encode_total | 709792 ns/iter (± 3717) | 746449 ns/iter (± 4228) | 0.95 |
| batch_points_arrow/decode_log_msg | 350449 ns/iter (± 1317) | 354317 ns/iter (± 1692) | 0.99 |
| batch_points_arrow/decode_message_bundles | 1581 ns/iter (± 6) | 2018 ns/iter (± 8) | 0.78 |
| batch_points_arrow/decode_total | 351941 ns/iter (± 764) | 360005 ns/iter (± 2070) | 0.98 |
| arrow_mono_points/insert | 7082926772 ns/iter (± 41336212) | 7101183936 ns/iter (± 48000479) | 1.00 |
| arrow_mono_points/query | 1729903 ns/iter (± 10483) | 1847772 ns/iter (± 25156) | 0.94 |
| arrow_batch_points/insert | 3137240 ns/iter (± 12800) | 2738045 ns/iter (± 81908) | 1.15 |
| arrow_batch_points/query | 16598 ns/iter (± 50) | 16152 ns/iter (± 40) | 1.03 |
| arrow_batch_vecs/insert | 45217 ns/iter (± 122) | 43144 ns/iter (± 226) | 1.05 |
| arrow_batch_vecs/query | 389215 ns/iter (± 34999) | 389549 ns/iter (± 596) | 1.00 |
| tuid/Tuid::random | 34 ns/iter (± 0) | 34 ns/iter (± 0) | 1 |

@emilk emilk left a comment

Beautiful!

Labels
🏹 arrow concerning arrow enhancement New feature or request ⛃ re_datastore affects the datastore itself
2 participants