arrow2 does _not_ refcount schema metadata #1805
Labels
🏹 arrow
concerning arrow
📉 performance
Optimization, memory use, etc
⛃ re_datastore
affects the datastore itself
Comments
teh-cmc added the 🏹 arrow (concerning arrow), ⛃ re_datastore (affects the datastore itself), and 📉 performance (Optimization, memory use, etc) labels on Apr 10, 2023
Looks like we might want to pull on this thread:
This was referenced Apr 18, 2023
All `arrow2` arrays have roughly the same layout: a `DataType`, a buffer of values, and an optional validity bitmap.

When you clone/slice/index an `Array`, you get another `Array` in roughly `O(1)`, thanks to both the `values` and `validity` bitmaps being refcounted behind the scenes.

Well... not really. It turns out the `DataType` is not refcounted, and it can get huge: it's a massive heap-recursive enum potentially filled with strings and such.

Say you have a `ListArray` that contains a bunch of `StructArray`s (i.e. a column of component data) and you want to extract references to the individual `StructArray`s in that list (i.e. the individual `DataCell`s): each of these arrays is now going to carry a full copy of the `StructArray`'s schema.

For tiny `DataCell`s (which are very common in Rerun), the overhead is enormous.
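To make the cost concrete, here is a minimal sketch using simplified stand-in types, not `arrow2`'s real definitions. `DataType`, `Field`, `OwnedSchemaArray`, `SharedSchemaArray`, `point3d_datatype`, and `allocations_per_clone` are all hypothetical names invented for illustration. The first array type deep-clones its datatype on every slice, as described above; the second shows what a refcounted datatype could look like, assuming an `Arc` wrapper is an acceptable fix.

```rust
use std::sync::Arc;

/// Simplified stand-in for a recursive, heap-allocated datatype enum.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Float32,
    Struct(Vec<Field>),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

/// Values are refcounted, but the datatype is owned by each array.
pub struct OwnedSchemaArray {
    pub data_type: DataType,
    pub values: Arc<[f32]>,
    pub offset: usize,
    pub len: usize,
}

impl OwnedSchemaArray {
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        Self {
            data_type: self.data_type.clone(), // deep clone: allocates per field
            values: Arc::clone(&self.values),  // O(1): bumps a refcount
            offset: self.offset + offset,
            len,
        }
    }
}

/// Hypothetical alternative: the datatype is refcounted as well.
pub struct SharedSchemaArray {
    pub data_type: Arc<DataType>,
    pub values: Arc<[f32]>,
    pub offset: usize,
    pub len: usize,
}

impl SharedSchemaArray {
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        Self {
            data_type: Arc::clone(&self.data_type), // O(1): no schema copy
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Hypothetical three-field struct datatype, standing in for a small component.
pub fn point3d_datatype() -> DataType {
    DataType::Struct(
        ["x", "y", "z"]
            .iter()
            .map(|name| Field {
                name: name.to_string(),
                data_type: DataType::Float32,
            })
            .collect(),
    )
}

/// Rough count of heap allocations made by one deep clone of `dt`:
/// one per non-empty field list plus one per non-empty field name.
pub fn allocations_per_clone(dt: &DataType) -> usize {
    match dt {
        DataType::Float32 => 0,
        DataType::Struct(fields) => {
            1 + fields
                .iter()
                .map(|f| 1 + allocations_per_clone(&f.data_type))
                .sum::<usize>()
        }
    }
}
```

In this toy model, slicing a three-float cell out of an `OwnedSchemaArray` copies the schema (one `Vec` and three `String`s) to describe just three values, whereas `SharedSchemaArray::slice` only bumps two refcounts.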