This repository was archived by the owner on Apr 10, 2024. It is now read-only.

First class array/list type #25

Open
wesm opened this issue Sep 17, 2016 · 2 comments

@wesm
Owner

wesm commented Sep 17, 2016

Similar to the ARRAY type found in SQL variants with nested types. See also the List type in Apache Arrow.

xref pandas-dev/pandas#8517

@wesm
Owner Author

wesm commented Sep 17, 2016

"First class" here means "not implemented using Python lists". You can interpret any array of type T as Array[T] by adding an array of offsets that encode each list's size and position.

For example, the data

[[0, 1, 2],
 [3],
 [],
 [4, 5, 6]]

can be represented compactly as

offsets: [0, 3, 4, 4, 7]
data: [0, 1, 2, 3, 4, 5, 6]
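
As a minimal sketch of the encoding above (the helper names `encode` and `get_row` are hypothetical, chosen for illustration, and are not part of pandas or Arrow), row `i` is the half-open slice `data[offsets[i]:offsets[i + 1]]`, so an empty list shows up as two equal consecutive offsets:

```python
from itertools import accumulate


def encode(nested):
    """Flatten a list of lists into (offsets, data)."""
    # offsets[i] is where row i starts; the final entry is the total length.
    offsets = [0, *accumulate(len(row) for row in nested)]
    data = [value for row in nested for value in row]
    return offsets, data


def get_row(offsets, data, i):
    """Return row i as the half-open slice data[offsets[i]:offsets[i + 1]]."""
    return data[offsets[i]:offsets[i + 1]]


offsets, data = encode([[0, 1, 2], [3], [], [4, 5, 6]])
print(offsets)                    # [0, 3, 4, 4, 7]
print(data)                       # [0, 1, 2, 3, 4, 5, 6]
print(get_row(offsets, data, 2))  # []
print(get_row(offsets, data, 3))  # [4, 5, 6]
```

There is always one more offset than there are rows, which is why four lists need five offsets.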

There are other possible representations. This one is good because flattening (for a flatmap function or flatten) is zero-copy, and it is highly cache-efficient for scanning. The downside is that mutation is more costly, since changing the length of one list shifts the data and offsets that follow it. I would argue that we should not be encouraging such structures to be mutated anyway.
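
To make the trade-off concrete, here is a rough sketch using only the Python standard library. The `ListArray` class and its method names are assumptions made for illustration, not an existing pandas or Arrow API. Flattening and row access return views over one contiguous buffer, while growing a single row has to shift every later value and rewrite every later offset:

```python
from array import array


class ListArray:
    """Hypothetical offsets-plus-values container, for illustration only."""

    def __init__(self, offsets, values):
        self.offsets = list(offsets)
        self.values = array("q", values)  # contiguous 64-bit signed ints

    def flatten(self):
        # Zero copy: a memoryview over the existing values buffer.
        return memoryview(self.values)

    def row(self, i):
        # Also zero copy: a sliced view of the same buffer.
        return memoryview(self.values)[self.offsets[i]:self.offsets[i + 1]]

    def append_to_row(self, i, value):
        # Costly: shifts every later value and bumps every later offset.
        self.values.insert(self.offsets[i + 1], value)
        for j in range(i + 1, len(self.offsets)):
            self.offsets[j] += 1


arr = ListArray([0, 3, 4, 4, 7], [0, 1, 2, 3, 4, 5, 6])

flat = arr.flatten()
arr.values[0] = 42  # in-place write, no resize
print(flat[0])      # 42, because the view shares the buffer
flat.release()      # views must be released before the array can be resized

arr.append_to_row(1, 99)
print(arr.offsets)          # [0, 3, 5, 5, 8]
print(arr.row(1).tolist())  # [3, 99]
```

The need to release views before resizing is a small illustration of why mutation fits this layout poorly: anything holding a zero-copy view depends on the buffer staying put.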

@chrisaycock

I like this idea. NumPy's multi-dimensional requirements make it really difficult to build an unboxed array of heterogeneously sized arrays.
