[Parquet] Remove uses of ArrayData in favor of Direct Array construction #9128

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I noticed on #9061 that there is non-trivial overhead in array construction. I am trying to improve make_array in parallel, but @tustvold had an even better idea in #9058 (comment):

My 2 cents is it would be better to move the codepaths relying on ArrayData over to using the typed arrays directly, this should not only cut down on allocations but unnecessary validation and dispatch overheads.

An ArrayData has at least one extra allocation (for the Vec that holds the Buffers) as well as a number of dynamic function calls. While this overhead is small for any single array, it is paid for every array, so in aggregate it can be substantial.

Describe the solution you'd like
Change the Parquet codepaths that rely on ArrayData to construct the typed arrays directly. This should cut down not only on allocations but also on unnecessary validation and dispatch overhead.

Describe alternatives you've considered

Additional context

Labels

enhancement, performance
