The ProjectionPlan::project_batch function is inefficient and noisy #5069

@westonpace

In take workloads we use a function called ProjectionPlan::project_batch. This function creates a OneShotExec that reads in the batch and a ProjectExec that applies the projection expressions. This is probably fine if there are actually projection expressions to evaluate. However, if all we are doing is reordering or dropping columns (or especially if the projection is an identity projection 🤦) we are adding quite a bit of unneeded overhead.

To put this into context, in a recent random access benchmarking effort I found this to be responsible for about 8% of the latency even though there was no actual projection (it was just identity).
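
To make the suggested fast path concrete, here is a rough sketch of how a projection could be classified before deciding whether to build a plan at all. It is a toy model using only the standard library: `OutputExpr`, `ProjectionKind`, and `classify` are hypothetical names made up for illustration and are not Lance or DataFusion types. Column renames (aliases) are not modelled.

```rust
// Toy model: these types are hypothetical stand-ins, not Lance or DataFusion types.

/// One output of a projection: either a bare column reference or
/// something that actually has to be computed.
#[derive(Debug, Clone, PartialEq)]
enum OutputExpr {
    Column(String),
    Computed(String), // e.g. "a + b"; treated as opaque here
}

/// How much work a projection really requires.
#[derive(Debug, PartialEq)]
enum ProjectionKind {
    /// Same columns in the same order: the batch can be returned untouched.
    Identity,
    /// Only reordering or dropping: pick input columns by index.
    Select(Vec<usize>),
    /// At least one expression must be evaluated: fall back to the full plan.
    Evaluate,
}

fn classify(input_columns: &[&str], outputs: &[OutputExpr]) -> ProjectionKind {
    let mut indices = Vec::with_capacity(outputs.len());
    for expr in outputs {
        match expr {
            OutputExpr::Column(name) => {
                match input_columns.iter().position(|c| *c == name.as_str()) {
                    Some(idx) => indices.push(idx),
                    // Unknown column: let the full path report the error.
                    None => return ProjectionKind::Evaluate,
                }
            }
            OutputExpr::Computed(_) => return ProjectionKind::Evaluate,
        }
    }
    // Identity means every input column appears once, in its original position.
    let is_identity = indices.len() == input_columns.len()
        && indices.iter().enumerate().all(|(i, idx)| i == *idx);
    if is_identity {
        ProjectionKind::Identity
    } else {
        ProjectionKind::Select(indices)
    }
}
```

With something like this in place, the `Identity` case could return the input batch as-is, the `Select` case could presumably be served by arrow's `RecordBatch::project` with the computed indices, and only the `Evaluate` case would need the OneShotExec + ProjectExec path.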
