Optimize filtered SortMergeJoin to avoid producing small/empty batches #14050

@comphead

Is your feature request related to a problem or challenge?

Related to #9846

#9846 tracks a couple of tasks to fix correctness issues for SortMergeJoin with a filter clause.

As reported in apache/datafusion-comet#1211 (comment), the execution produces many small or empty batches.

Describe the solution you'd like

Concatenate small batches so that the operator produces batches equal or nearly equal to batch_size.
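
To illustrate the idea, here is a minimal sketch of the coalescing logic, not the actual SortMergeJoin code. It assumes a batch can be modeled as a plain `Vec<i64>` of rows; `Batch`, `RowCoalescer`, and `coalesce_all` are hypothetical names used only for this example and are not DataFusion or Arrow APIs.

```rust
/// Stand-in for a record batch: just a vector of row values (assumption).
type Batch = Vec<i64>;

/// Hypothetical helper: buffers small input batches and emits batches of
/// exactly `batch_size` rows, except possibly the final one from `finish`.
struct RowCoalescer {
    batch_size: usize,
    buffer: Vec<i64>,
}

impl RowCoalescer {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, buffer: Vec::new() }
    }

    /// Add one input batch (possibly empty) and return any full output batches.
    fn push(&mut self, batch: Batch) -> Vec<Batch> {
        self.buffer.extend(batch);
        let mut out = Vec::new();
        while self.buffer.len() >= self.batch_size {
            // Keep the first `batch_size` rows as the output batch,
            // and leave the remainder in the buffer.
            let rest = self.buffer.split_off(self.batch_size);
            out.push(std::mem::replace(&mut self.buffer, rest));
        }
        out
    }

    /// Flush the remaining rows at end of input; None if nothing is buffered.
    fn finish(&mut self) -> Option<Batch> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}

/// Convenience wrapper: coalesce a whole sequence of input batches.
fn coalesce_all(inputs: Vec<Batch>, batch_size: usize) -> Vec<Batch> {
    let mut coalescer = RowCoalescer::new(batch_size);
    let mut out = Vec::new();
    for batch in inputs {
        out.extend(coalescer.push(batch));
    }
    out.extend(coalescer.finish());
    out
}
```

With this sketch, empty inputs never reach the output, and only the last emitted batch can be smaller than batch_size.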

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)
