Support zero copy hash repartitioning for Hash Join

### Is your feature request related to a problem or challenge?

Currently `RepartitionExec: partitioning=Hash` will be added whenever a hash 
The benefit is increased parallelism, but it comes at the cost of copying the entire table (in a not-so-efficient way).

We should consider lowering the cost of repartitioning by not having to copy the input.

### Describe the solution you'd like

Instead of repartitioning the right side and left side input in `RepartitionExec`, support repartitioning the inputs based on a selection vector.

Instead of `taking` the `RecordBatch`, we can consider doing the following:

* Add a (boolean) selection vector as output column for each output partition. I.e. `true` means the row is selected for the partition.
* The rest of the `RecordBatch` remains unchanged (i.e. no copy).
* `CoalesceBatchesExec` is no longer needed for the output (removing another copy).
* In the hash join algorithm, only try matching the input indices marked in the selection vector for each partition. This does not seem hard, as `get_matched_indices` already accepts input indices + hashes.
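
To make the idea above concrete, here is a minimal, standard-library-only sketch. It is not DataFusion code: the batch is reduced to a single `i64` key column, and `hash_key`, `selection_vectors`, and `selected_indices` are hypothetical names used only for illustration. It assumes a partition is chosen as `hash % num_partitions`, and it shows that each partition only needs a boolean mask over the shared, uncopied rows.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a single join key (hypothetical stand-in for the real hash function).
fn hash_key(key: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Build one boolean selection vector per output partition.
/// `masks[p][row] == true` means `row` belongs to partition `p`.
/// The key column itself is only read, never copied or reordered.
fn selection_vectors(keys: &[i64], num_partitions: usize) -> Vec<Vec<bool>> {
    let mut masks = vec![vec![false; keys.len()]; num_partitions];
    for (row, key) in keys.iter().enumerate() {
        let partition = (hash_key(*key) % num_partitions as u64) as usize;
        masks[partition][row] = true;
    }
    masks
}

/// Row indices a join partition would try to match, derived from its mask.
fn selected_indices(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(row, &selected)| selected.then_some(row))
        .collect()
}

fn main() {
    let keys = vec![10, 20, 10, 30, 40, 20];
    let masks = selection_vectors(&keys, 3);
    for (partition, mask) in masks.iter().enumerate() {
        println!("partition {partition}: rows {:?}", selected_indices(mask));
    }
}
```

Every row is selected by exactly one mask, and equal keys always fall into the same partition, so each join partition can work from the same batch and skip the rows its mask excludes. In a real implementation the masks would presumably be Arrow boolean arrays attached to the `RecordBatch` rather than `Vec<bool>`.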

Dependencies

- [ ] https://github.com/apache/datafusion/issues/15420

### Describe alternatives you've considered

The partitioning could be done inside the hash join algorithm, however this would add more complexity to each operator (join / aggregates) to do hash-repartitioning inside 

### Additional context

_No response_


Support zero copy hash repartitioning for Hash Join #15382
