Make Nested Loop Join more efficient for very small right input

### Is your feature request related to a problem or challenge?

This issue is a summary of https://github.com/apache/datafusion/issues/17488

After [rewriting the NLJ operator](https://github.com/apache/datafusion/pull/16996/files), there is a performance regression on workloads where the right side is very small (the NLJ's left input has 1000 rows and its right input has 1 row).

#### Reason for the inefficiency
The high-level execution logic for NLJ is:
```
for each right_batch:
    for each left_row:
        join(left_row, right_batch)
```
The innermost `join()` function is optimized for large right batches with classical vectorization tricks. For a large batch size, the amortized per-row cost is very low; if the batch has only one row, there is nothing to amortize.
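
To make the amortization argument concrete, here is a minimal sketch that only counts how often the inner `join()` would be called and how many row pairs each call covers. It is not DataFusion code: `join_calls`, `pairs_per_call`, and the `batch_size` parameter are hypothetical names for illustration, and the model assumes the right input is split into batches of at most `batch_size` rows.

```rust
/// Number of `join(left_row, right_batch)` calls made by the loop above.
/// `batch_size` (hypothetical) is the maximum number of rows per right batch.
fn join_calls(left_rows: usize, right_rows: usize, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    // Ceiling division: number of right batches needed to cover all right rows.
    let right_batches = (right_rows + batch_size - 1) / batch_size;
    // One call per (right batch, left row) combination.
    right_batches * left_rows
}

/// Average number of (left row, right row) pairs processed per `join` call.
/// The fixed per-call overhead is spread over this many pairs.
fn pairs_per_call(left_rows: usize, right_rows: usize, batch_size: usize) -> f64 {
    let calls = join_calls(left_rows, right_rows, batch_size);
    if calls == 0 {
        return 0.0;
    }
    (left_rows * right_rows) as f64 / calls as f64
}
```

Under this model, with 1000 left rows, 1 right row, and an assumed batch size of 8192, there are 1000 calls that each cover a single pair, so any fixed per-call cost is paid in full for every output candidate. With 8192 right rows the call count is still 1000, but each call covers 8192 pairs.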

#### Implementation complexity consideration
I think making it fast is good to have, but not necessary if it would introduce too much extra implementation complexity.
The reasons are:
- The optimizer will usually ensure the left side is smaller.
- A significant regression only happens when the right side is very small, such as only 1 row; otherwise, even if the optimizer makes a suboptimal join-order decision, the performance should be similar.

See the original issue for more background

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

Make Nested Loop Join more efficient for very small right input #17547

Is your feature request related to a problem or challenge?

Reason for the inefficiency

Implementation complexity consieration

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make Nested Loop Join more efficient for very small right input #17547

Description

Is your feature request related to a problem or challenge?

Reason for the inefficiency

Implementation complexity consieration

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions