-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
This issue is a summary for #17488
After rewriting the NLJ operator, it has regression on the workload that the right side is very small (NLJ's left input has 1000 rows, right input has 1 rows)
Reason for the inefficiency
The high-level execution logic for NLJ is:
for each right_batch:
for each left_row:
join(left_row, right_batch)
the inner-most join() function is optimized for large right batch with classical vectorization tricks. For large batch size, the amortized per-row cost will be very row; if this batch has only one row there is nothing to amortize.
Implementation complexity consieration
I think making it fast is good to have, but not necessary if it would introduce too many extra implementation complexity.
The reason is:
- The optimizer will usually ensure the left side is smaller
- Significant regression only happens when the right side is very small like only 1 row, otherwise even the optimizer make the suboptimal decision of join order, the performance should be similar
See the original issue for more background
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
tobixdev and jonathanc-n
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request