rowexec: speed up lookup joins when no ordering is required #48117

asubiotto · 2020-04-28T17:16:14Z

Lookup joins currently always output rows based on the lookup input ordering. This can be inefficient if we don't have to do it.

The lookup joiner currently reads a batch of input rows and creates lookup spans based on those rows. These lookup spans need to be ordered properly before they are passed to the kv layer (due to scan resumption behavior and other things I can't remember). Because of this, the lookup join receives rows out of order relative to the lookup side. Once the results are retrieved, the lookup joiner iterates over the lookup rows and looks up the results once more based on the span key.

This can be bad for performance since all results need to be buffered (the first result that needs to be output can be the last result received), and in some cases when the result set is very large, spilled to disk. This ordering is not required in all cases (the optimizer passes this down through reqOrdering in a lookupJoinNode) so we should optimize the lookup joiner to stream results in these cases.

Investigation into #39471 has also shown that a blocker to increasing the lookup batch size is that performance tanks once we spill to disk. If we have a specialized no-ordering lookup join, this won't be a concern, and we can increase the lookup batch size to a larger constant.

The text was updated successfully, but these errors were encountered:

asubiotto added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-execution Relating to SQL execution. labels Apr 28, 2020

asubiotto mentioned this issue Apr 28, 2020

rowexec: investigate and improve lookup join performance #47472

Closed

5 tasks

asubiotto self-assigned this Apr 28, 2020

asubiotto mentioned this issue May 5, 2020

rowexec: make lookup joins with no required ordering more efficient #48439

Merged

craig bot closed this as completed in eb2bd4a May 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rowexec: speed up lookup joins when no ordering is required #48117

rowexec: speed up lookup joins when no ordering is required #48117

asubiotto commented Apr 28, 2020

rowexec: speed up lookup joins when no ordering is required #48117

rowexec: speed up lookup joins when no ordering is required #48117

Comments

asubiotto commented Apr 28, 2020