rowexec: speed up lookup joins when no ordering is required #48117

Closed
asubiotto opened this issue Apr 28, 2020 · 0 comments · Fixed by #48439
Labels
A-sql-execution: Relating to SQL execution.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@asubiotto
Contributor

Lookup joins currently always output rows in the order of their input rows. Maintaining that ordering is wasteful when nothing downstream requires it.

The lookup joiner currently reads a batch of input rows and creates lookup spans based on those rows. These lookup spans need to be sorted before they are passed to the kv layer (due to scan resumption behavior and other things I can't remember). Because of this, the lookup joiner receives results in span order rather than in the order of its input rows. Once the results are retrieved, the lookup joiner iterates over the input rows and looks up the results once more based on the span key.
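To make the flow above concrete, here is a minimal sketch of the order-preserving path. It is not the actual rowexec code: `lookupResult`, `fetchSorted`, and `orderedLookupJoin` are hypothetical names, string keys stand in for spans, and an in-memory map stands in for the kv layer.

```go
package main

import "sort"

// lookupResult is a hypothetical stand-in for a row fetched from the lookup index.
type lookupResult struct {
	Key   string
	Value string
}

// fetchSorted simulates the kv layer (an assumption for illustration): it
// sorts and deduplicates the lookup keys, then returns matches in key order.
func fetchSorted(index map[string][]string, keys []string) []lookupResult {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	var out []lookupResult
	for i, k := range sorted {
		if i > 0 && sorted[i-1] == k {
			continue // the same span is only requested once
		}
		for _, v := range index[k] {
			out = append(out, lookupResult{Key: k, Value: v})
		}
	}
	return out
}

// orderedLookupJoin buffers every fetched result by key, then walks the
// input rows in their original order and looks the results up a second time.
func orderedLookupJoin(index map[string][]string, inputKeys []string) []lookupResult {
	buffered := make(map[string][]string)
	for _, r := range fetchSorted(index, inputKeys) {
		buffered[r.Key] = append(buffered[r.Key], r.Value)
	}
	var out []lookupResult
	for _, k := range inputKeys {
		for _, v := range buffered[k] {
			out = append(out, lookupResult{Key: k, Value: v})
		}
	}
	return out
}
```

Note that nothing can be emitted until the whole batch has been fetched, because the first input row's match may be the last result returned.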

This hurts performance because all results need to be buffered (the first result that needs to be output can be the last result received) and, when the result set is very large, spilled to disk. The ordering is not required in all cases (the optimizer passes the requirement down through reqOrdering in a lookupJoinNode), so we should optimize the lookup joiner to stream results when it is not needed.
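A streaming variant for the no-ordering case could look roughly like the sketch below. This only illustrates the idea and is not the change that closed this issue: `joinedRow`, `streamingLookupJoin`, and the `emit` callback are hypothetical, string keys stand in for spans, and an in-memory map stands in for the kv layer.

```go
package main

import "sort"

// joinedRow is a hypothetical stand-in for one output row of the join.
type joinedRow struct {
	Key   string
	Value string
}

// streamingLookupJoin emits each joined row as soon as its lookup result
// arrives. Output follows span (key) order, not input order, so it is only
// valid when no ordering is required of the join.
func streamingLookupJoin(index map[string][]string, inputKeys []string, emit func(joinedRow)) {
	// Remember how many input rows share each key so that every input row
	// still gets its matches, without buffering any lookup results.
	counts := make(map[string]int)
	spans := make([]string, 0, len(inputKeys))
	for _, k := range inputKeys {
		if counts[k] == 0 {
			spans = append(spans, k)
		}
		counts[k]++
	}
	// Spans are still sorted before being handed to the (simulated) kv layer.
	sort.Strings(spans)
	for _, k := range spans {
		for _, v := range index[k] {
			for i := 0; i < counts[k]; i++ {
				emit(joinedRow{Key: k, Value: v})
			}
		}
	}
}
```

The only per-batch state here is the input-side bookkeeping, which is bounded by the batch size rather than by the size of the result set, so there is nothing to spill to disk.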

Investigation into #39471 has also shown that a blocker to increasing the lookup batch size is that performance tanks once we spill to disk. If we have a specialized no-ordering lookup join, this won't be a concern, and we can increase the lookup batch size to a larger constant.

asubiotto added the C-performance and A-sql-execution labels on Apr 28, 2020
asubiotto self-assigned this on Apr 28, 2020
craig bot closed this as completed in eb2bd4a on May 12, 2020