
rowexec: investigate and improve lookup join performance #47472

Closed
5 tasks done
asubiotto opened this issue Apr 14, 2020 · 5 comments
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) meta-issue Contains a list of several other issues.

Comments

@asubiotto
Contributor

asubiotto commented Apr 14, 2020

Performance improvements

General improvements

Possibly out of scope but worth mentioning:

Explore better parallelization. A single lookup join is planned on the leaseholder for the bigger table. These rows might have matches on different nodes. Routing rows by expected lookup range location could allow us to parallelize lookups. This might not be something we do this release, but understanding the solution space will allow us to formulate specific work items. A less invasive change is to bucket batches by expected lookup node (#34997), which would reduce the total number of round trips.
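To make the bucketing idea concrete, here is a minimal Go sketch, not the approach taken in #34997 and not CockroachDB code. It assumes a possibly stale, sorted view of range start keys and the node expected to serve each range. `NodeID`, `rangeCache`, `lookupNode`, and `bucketByNode` are hypothetical names invented for this illustration. Lookup keys are grouped by expected node so that each node could receive one batched request instead of one request per key.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeID identifies a node in the cluster (hypothetical type for this sketch).
type NodeID int

// rangeEntry says that keys >= startKey (up to the next entry's startKey)
// are expected to live on node. This is an assumed, simplified model.
type rangeEntry struct {
	startKey string
	node     NodeID
}

// rangeCache is a hypothetical, possibly stale view of range locations,
// sorted by startKey in ascending order.
type rangeCache []rangeEntry

// lookupNode returns the node expected to hold key, i.e. the entry with the
// greatest startKey <= key. It returns 0 if no entry covers the key.
func (c rangeCache) lookupNode(key string) NodeID {
	// Find the first entry whose startKey is strictly greater than key.
	i := sort.Search(len(c), func(i int) bool { return c[i].startKey > key })
	if i == 0 {
		return 0 // unknown location
	}
	return c[i-1].node
}

// bucketByNode groups lookup keys by the node expected to serve them,
// preserving the input order within each bucket.
func bucketByNode(c rangeCache, keys []string) map[NodeID][]string {
	buckets := make(map[NodeID][]string)
	for _, k := range keys {
		n := c.lookupNode(k)
		buckets[n] = append(buckets[n], k)
	}
	return buckets
}

func main() {
	cache := rangeCache{{"", 1}, {"g", 2}, {"p", 3}}
	keys := []string{"apple", "kiwi", "banana", "zebra", "mango"}
	buckets := bucketByNode(cache, keys)

	// Print buckets in a stable order: one batched request per node.
	nodes := make([]int, 0, len(buckets))
	for n := range buckets {
		nodes = append(nodes, int(n))
	}
	sort.Ints(nodes)
	for _, n := range nodes {
		fmt.Printf("node %d: %v\n", n, buckets[NodeID(n)])
	}
}
```

In this toy input, five lookup keys fall into three buckets, so three batched requests would replace five individual ones. A real implementation would also have to handle stale location information and ranges that move between planning and execution.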

@asubiotto
Contributor Author

Added #48117 and #48118, which were found while investigating #39471.

@asubiotto
Contributor Author

Unchecked "better parallelization". It's always been out of scope and might be easier to do once #47473 is complete.

@asubiotto
Contributor Author

Also created a more specific issue to track join reader left semi/anti joins and linked that in the checkbox.

@nvanbenschoten
Member

It turns out that lookup join performance is extremely important for TPC-E. Many of its transactions require parallel chains of point lookups, which can be expressed as multi-way lookup joins. So far, "vectorizing" these point lookups into lookup joins has improved performance significantly, but I'm sure there's room for improvement.

All that is to say: TPC-E would be another good testbed to look at once it's a little more stable, and to try changes against while working on this issue.

@asubiotto
Contributor Author

Closing this issue as we improved/investigated the items in the list (refer to specific issues for more information). We didn't benchmark TPC-E but benchmarking/investigation would be interesting to do here.


2 participants