New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add RFC for JDBC Join Push down #32

Open

Thanzeel-Hassan-IBM wants to merge 2 commits into prestodb:main from Thanzeel-Hassan-IBM:jdbc-join-pushdown

Thanzeel-Hassan-IBM commented Nov 7, 2024

Add RFC for JDBC Join Push down

Internal review done in this PR : #31


          Add RFC for JDBC Join Push down

f0c95ad

* Add RFC for JDBC Join Push down

---------

Co-authored-by: Ajas M <Ajas.M@ibm.com>
Co-authored-by: Haritha K <HARITHA.K@ibm.com>
Co-authored-by: Glerin <Glerin.Pinhero@ibm.com>

Thanzeel-Hassan-IBM mentioned this pull request

Implement Jdbc join pushdown capabilities in presto prestodb/presto#23152

Open

aaneja reviewed

View reviewed changes

Contributor

aaneja left a comment

Reviewed till high level design

RFC-0009-jdbc-join-push-down.md Outdated

Comment on lines 19 to 21

		At present, when a query joins multiple tables, it creates a separate TableScanNode for each table. Each TableScanNode select all the records from that table. The join operation is then executed in-memory in Presto using a JOIN node by applying JoinCriteria, FilterPredicate and other criteria (like order by, limit, etc.).

		However, if the query joins tables from the same JDBC datasource, it would be more efficient to let the datasource handle the join instead of creating a separate TableScanNode for each table and joining them in Presto. If we "Push down" or send these joins to remote JDBC datasource it increases the query performance. i.e., decreases the query execution time. We have seen improvements from 3x to 10x.

Contributor

aaneja Nov 11, 2024

it would be more efficient

This is not always true. Pushing down an expanding Join may be worse for performance.

Contributor

aaneja Nov 11, 2024

decreases the query execution time. We have seen improvements from 3x to 10x.

Can you reword this to be more specific and talk about the improvements in query latency, scanned data size, CPU/disk/network usage ?

Author

Thanzeel-Hassan-IBM Nov 13, 2024

I have added a table which shows the data i got from the presto UI. I dont think it shows Disk and network usage..
Can you take a look ? @aaneja

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md


		## Proposed Implementation

		At present, if presto get a join query (from the CLI or UI) which is trying to join tables either from same datasource or from different datasource, it is received as a string formatted sql query. Presto validates the syntax and converts it to Query (Statement) object using presto parser and analyzer. This Query object is converted to presto internal reference architecture called Plan, using its logical and physical optimizers. Finally, this plan is executed by the executor.

Contributor

aaneja Nov 11, 2024

IMO, instead of having the preamble here describing the basics of the parser and planner in this PR, and repeating them in multiple RFC's how about we create a dedicated section in https://prestodb.io/docs/current/ that we can link to ?

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

aaneja reviewed

View reviewed changes

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md Outdated Show resolved Hide resolved

RFC-0009-jdbc-join-push-down.md


		5. Enable presto Join pushdown capabilities by setting the session flag optimizer_inner_join_pushdown_enabled = true.

		## Low level Design

Contributor

aaneja Nov 11, 2024

Please rework this section or remove it. A lot of this is implementation detail. What we should be highlighting is

How we would group tables by connector using the same mechanism in ReorderJoins. Describe how we would limit the grouping to only join sources that are either TableScan or Filter<-TableScan or Project<-Filter<-TableScan
Describe the updates to the SPI to add a table handle or other data structure to represent a 'group' of individual table handles that are joined
Describe how the connectors, and how the JDBC connector in particular can indicate that it wants to 'participate' in this creation of grouped-tables
Describe how the JDBC connector will un-wrap these grouped-tables and build the SQL necessary to achieve the JOIN condition that it would use when querying the JDBC source
Describe what the corner cases are - why we would need a way to add a table alias

Please create smaller sections for these as you see fit

Contributor

aaneja Nov 11, 2024

Additionally, please add a section dedicated to SPI changes - you can add classes, interfaces and types that are being added or removed, and add code-comments on them to describe how they would be used

Author

Thanzeel-Hassan-IBM Nov 13, 2024

Removed some things in that section.. Working on the rest

Author

Thanzeel-Hassan-IBM commented Nov 12, 2024

Updating the RFC with the above review comments.. The main idea is to reduce the verbosity. This level of in depth details of the implementation is not required.

I will make the modifications and make it easier to follow.

Thanzeel-Hassan-IBM force-pushed the jdbc-join-pushdown branch 5 times, most recently from 50814df to a72a706 Compare

November 13, 2024 10:41


          Review comments

f1c2aee

Remove in depth implementation of GroupInnerJoinsByConnector Optimizer

Fix heading

Added explain analyze outputs

remove extra details

added section header

remove pics

Added table for performance difference

Thanzeel-Hassan-IBM force-pushed the jdbc-join-pushdown branch from a72a706 to f1c2aee Compare

November 13, 2024 10:49

prestodb-ci added from:IBM and removed from:IBM labels

ethanyzhang added the from:IBM label

prestodb-ci commented Nov 13, 2024

Saved that user @Thanzeel-Hassan-IBM is from IBM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels