Aggregation pushdown #1

skyahead · 2020-04-12T18:06:24Z

No description provided.


EricJoy2048 · 2020-04-30T06:16:54Z

This is amazing work. Looking forward to it being merged into trunk; we need this feature.

RugratsJ · 2020-05-03T03:25:46Z

@skyahead, great job. Has this been fully tested? Can I integrate it into the 332 release?

skyahead · 2020-05-04T17:48:40Z

@RugratsJ @gaojun2048 Thanks guys. The code in its current form cannot be merged into the master branch because it is not complete. It is my first attempt to implement the proposal here: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations#aggregation-pushdown.

At best, it covers 1/3 of the total work needed.

In my opinion, doing the other 2/3 involves changes in too many places in PrestoSQL internals and cannot be done in a short while, but I could be wrong.

Therefore, I have been using this 1/3 implementation in our environment, where most of our queries hit S3 files, and I am adding Druid support.

If you guys are interested in trying the code in your work, here are the steps:

1. Download version 331 and untar it somewhere. Here is the URL: https://repo1.maven.org/maven2/io/prestosql/presto-server/331/presto-server-331.tar.gz.
2. Grab the code and compile it locally like so: `./mvnw package -pl 'presto-spi, presto-base-jdbc, presto-main, presto-druid' -TC2 -DskipTests -Dmaven.javadoc.skip=true -Dmaven.source.skip=true -Dair.check.skip-all=true`
3. Replace the original spi and main jars with the ones you built above, and use the above Druid zip file if you wish.

Note: using this code means that whenever upstream moves to 333, 334, etc., you have to redo all the code changes and resolve all the conflicts, so it is not a good idea.

I maintain our Presto fork, which has already diverged from upstream, so I can live with this 1/3 implementation for a while, and I hope the community can move faster on aggregation pushdown.

To be honest, I am also thinking a lot these days about switching our cluster back to the PrestoDB distribution, which already has this feature. For JDBC, adding aggregation pushdown in PrestoDB could be at least 10 times easier than doing it in PrestoSQL. Again, I might be wrong though.

RugratsJ · 2020-05-04T23:12:37Z

@skyahead, thank you for the detailed instructions. Since you are using AWS, do you think Glue + S3 would be a better choice than Druid?

Has PrestoDB already implemented aggregation pushdown? prestodb/presto#4839 is still open.

skyahead · 2020-05-05T00:04:57Z

> @skyahead, thank you for the detailed instructions. Since you are using AWS, do you think Glue + S3 would be a better choice than Druid?
>
> Has PrestoDB already implemented aggregation pushdown? prestodb/presto#4839 is still open.

We are not using Druid to replace anything. Druid is one underlying storage for Presto and S3 is another. We do not use Glue, as we run our own Hive metastore.

PrestoDB does have working aggregation pushdown. I run it in our staging environment but not in production, and I can push aggregations down to Druid.

However, PrestoDB's aggregation pushdown has NOT been implemented for any JDBC storage yet. If you want, we can work together to write one.
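To make concrete what pushing an aggregation down to a JDBC storage would roughly mean: instead of fetching rows and aggregating them in the engine, the connector sends the remote database a query that already contains the aggregate and the GROUP BY. The following is only a minimal sketch under stated assumptions. `AggregateSqlBuilder` and its `build` method are hypothetical names made up for illustration; they are not part of PrestoDB, PrestoSQL, or this PR, and identifiers are assumed to be already validated and quoted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a PrestoDB or PrestoSQL class.
final class AggregateSqlBuilder {
    private AggregateSqlBuilder() {}

    // Builds the SQL a JDBC connector could send to the remote database so
    // that the aggregation runs there instead of in the query engine.
    // Assumption: table, column, and function names are already safe to embed.
    static String build(String table, List<String> groupBy, String function, String column) {
        List<String> select = new ArrayList<>(groupBy);
        select.add(function + "(" + column + ")");
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", select))
                .append(" FROM ")
                .append(table);
        if (!groupBy.isEmpty()) {
            sql.append(" GROUP BY ").append(String.join(", ", groupBy));
        }
        return sql.toString();
    }
}

class AggregatePushdownExample {
    public static void main(String[] args) {
        // prints: SELECT status, sum(price) FROM orders GROUP BY status
        System.out.println(
                AggregateSqlBuilder.build("orders", Arrays.asList("status"), "sum", "price"));
    }
}
```

A real connector would also have to decide which aggregate functions the remote database supports and fall back to engine-side aggregation for the rest; the sketch leaves that out.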

skyahead and others added 18 commits April 9, 2020 10:50

move PlanNode and PlanVisitor from main to SPI, created InternalPlanNode and InternalPlanVisitor

d2832c3

move AggregationNode to SPI

a3a2ec9

aggregation pushdown

8d44652

change AggregationApplicationResult

9fc7694

connector level aggregation pushdown is ok now

cfcab40

pushing down aggregations to tablehandle

621f127

pushing down grouping sets

92672ba

fixing tests

320d520

remove a dup dep

ca48bf9

adding presto druid

e6e90e8

fixing predictpushdown for jdbc

57f4fee

fixing limit push down to table scan

e924f6a

formatting

5526b74

adding pushTopNIntoTableScan

3de0ebd

more for pushTopNIntoTableScan

a3ab5fa

Allow table metadata with no columns

96a9136

Since the introduction of the applyXXX methods, a TableScan represents a subquery, not just a raw table. It is possible that such a subquery might have no columns due to a projection being applied by the optimizer.
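As an illustration of how a scan can end up with no columns, consider a query such as `SELECT count(*) FROM orders`: once the optimizer pushes the projection into the scan, nothing in the plan references any column. The sketch below assumes a hypothetical `ScanHandle` class with a `project` method. Both are stand-ins invented for this example and are not the Presto SPI types touched by this commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a connector table handle; not a Presto SPI class.
final class ScanHandle {
    final String table;
    final List<String> columns; // may legitimately be empty after pruning

    ScanHandle(String table, List<String> columns) {
        this.table = table;
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    // Keep only the columns the rest of the plan still references,
    // preserving the original column order.
    ScanHandle project(Set<String> referenced) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            if (referenced.contains(column)) {
                kept.add(column);
            }
        }
        return new ScanHandle(table, kept);
    }
}

class EmptyProjectionExample {
    public static void main(String[] args) {
        ScanHandle scan = new ScanHandle("orders", Arrays.asList("id", "price", "status"));
        // SELECT count(*) FROM orders references no columns at all.
        ScanHandle pruned = scan.project(new HashSet<String>());
        // prints: orders columns=[]
        System.out.println(pruned.table + " columns=" + pruned.columns);
    }
}
```

Any metadata check that rejects an empty column list would fail on a handle like `pruned`, which is the situation this commit is meant to allow.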

Implement applyProjection for JDBC connectors

0a966c2

sorting plan optimizers to push down them all

6cd6aab

use list to keep column ordering

6318029
