@skyahead
Owner

No description provided.

@EricJoy2048

EricJoy2048 commented Apr 30, 2020

This is amazing work. Looking forward to it being merged into trunk; we need this feature.

@RugratsJ

RugratsJ commented May 3, 2020

@skyahead, great job. Has this been fully tested? Can I integrate it into the 332 release?

@skyahead
Owner Author

skyahead commented May 4, 2020

@RugratsJ @gaojun2048 Thanks guys. The code in its current form cannot be merged into the master branch as it is not complete. It is my first attempt to implement the proposal here: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations#aggregation-pushdown.

At best, it covers 1/3 of the total work needed.

In my opinion, doing the other 2/3 involves changes in too many places in PrestoSQL internals and cannot be done in a short while, but I could be wrong.

Therefore, I have been using this 1/3-implemented code in our environment, where most of our queries hit S3 files, and I am adding Druid support.

If you guys are interested in trying the code in your work, here are the steps:

  1. Download version 331 and untar it somewhere. Here is the URL: https://repo1.maven.org/maven2/io/prestosql/presto-server/331/presto-server-331.tar.gz.

  2. Grab the code and compile it locally like so: ./mvnw package -pl 'presto-spi, presto-base-jdbc, presto-main, presto-druid' -TC2 -DskipTests -Dmaven.javadoc.skip=true -Dmaven.source.skip=true -Dair.check.skip-all=true

  3. Replace the original jars under spi, main with what you get above, and use the above druid zip file if you wish to.
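The three steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a tested installer: `INSTALL_DIR`, `SRC_DIR`, `BUILT_VERSION`, the `DRY_RUN` switch, and the idea that the server keeps its `presto-spi` and `presto-main` jars in a `lib` directory are all assumptions, not something stated in this thread. The Druid zip is left out because the steps do not say where it should go.

```shell
#!/bin/sh
# Sketch of the three manual steps. Prints the commands by default;
# set DRY_RUN=0 to actually execute them.
set -eu

PRESTO_VERSION=331
TARBALL_URL="https://repo1.maven.org/maven2/io/prestosql/presto-server/${PRESTO_VERSION}/presto-server-${PRESTO_VERSION}.tar.gz"

INSTALL_DIR="${INSTALL_DIR:-/opt/presto}"            # assumption: any writable directory
SRC_DIR="${SRC_DIR:-$PWD}"                           # assumption: checkout of this PR's branch
BUILT_VERSION="${BUILT_VERSION:-$PRESTO_VERSION}"    # assumption: check the version in the checkout's pom.xml
LIB_DIR="$INSTALL_DIR/presto-server-${PRESTO_VERSION}/lib"   # assumption: jar location in the tarball
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: download version 331 and untar it.
step1_download() {
    run mkdir -p "$INSTALL_DIR"
    run curl -fL -o "$INSTALL_DIR/presto-server-${PRESTO_VERSION}.tar.gz" "$TARBALL_URL"
    run tar -xzf "$INSTALL_DIR/presto-server-${PRESTO_VERSION}.tar.gz" -C "$INSTALL_DIR"
}

# Step 2: build the patched modules with the command from the comment.
step2_build() {
    run cd "$SRC_DIR"
    run ./mvnw package -pl 'presto-spi, presto-base-jdbc, presto-main, presto-druid' \
        -TC2 -DskipTests -Dmaven.javadoc.skip=true -Dmaven.source.skip=true \
        -Dair.check.skip-all=true
}

# Step 3: swap the stock spi and main jars for the freshly built ones.
step3_replace_jars() {
    for module in presto-spi presto-main; do
        run rm -f "$LIB_DIR/$module-${PRESTO_VERSION}.jar"
        run cp "$SRC_DIR/$module/target/$module-${BUILT_VERSION}.jar" "$LIB_DIR/"
    done
}

step1_download
step2_build
step3_replace_jars
```

Running it as-is only prints the planned commands, which makes it easy to check the paths against your own layout before running with `DRY_RUN=0`.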

Note: using this code means that when upstream is bumped to 333, 334, etc., you have to redo all the code changes and resolve all the conflicts, so it is not a good idea.

I am maintaining our own Presto, which has already diverged from upstream, so I can live with this 1/3 code for a while, and I am hoping the community can move faster on aggregation pushdown.

To be honest, I am also thinking a lot these days about switching our cluster back to the PrestoDB distribution, which already has this feature. For JDBC, adding aggregation pushdown in the PrestoDB distribution can be at least 10 times easier than doing it in the PrestoSQL distribution. Again, I might be wrong though.

@RugratsJ

RugratsJ commented May 4, 2020

@skyahead, thank you for the detailed instructions. Since you are using AWS, do you think Glue + S3 would be a better choice than Druid?

Has the PrestoDB distribution already implemented aggregation pushdown? prestodb/presto#4839 is still open.

@skyahead
Owner Author

skyahead commented May 5, 2020

> @skyahead, thank you for the detailed instructions. Since you are using AWS, do you think Glue + S3 would be a better choice than Druid?
>
> Has the PrestoDB distribution already implemented aggregation pushdown? prestodb/presto#4839 is still open.

We are not using Druid to replace anything. Druid is one underlying storage for Presto and S3 is another. We do not use Glue as we run our own Hive metastore.

PrestoDB does have working aggregation pushdown; I run it in our staging env but not in production. I can push down aggregations to Druid.

But PrestoDB's aggregation pushdown has NOT been implemented for any JDBC storage yet. If you want, we can work together to write one.
