Support for correlated subqueries #4017

kokosing · 2015-11-23T14:41:46Z

This covers correlated subqueries used in IN and EXISTS predicates, as well as correlated scalar subqueries.

kokosing · 2015-11-23T14:43:06Z

Relates to #2878

matthewwardrop · 2016-02-05T17:59:45Z

Greetings all! I note that the referenced issue was closed for 0.132... just wondering if any progress has been made on supporting correlated subqueries?

kbajda · 2016-02-05T18:06:53Z

@matthewwardrop : yes, @kokosing is actively working on this. Expect PRs soon!

kokosing · 2016-02-05T18:09:08Z

see

matthewwardrop · 2016-02-05T18:21:32Z

Excellent news! Thanks kokosing :). Looking forward to it being deployed :).

kokosing · 2016-10-18T06:25:59Z

@martint FYI

Once the initial support for correlated subqueries (see https://github.com/prestodb/presto/wiki/Correlated-subqueries) is done, the next goals on our roadmap are:

lateral joins (Initial support for lateral joins #5879)
support simple scan-filter-project correlated subqueries (Support correlated simple scan-filter-project queries #6383)
implement EXISTS execution as SEMI_JOIN (Execute EXISTS predicate as SEMI JOIN #6384)
support multi column IN predicate (Support multiple columns in IN predicate #6385)

kokosing · 2016-11-16T09:01:38Z

Also this is on our short term road map: #6638

GrigorievNick · 2017-06-07T07:23:13Z

I have a case like this:

select 
(select username from cassandra.raw.players_by_brand as players where brand='brand_name' and players.bucket = buckets.bIndex)
from unnest(sequence(1, 5)) AS buckets (bIndex) limit 10;

It throws the same exception.
Will this be fixed too?

kokosing · 2017-06-07T07:29:32Z

Yes. Your case seems to be a great example for using a LATERAL join, which is also in flight. I would rewrite your query to something like:

select *
from unnest(sequence(1, 5)) AS buckets (bIndex), 
LATERAL (select username from cassandra.raw.players_by_brand as players where brand='brand_name' and players.bucket = buckets.bIndex)
 limit 10;

brandynabrams · 2017-08-10T18:32:45Z

Hi All,

Out of curiosity, will a correlated subquery join such as this work in the future as part of your ongoing support for correlated subqueries?

from user_people up
left join order_applications oa on oa.id = (select id from order_applications where person_id = up.id order by id desc limit 1)

kokosing · 2017-08-11T05:39:04Z

@brandynabrams Currently I do work on subqueries which are using LIMIT (and ORDER BY).

Generally, since all the subqueries in TPC-H and TPC-DS are now supported, extending support for subqueries became less important, and we switched to things that may affect a broader user audience (like join reordering).

Anyway, I still feel personally attached to subqueries and push the support for them in my spare time. Once I finish #8435, I could start working on subqueries with LIMIT.

brandynabrams · 2017-08-11T08:32:19Z

Hey @kokosing,

Thanks for the reply! Yes, I agree tasks like join reordering (which would be awesome) are more important.

I'm very used to Postgres, and subqueries that use LIMIT and ORDER BY were used very frequently by me and my team.

Some of our microservices still use Postgres, but our data warehouse (Treasure Data) uses a Hive and Presto combo, and this type of unsupported subquery makes it frustrating sometimes. Currently my workaround is a window function with rank() and PARTITION BY, or joining the table to itself with a max() function, but subqueries that involve LIMIT / ORDER BY seem way simpler and give cleaner code. (If you use other syntax that is more performant or better to use, I'm happy to try it out.)

But yah, long story short, if you are able to find time to add these types of subqueries.. that would be AWESOME! thanks man.
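The two workarounds mentioned above can be sketched as follows. This is only an illustration, not a statement of how Presto plans these queries: it runs against an in-memory SQLite database as a stand-in engine, the sample rows are made up, and the table and column names are borrowed from the query fragment quoted earlier in the thread.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the fragment
# quoted earlier in the thread (user_people, order_applications).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_people (id INTEGER PRIMARY KEY);
    CREATE TABLE order_applications (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO user_people VALUES (1), (2), (3);
    INSERT INTO order_applications VALUES (10, 1), (11, 1), (12, 2);
""")

# Workaround 1: rank each person's applications with a window function
# and keep only the top-ranked (latest) one.
WINDOW_SQL = """
    SELECT up.id, oa.id
    FROM user_people up
    LEFT JOIN (
        SELECT id, person_id,
               rank() OVER (PARTITION BY person_id ORDER BY id DESC) AS rnk
        FROM order_applications
    ) oa ON oa.person_id = up.id AND oa.rnk = 1
    ORDER BY up.id
"""

# Workaround 2: join against the per-person max(id) computed by aggregation.
MAX_SQL = """
    SELECT up.id, latest.max_id
    FROM user_people up
    LEFT JOIN (
        SELECT person_id, max(id) AS max_id
        FROM order_applications
        GROUP BY person_id
    ) latest ON latest.person_id = up.id
    ORDER BY up.id
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
max_rows = conn.execute(MAX_SQL).fetchall()
print(window_rows)
print(max_rows)
```

Both forms return one row per person with the latest application id, or NULL for a person with no applications, which is what the ORDER BY ... LIMIT 1 subquery in the join condition was meant to express.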

kokosing · 2018-06-06T10:10:08Z

Today's support for correlated subqueries in Presto is decent. Some subquery patterns are still not supported, but these should be tracked as separate issues.

aandis · 2018-10-23T11:41:54Z

hey @kokosing which connectors support correlated subqueries in presto?

kokosing · 2018-10-23T12:33:51Z

All of them, because subqueries are supported by the query planner and the execution engine.

I am not sure I understand the question. If you mean which connectors support subquery pushdown, then none.

aandis · 2018-10-24T10:49:13Z

@kokosing yeah. When I run queries like

select foo from table1 where bar in (select bar from table2 where qux = 'qux')

presto does a full table scan of table1. I am looking for ways to avoid that.

kokosing · 2018-10-24T11:00:18Z

Can you please open a separate issue for that and attach the output of EXPLAIN (TYPE DISTRIBUTED) for the query?

findepi · 2018-10-24T11:25:26Z

select foo from table1 where bar in (select bar from table2 where qux = 'qux')
presto does a full table scan of table1. I am looking for ways to avoid that.

@aandis Presto needs a full table scan because the condition bar = ... is not known upfront.

You have two options:

you can split your query into two queries, i.e., evaluate select bar from table2 where qux = 'qux' in your program and pass the result to the second query as a parameter; this allows any kind of optimization for the table1 filter
this use case will benefit from dynamic filtering (Add support for dynamic filtering #7428), but it still would not be as performant as the first option

(No need for new separate issue, since this is already covered by #7428 and #8680)

aandis · 2018-10-24T13:11:01Z

@findepi yeah, I've tried option 1 in the past. It has its own problems, because the parameter list may exceed the maximum query length or parameter length that Presto allows. So you have to split it into batches and run multiple queries, which is additional non-core application logic.
Additionally, more parameters means more planning time #10700 (comment) with a connector like cassandra.

It's nice to know #7428 will fix this. I'll keep an eye out.
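For reference, the two-step approach with batching discussed above might look roughly like this. It is a minimal sketch under stated assumptions: SQLite stands in for a Presto connection, `two_step_lookup`, `batches`, and `batch_size` are hypothetical names, the sample rows are made up, and the `?` placeholder style is SQLite's (a Presto client may use a different parameter syntax).

```python
import sqlite3
from itertools import islice


def batches(values, size):
    """Yield consecutive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def two_step_lookup(conn, batch_size):
    """Evaluate the subquery first, then filter table1 in batches."""
    # Step 1: evaluate the subquery in the application. DISTINCT keeps the
    # batches disjoint, so each table1 row is matched at most once.
    bars = [row[0] for row in conn.execute(
        "SELECT DISTINCT bar FROM table2 WHERE qux = 'qux'")]

    # Step 2: pass the values to the outer query as parameters, one batch
    # at a time, so no single query exceeds the parameter limit.
    foos = []
    for chunk in batches(bars, batch_size):
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT foo FROM table1 WHERE bar IN ({placeholders})"
        foos.extend(row[0] for row in conn.execute(sql, chunk))
    return foos


# Hypothetical sample data for a quick check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (foo TEXT, bar INTEGER);
    CREATE TABLE table2 (bar INTEGER, qux TEXT);
    INSERT INTO table1 VALUES ('a', 1), ('b', 2), ('c', 3), ('d', 4);
    INSERT INTO table2 VALUES (1, 'qux'), (3, 'qux'), (4, 'other'),
                              (3, 'qux'), (2, 'qux');
""")

batched = sorted(two_step_lookup(conn, batch_size=2))
single = sorted(row[0] for row in conn.execute(
    "SELECT foo FROM table1 WHERE bar IN "
    "(SELECT bar FROM table2 WHERE qux = 'qux')"))
print(batched, single)
```

The batch size is a tuning knob: smaller batches stay under length limits but mean more round trips, which is the extra application logic described above.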

kokosing self-assigned this Jun 7, 2017

aandis mentioned this issue May 7, 2018

Cassandra semijoins #10570

Closed

kokosing closed this as completed Jun 6, 2018

mndoping1 mentioned this issue Oct 31, 2022

Decorrelate subqueries containing OrderBy + Limit or Limit Clause #18594

Closed
