Implement physical plan for EXISTS subquery #123
Comments
Comment from Andy Grove (andygrove) @ 2020-12-31T19:42:45.132+0000: The example given here is a correlated subquery that can be translated into a join. Here is a random Stack Overflow discussion on this for reference (I have not reviewed it): https://stackoverflow.com/questions/1772609/procedurally-transform-subquery-into-join
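Here is a minimal sketch of the join rewrite described above. It assumes single-column tables modelled in Rust as `Option<i64>` slices, with `None` standing for SQL NULL. The function name `left_semi_join` is hypothetical and is not DataFusion's API. A correlated `EXISTS (... WHERE foo.c = bar.c)` keeps each outer row at most once when some inner row matches, which is exactly what a left semi join on the correlated column does:

```rust
use std::collections::HashSet;

/// Hypothetical sketch: evaluate
///   SELECT * FROM foo WHERE EXISTS (SELECT 1 FROM bar WHERE foo.c = bar.c)
/// as a LEFT SEMI join on `c`. `None` models SQL NULL.
fn left_semi_join(outer: &[Option<i64>], inner: &[Option<i64>]) -> Vec<Option<i64>> {
    // Build phase: collect the non-NULL join keys of the subquery side.
    // NULL keys are skipped because `NULL = x` is never TRUE in SQL.
    let keys: HashSet<i64> = inner.iter().filter_map(|k| *k).collect();

    // Probe phase: keep an outer row if its key appears in the build side.
    // Unlike an inner join, several matching inner rows do not duplicate it.
    outer
        .iter()
        .copied()
        .filter(|row| match row {
            Some(k) => keys.contains(k),
            None => false, // a NULL outer key can never compare TRUE
        })
        .collect()
}

fn main() {
    let foo = [Some(1), Some(2), None];
    let bar = [Some(1), Some(2), None];
    println!("{:?}", left_semi_join(&foo, &bar)); // [Some(1), Some(2)]
}
```

Hashing the inner keys once makes this roughly linear in the input sizes, instead of rescanning the subquery for every outer row.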
In case anyone is curious -- we support correlated versions of these queries (via a join), but we do not support the uncorrelated case (which is not super useful):
❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
0 rows in set. Query took 0.022 seconds.
3 rows in set. Query took 0.007 seconds.
❯ create table bar as select * from (values (1), (2), (NULL)) as sql;
0 rows in set. Query took 0.000 seconds.
❯ select * from foo where exists (select column1 from bar);
NotImplemented("Physical plan does not support logical expression EXISTS (<subquery>)")
❯ select * from foo where exists (select column1 from bar where foo.column1 = bar.column1);
+---------+
| column1 |
+---------+
| 2 |
| 1 |
+---------+
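The correlated result above omits the NULL row. A naive per-row evaluation of the correlated EXISTS shows why. Under SQL's three-valued logic, `NULL = NULL` is UNKNOWN rather than TRUE, and WHERE treats UNKNOWN as FALSE. The following sketch uses hypothetical helpers `sql_eq` and `exists_correlated` with the same `Option<i64>` modelling. It gives the reference semantics that any join rewrite has to preserve:

```rust
/// SQL three-valued equality: any comparison with NULL yields UNKNOWN (`None`).
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Naive evaluation of a correlated EXISTS: for every outer row, rescan the
/// subquery and keep the row if the correlation predicate is TRUE for at
/// least one inner row. UNKNOWN (`None`) is treated like FALSE.
fn exists_correlated(outer: &[Option<i64>], inner: &[Option<i64>]) -> Vec<Option<i64>> {
    outer
        .iter()
        .copied()
        .filter(|&o| inner.iter().any(|&i| sql_eq(o, i) == Some(true)))
        .collect()
}

fn main() {
    let foo = [Some(1), Some(2), None];
    let bar = [Some(1), Some(2), None];
    // The NULL outer row never finds a TRUE match, so it is dropped.
    println!("{:?}", exists_correlated(&foo, &bar)); // [Some(1), Some(2)]
}
```

This costs O(n·m) comparisons, which is why planners prefer the join form. The CLI printed 2 before 1, and that is fine: SQL does not guarantee row order without ORDER BY.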
> explain select * from foo where exists (select column1 from bar);
+---------------+-----------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------+
| logical_plan | LeftSemi Join: |
| | TableScan: foo projection=[column1] |
| | SubqueryAlias: __correlated_sq_1 |
| | TableScan: bar projection=[] |
| physical_plan | NestedLoopJoinExec: join_type=RightSemi |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+-----------------------------------------------------+
2 row(s) fetched.
Elapsed 0.007 seconds.
> select * from foo where exists (select column1 from bar);
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| NULL |
+---------+
3 row(s) fetched.
Elapsed 0.006 seconds.
I get this on
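In the plan above, the uncorrelated subquery becomes a semi join with no join condition, executed as a `NestedLoopJoinExec`. The `RightSemi` join type presumably reflects the planner swapping the two inputs. The following rough sketch of nested-loop semi join semantics uses a hypothetical `nested_loop_semi_join` function and models the missing condition as an always-true predicate. It shows why all three rows of `foo` come back, including NULL: every outer row matches as long as `bar` is non-empty.

```rust
/// Hypothetical nested-loop semi join over single-column inputs (`None` = NULL).
/// `on` is the join condition; an uncorrelated EXISTS has none, which we
/// model as a predicate that always returns true.
fn nested_loop_semi_join(
    outer: &[Option<i64>],
    inner: &[Option<i64>],
    on: &dyn Fn(Option<i64>, Option<i64>) -> bool,
) -> Vec<Option<i64>> {
    outer
        .iter()
        .copied()
        // `any` stops at the first inner row satisfying `on`, so each
        // outer row is emitted at most once.
        .filter(|&o| inner.iter().any(|&i| on(o, i)))
        .collect()
}

fn main() {
    let foo = [Some(1), Some(2), None];
    let bar = [Some(1), Some(2), None];

    // Uncorrelated EXISTS: no join condition, i.e. always TRUE.
    println!("{:?}", nested_loop_semi_join(&foo, &bar, &|_, _| true)); // [Some(1), Some(2), None]

    // Empty subquery: no outer row survives.
    println!("{:?}", nested_loop_semi_join(&foo, &[], &|_, _| true)); // []
}
```

With an always-true condition, the inner loop stops at the first inner row. The whole operation therefore reduces to checking once whether the subquery returns any row.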
Looks good -- thanks for checking @logan-keede -- we can open a new issue if we find another hole.
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10819
The TPC-H queries use the EXISTS operator, which tests whether a subquery returns any rows. For example: