Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when using DISTINCT in a filter clause using a subquery #5633

Closed
Tracked by #5483
RustomMS opened this issue Mar 17, 2023 · 2 comments
Closed
Tracked by #5483

Error when using DISTINCT in a filter clause using a subquery #5633

RustomMS opened this issue Mar 17, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@RustomMS
Copy link

Describe the bug

Unable to run the following style query due to error in subquery:

select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);

To Reproduce

$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ datafusion-cli
DataFusion CLI v19.0.0
❯ CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
0 rows in set. Query took 0.003 seconds.
❯ select * from t1;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 111     |
| 2       | 222     |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
❯ select column1 from t1;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 rows in set. Query took 0.001 seconds.
❯ select distinct column1 from t1;
+---------+
| column1 |
+---------+
| 2       |
| 1       |
+---------+
2 rows in set. Query took 0.004 seconds.
❯ select column1 from t1 where column1 in (select column1 as other from t1 where false);
0 rows in set. Query took 0.002 seconds.
❯ select column1 from t1 where column1 in ((select column1 as other from t1 where false));
This feature is not implemented: Physical plan does not support logical expression (<subquery>)
❯ select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);
decorrelate_where_in
caused by
Internal error: Optimizer rule 'decorrelate_where_in' failed due to unexpected error: a projection is required at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-optimizer-19.0.0/src/decorrelate_where_in.rs:147
caused by
Error during planning: Could not coerce into Projection! at /Users/rustomms/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-expr-19.0.0/src/logical_plan/plan.rs:1394. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯

Or run the following sql file with datafusion-cli -f test.sql

-- Test queries
CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,111),(2,222);
select * from t1;
select column1 from t1;
select distinct column1 from t1;
-- The first select with subquery statement works with 0 rows the second errors and third error on datafusion 19.0.0
select column1 from t1 where column1 in (select column1 as other from t1 where false);
-- This is fixed in main/20.0.0 by https://github.com/apache/arrow-datafusion/issues/5529
select column1 from t1 where column1 in ((select column1 as other from t1 where false));
select column1 from t1 where column1 in (select distinct column1 as other from t1 where false);

Expected behavior

Can use distinct in a subquery

Additional context

@mingmwang
Copy link
Contributor

I will take a look.

@alamb
Copy link
Contributor

alamb commented Jul 27, 2023

This query appears to work now in datafusion 28.0.0:

(arrow_dev) alamb@MacBook-Pro-8:~$ datafusion-cli  -f /tmp/test.sql
DataFusion CLI v28.0.0
0 rows in set. Query took 0.001 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 111     |
| 2       | 222     |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 rows in set. Query took 0.000 seconds.
+---------+
| column1 |
+---------+
| 2       |
| 1       |
+---------+
2 rows in set. Query took 0.001 seconds.
0 rows in set. Query took 0.002 seconds.
0 rows in set. Query took 0.001 seconds.
0 rows in set. Query took 0.001 seconds.
(arrow_dev) alamb@MacBook-Pro-8:~$ DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false datafusion-cli  -f /tmp/test.sql
DataFusion CLI v28.0.0
0 rows in set. Query took 0.002 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | 111     |
| 2       | 222     |
+---------+---------+
2 rows in set. Query took 0.001 seconds.
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 rows in set. Query took 0.000 seconds.
+---------+
| column1 |
+---------+
| 1       |
| 2       |
+---------+
2 rows in set. Query took 0.002 seconds.
0 rows in set. Query took 0.002 seconds.
0 rows in set. Query took 0.001 seconds.
0 rows in set. Query took 0.001 seconds.

I bet some of @jackwener 's / @mingmwang 's cleanups have fixed it

@alamb alamb closed this as completed Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants