The expression to get an indexed field is only valid for List types (common_sub_expression_eliminate) #3002

Closed
andygrove opened this issue Aug 1, 2022 · 3 comments · Fixed by #3003
Labels: bug, optimizer

Comments

@andygrove
Member

Describe the bug

I am trying to perform a trivial join in the CLI using the latest from master, and I get the following error:

Plan("The expression to get an indexed field is only valid for `List` types")

To Reproduce

$ cat /tmp/a.csv
foo
1

$ cat /tmp/b.csv
b
bar

$ ./target/debug/datafusion-cli

DataFusion CLI v10.0.0
❯ create external table a stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.016 seconds.
❯ create external table b stored as csv with header row location '/tmp/b.csv';
0 rows in set. Query took 0.009 seconds.
❯ select * from a;
+-----+
| foo |
+-----+
| 1   |
+-----+
1 row in set. Query took 0.011 seconds.
❯ select * from b;
+-----+
| b   |
+-----+
| bar |
+-----+
1 row in set. Query took 0.010 seconds.
❯ select * from a join b on a.foo = b.bar;
Plan("The expression to get an indexed field is only valid for `List` types")

Expected behavior
Query should work

Additional context
None

andygrove added the bug and optimizer labels on Aug 1, 2022
@andygrove
Member Author

The query in the repro is invalid (table b has a column named b, not bar), so it looks like we need to improve the error reporting here so that it returns an "invalid field" error instead.

If I specify the schema then the query works.

DataFusion CLI v10.0.0
❯ create external table a (foo varchar) stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.002 seconds.
❯ create external table b (bar varchar) stored as csv with header row location '/tmp/b.csv';
0 rows in set. Query took 0.001 seconds.
❯ select a.foo, b.bar from a join b on a.foo = b.bar;
0 rows in set. Query took 0.038 seconds.
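
To illustrate the kind of "invalid field" error suggested above, here is a minimal sketch in Python. It is a toy model, not DataFusion code: `PlanError`, `check_column`, and the error wording are all hypothetical names chosen for illustration. The schemas are the ones inferred from the CSV headers in the original repro.

```python
# Toy model only: none of these names come from DataFusion.
class PlanError(Exception):
    """Hypothetical planning error used for this illustration."""


def check_column(schemas, qualifier, name):
    """Return (table, column) if the qualified reference is valid.

    `schemas` maps a table name to its list of column names.
    Raises PlanError with a descriptive message otherwise.
    """
    if qualifier not in schemas:
        raise PlanError(f"Invalid table: '{qualifier}'")
    columns = schemas[qualifier]
    if name not in columns:
        valid = ", ".join(f"{qualifier}.{c}" for c in columns)
        raise PlanError(
            f"Invalid field: '{qualifier}.{name}'; valid fields are: {valid}"
        )
    return (qualifier, name)


# Schemas as inferred from the headers of /tmp/a.csv and /tmp/b.csv.
schemas = {"a": ["foo"], "b": ["b"]}

print(check_column(schemas, "a", "foo"))
try:
    check_column(schemas, "b", "bar")
except PlanError as err:
    print(err)
```

With a check along these lines, the reference b.bar would be reported as an unknown field of b rather than surfacing as an indexed-field error.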

@andygrove
Member Author

The issue seems to be related to having relations containing columns with the same name as the relation.

$ cat /tmp/a.csv
foo
1

$ cat /tmp/b.csv
bar
1

DataFusion CLI v10.0.0
❯ create external table a stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.017 seconds.
❯ create external table b stored as csv with header row location '/tmp/b.csv';
0 rows in set. Query took 0.003 seconds.
❯ create external table bar stored as csv with header row location '/tmp/bar.csv';
IoError(Os { code: 2, kind: NotFound, message: "No such file or directory" })
❯ create external table bar stored as csv with header row location '/tmp/b.csv';
0 rows in set. Query took 0.003 seconds.
❯ select a.foo, b.bar from a join b on a.foo = b.bar;
+-----+-----+
| foo | bar |
+-----+-----+
| 1   | 1   |
+-----+-----+
1 row in set. Query took 0.038 seconds.
❯ select a.foo, bar.bar from a join b on a.foo = bar.bar;
Plan("The expression to get an indexed field is only valid for `List` types")
❯ 

The first query works but the second query fails. The only difference between them is the table name used to qualify the column (b.bar versus bar.bar). In the second case, the table name is the same as the name of a column in the table.
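
One plausible way to picture this ambiguity is the toy resolver below, written in Python. It is an assumption about the shape of the problem, not DataFusion's actual logic: the function `resolve` and its `prefer_relation` flag are hypothetical. An identifier x.y can be read either as column y of relation x, or as field y of a column named x. If the second reading wins whenever some column is named x, a table that shares its name with a column produces an indexed-field expression.

```python
# Toy model only: `resolve` and `prefer_relation` are hypothetical names.
def resolve(ident, tables, prefer_relation):
    """Resolve a two-part identifier 'x.y'.

    `tables` maps a table name to its list of column names.
    Returns ("column", table, column) when x.y is read as relation.column,
    or ("indexed_field", table, column, key) when x is read as a column
    and y as a field inside it.
    """
    head, tail = ident.split(".")
    # Reading 1: `head` is a relation that has a column `tail`.
    as_relation = head in tables and tail in tables[head]
    # Reading 2: some table has a column named `head`.
    owners = [t for t, cols in tables.items() if head in cols]

    if prefer_relation and as_relation:
        return ("column", head, tail)
    if owners:
        return ("indexed_field", owners[0], head, tail)
    if as_relation:
        return ("column", head, tail)
    raise LookupError(f"cannot resolve '{ident}'")


# Tables registered in the session above; b and bar both read /tmp/b.csv.
tables = {"a": ["foo"], "b": ["bar"], "bar": ["bar"]}

print(resolve("b.bar", tables, prefer_relation=False))
print(resolve("bar.bar", tables, prefer_relation=False))
print(resolve("bar.bar", tables, prefer_relation=True))
```

In this model, b.bar resolves to a plain column because no column is named b, while bar.bar is read as an indexed field unless the relation-qualified reading is tried first.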

@waitingkuo
Contributor

select b.b from b didn't work originally either; your pull request solves that as well. I've sent a pull request against yours to add this simple test case.
