Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL column identifiers should be converted to lowercase when unquoted #1746

Closed
mkmik opened this issue Feb 4, 2022 · 2 comments · Fixed by #1747
Closed

SQL column identifiers should be converted to lowercase when unquoted #1746

mkmik opened this issue Feb 4, 2022 · 2 comments · Fixed by #1747
Labels
bug Something isn't working datafusion Changes in the datafusion crate

Comments

@mkmik
Copy link
Contributor

mkmik commented Feb 4, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

DataFusion strives to implement a subset of the PostgreSQL SQL dialect where possible

In PostgreSQL all column name identifiers are turned into lowercase, unless they are quoted. Example:

tmp1=> create table blah (Foo int, "Foo" int, "Bar" int);
CREATE TABLE
tmp1=> \d blah
                Table "public.blah"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 foo    | integer |           |          |
 Foo    | integer |           |          |
 Bar    | integer |           |          |

tmp1=> insert into blah values (1,2,3);
INSERT 0 1
tmp1=> select foo, Foo, "Foo", "Bar" from blah;
 foo | foo | Foo | Bar
-----+-----+-----+-----
   1 |   1 |   2 |   3
(1 row)

tmp1=> select Bar from blah;
ERROR:  column "bar" does not exist
LINE 1: select Bar from blah;
               ^
HINT:  Perhaps you meant to reference the column "blah.Bar".

PostgreSQL converts to lowercase all column identifiers. The example above shows how Foo used in the DDL statement creates a column actually named foo and it shows how it can be accessed in a query with Foo (unquoted). An actual "Foo" column happily coexist with foo in the table, but must be referenced with a quoted "Foo"

In Datafusion currently Bar means the same as "Bar" (I'm not providing a screen dump of the current datafusion behaviour out of laziness, I may add it later if turns out to be useful for the discussion).

Describe the solution you'd like

select FoO from my_table;

should be equivalent to

select foo from my_table;

If users want to access a column that is not in lower case, they should explicitly quote the column:

select "FoO" from my_table;
@mkmik
Copy link
Contributor Author

mkmik commented Feb 7, 2022

I mislabeled this as a feature request; I think it should be tracked as a bug.

CC: @alamb

@alamb alamb added bug Something isn't working and removed enhancement New feature or request labels Feb 7, 2022
@alamb
Copy link
Contributor

alamb commented Feb 7, 2022

I mislabeled this as a feature request; I think it should be tracked as a bug.

Changed -- thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants