Skip to content

Incorrect cast of integer columns to utf8 when comparing with utf8 constant #15161

@scsmithr

Description

@scsmithr

Describe the bug

A comparison like column1 < '10' (where column1 is an int64) will cast column1 to utf8 instead of casting the utf8 constant to an integer.

Typically string constants in a sql query are treated as unknown, and preference should be on casting the "unknown" value to a target type.

To Reproduce

DataFusion CLI v46.0.0
> create table t1 as (values (1), (2), (3));
0 row(s) fetched. 
Elapsed 0.026 seconds.

> select * from t1 where column1 < '10';
+---------+
| column1 |
+---------+
| 1       |
+---------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> select * from t1 where column1 < 'hello';
+---------+
| column1 |
+---------+
| 1       |
| 2       |
| 3       |
+---------+
3 row(s) fetched. 
Elapsed 0.007 seconds.

> select arrow_typeof(column1) from t1 limit 1;
+--------------------------+
| arrow_typeof(t1.column1) |
+--------------------------+
| Int64                    |
+--------------------------+
1 row(s) fetched. 
Elapsed 0.010 seconds.

> explain select * from t1 where column1 < '10';
+---------------+-------------------------------------------------------+
| plan_type     | plan                                                  |
+---------------+-------------------------------------------------------+
| logical_plan  | Filter: CAST(t1.column1 AS Utf8) < Utf8("10")         |
|               |   TableScan: t1 projection=[column1]                  |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192           |
|               |   FilterExec: CAST(column1@0 AS Utf8) < 10            |
|               |     DataSourceExec: partitions=1, partition_sizes=[1] |
|               |                                                       |
+---------------+-------------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.009 seconds.

Expected behavior

column1 not to be cast to a string.

Postgres output:

postgres=# create table t1 as (values (1), (2), (3));
SELECT 3
postgres=# select * from t1 where column1 < '10';
 column1 
---------
       1
       2
       3
(3 rows)

postgres=# select * from t1 where column1 < 'hello';
ERROR:  invalid input syntax for type integer: "hello"
LINE 1: select * from t1 where column1 < 'hello';

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions