Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing a Timestamp to a Date32 fails #4644

Closed
Tracked by #3148
crepererum opened this issue Dec 15, 2022 · 20 comments · Fixed by #4726 or #5140
Closed
Tracked by #3148

Comparing a Timestamp to a Date32 fails #4644

crepererum opened this issue Dec 15, 2022 · 20 comments · Fixed by #4726 or #5140
Labels
bug Something isn't working

Comments

@crepererum
Copy link
Contributor

Describe the bug
Comparing a Timestamp to a Date32 fails.

To Reproduce
Add the following test to timestamps.slt:

# Test that we can compare a timestamp to a string casted to a date
query C rowsort
select * from foo where ts > '2000-01-01'::date;
----
2 2000-02-01T00:00:00
3 2000-03-01T00:00:00

This fails w/:

[timestamps.slt] Running query: "select * from foo where ts > '2000-01-01'::date;"
Error: SqlLogicTest(query failed: DataFusion error: Internal error: The type of Timestamp(Nanosecond, None) > Date32 of binary physical should be same. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected behavior
It works.

Additional context
Tested on 5c558e9 .

This is likely due to a like in temporal_coercion as well as this workaround:

https://github.com/apache/arrow-datafusion/blob/5c558e9f6b3fe29ef0d5f78ec9465852d020d989/datafusion/optimizer/src/type_coercion.rs#L232-L239

Also see #3419.

@crepererum crepererum added the bug Something isn't working label Dec 15, 2022
@comphead
Copy link
Contributor

@crepererum @waitingkuo I can take that as a next ticket related to timestamp issues

@waitingkuo
Copy link
Contributor

thank you @comphead

do you plan to do Timestamp without Timezone only or both Timestamp and TImestampTz?

@waitingkuo
Copy link
Contributor

fyi

this is straightforward

willy=# select timestamp '2000-01-01T00:00:00.00000' = date '2000-01-01';
 ?column? 
----------
 t
(1 row)

and there're more to consider for timestamptz (i'm in UTC+8)

willy=# select timestamptz '2000-01-01T00:00:00.00000+00' = date '2000-01-01';
 ?column? 
----------
 f
(1 row)
willy=# select timestamptz '1999-12-31T16:00:00.00000+00' = date '2000-01-01';
 ?column? 
----------
 t
(1 row)
willy=# select timestamptz '2000-01-01T00:00:00.00000+08' = date '2000-01-01';
 ?column? 
----------
 t
(1 row)

@comphead
Copy link
Contributor

Simpler reproduce case

❯ select * from (select now() as n) a where n = '2000-01-01'::date;
Internal("The type of Timestamp(Nanosecond, Some(\"+00:00\")) = Date32 of binary physical should be same")

@comphead
Copy link
Contributor

@waitingkuo those ts to date tests, did you run in datafusion cli?
I'm going through some final tests and cannot reproduce your test

❯ select timestamptz '2000-01-01T00:00:00.000000+00' = date '2000-01-01';
Plan("'Timestamp(Nanosecond, Some(\"+00:00\")) = Date32' can't be evaluated because there isn't a common type to coerce the types to")

I'm running it on master branch

@waitingkuo
Copy link
Contributor

@comphead sorry i didn't mention it clearly. it's the behavior for postgresql

@liukun4515
Copy link
Contributor

liukun4515 commented Dec 27, 2022

I want to reopen this issue to discuss the implementation of the comparing between date and timestamp data type.

cc @waitingkuo @alamb @crepererum

I have comments about #4726 (comment)

  1. type coercion should be in the right phase, and not be in the physical phase
  2. what is the the common type/ result type for comparing timestamp and date

@liukun4515 liukun4515 reopened this Dec 27, 2022
@liukun4515
Copy link
Contributor

Describe the bug Comparing a Timestamp to a Date32 fails.

To Reproduce Add the following test to timestamps.slt:

# Test that we can compare a timestamp to a string casted to a date
query C rowsort
select * from foo where ts > '2000-01-01'::date;
----
2 2000-02-01T00:00:00
3 2000-03-01T00:00:00

This fails w/:

[timestamps.slt] Running query: "select * from foo where ts > '2000-01-01'::date;"
Error: SqlLogicTest(query failed: DataFusion error: Internal error: The type of Timestamp(Nanosecond, None) > Date32 of binary physical should be same. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected behavior It works.

Additional context Tested on 5c558e9 .

This is likely due to a like in temporal_coercion as well as this workaround:

https://github.com/apache/arrow-datafusion/blob/5c558e9f6b3fe29ef0d5f78ec9465852d020d989/datafusion/optimizer/src/type_coercion.rs#L232-L239

This rule will not affect the comparison, we just need to support them in the temporal_coercion

Also see #3419.

@liukun4515
Copy link
Contributor

we need to try to make align with behavior of PG

@liukun4515
Copy link
Contributor

Do you have time to investigate the behavior for the comparison between timestamp with date? Do we need to consider the timezone for the comparison or not? @crepererum

@alamb
Copy link
Contributor

alamb commented Dec 27, 2022

Do you have time to investigate the behavior for the comparison between timestamp with date?

The arrow-rs kernel appears to support casting to/from timestamp without timezone (timezone is None):

https://github.com/apache/arrow-rs/blob/1d0abfafe0da7c28b562fa0ba8c65a10b65a0821/arrow-cast/src/cast.rs#L257-L265

my reading of the code is that if the timestamp has a timezone the arrow kernel will fail (can not cast types)

@comphead
Copy link
Contributor

@alamb @liukun4515 thats right, we can cast from timestamp to date, as it timestamp is more accurate datatype

@crepererum
Copy link
Contributor Author

I think we shouldn't cast timezones if both inputs have different timezones. If both have the same it's fine and if one has a timezone and the other one hasn't it's probably also fine (one is concrete and the other one is "don't care").

@comphead
Copy link
Contributor

comphead commented Jan 3, 2023

This is also good point to cast/compare timestamp and timestamptz

@comphead
Copy link
Contributor

comphead commented Jan 6, 2023

I'm still debugging this. Looks like logical plan doesn't include an extra cast in type_coercion optimizer rule for dates/ts binary ops

@liukun4515
Copy link
Contributor

I'm still debugging this. Looks like logical plan doesn't include an extra cast in type_coercion optimizer rule for dates/ts binary ops

Does you have test the comparison in the PG between timestamp with date?

@comphead
Copy link
Contributor

comphead commented Jan 7, 2023

I'm still debugging this. Looks like logical plan doesn't include an extra cast in type_coercion optimizer rule for dates/ts binary ops

Does you have test the comparison in the PG between timestamp with date?

some investigation are here #4761 (comment) and #4761 (comment)

@comphead
Copy link
Contributor

comphead commented Jan 9, 2023

This test works, it adds extra cast to Equal condition

    #[test]
    fn binary_op_date32_eq_ts() -> Result<()> {
        let expr = cast(lit("1998-03-18"), DataType::Date32).eq(
            cast(lit("1998-03-18"), DataType::Timestamp(arrow::datatypes::TimeUnit::Nanosecond, None)));
        let empty = Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
            produce_one_row: false,
            schema: Arc::new(DFSchema::empty()),
        }));
        let plan = LogicalPlan::Projection(Projection::try_new(vec![expr], empty)?);
        let expected =
            "Projection: CAST(CAST(Utf8(\"1998-03-18\") AS Date32) AS Timestamp(Nanosecond, None)) = CAST(Utf8(\"1998-03-18\") AS Timestamp(Nanosecond, None))\n  EmptyRelation";
        assert_optimized_plan_eq(&plan, expected)?;
        Ok(())
    }

But this still fails.

    let sql = "explain verbose select '1998-03-18'::timestamp = '1998-03-18'::date";

Debugging

@comphead
Copy link
Contributor

comphead commented Jan 10, 2023

I found something weird. first logical plan failed silently on cast. second is arrow cast acts not consistently on the same inputs. so if I run through first test

[datafusion/expr/src/expr_schema.rs:268] &this_type = Date32
[datafusion/expr/src/expr_schema.rs:269] &cast_to_type = Timestamp(
    Nanosecond,
    None,
)
[datafusion/expr/src/expr_schema.rs:270] can_cast_types(&this_type, cast_to_type) = true

but if I run second test

[datafusion/expr/src/expr_schema.rs:268] &this_type = Date32
[datafusion/expr/src/expr_schema.rs:269] &cast_to_type = Timestamp(
    Nanosecond,
    None,
)
[datafusion/expr/src/expr_schema.rs:270] can_cast_types(&this_type, cast_to_type) = false

I'm looking what in arrow-rs can cause such non determined behaviuor
@liukun4515 @crepererum @alamb

Edit: looks like cargo clean helped to make the second test finally work. Sorry, facepalm. Creating PR soon

@comphead
Copy link
Contributor

comphead commented Jan 12, 2023

Waitin arrow-rs 32.0.0 to proceed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
5 participants