
Change coerced type for comparison between timestamp with date to timestamp #4761

Closed

liukun4515 opened this issue Dec 28, 2022 · 12 comments · Fixed by #5140

Comments

@liukun4515
Contributor

I tried the comparison of a timestamp with a date in Spark, and found that the coerced type is timestamp.
spark-sql> explain extended select now()>cast('2000-01-01' as date);
== Parsed Logical Plan ==
'Project [unresolvedalias(('now() > cast(2000-01-01 as date)), None)]
+- OneRowRelation

== Analyzed Logical Plan ==
(now() > CAST(2000-01-01 AS DATE)): boolean
Project [(now() > cast(cast(2000-01-01 as date) as timestamp)) AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation

== Optimized Logical Plan ==
Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation

== Physical Plan ==
*(1) Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- *(1) Scan OneRowRelation[]

cc @comphead @alamb

But the current implementation's coerced type is date; we can fix this in a follow-up PR.

Originally posted by @liukun4515 in #4741 (comment)
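
A sketch of what the proposed rule means for the same comparison in DataFusion (illustrative SQL, not actual plan output):

-- current rule: coerce to date, truncating the timestamp
--   CAST(now() AS DATE) > DATE '2000-01-01'
-- proposed rule: coerce to timestamp, widening the date to midnight of that day
--   now() > CAST(DATE '2000-01-01' AS TIMESTAMP)
SELECT now() > CAST('2000-01-01' AS DATE);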

@alamb
Contributor

alamb commented Dec 28, 2022

I agree we should coerce Date --> Timestamp rather than Timestamp --> Date.

This will ensure that no data is truncated during coercion, consistent with the rest of the rules.

Date to Timestamp is lossless; however, going from Timestamp --> Date would truncate the precision of the original timestamp (see the Postgres sketch below):

  • Date --> Timestamp --> Date: 2022-12-28 --> 2022-12-28 00:00:00 --> 2022-12-28
  • Timestamp --> Date --> Timestamp: 2022-12-28 12:34:56 --> 2022-12-28 --> 2022-12-28 00:00:00
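
For illustration, the same round trips written as Postgres casts (a sketch; the exact output formatting depends on the server, but the truncation behavior is standard):

SELECT ('2022-12-28'::date)::timestamp;                        -- 2022-12-28 00:00:00
SELECT (('2022-12-28'::date)::timestamp)::date;                -- 2022-12-28 (round trip is lossless)
SELECT ('2022-12-28 12:34:56'::timestamp)::date;               -- 2022-12-28 (time of day dropped)
SELECT (('2022-12-28 12:34:56'::timestamp)::date)::timestamp;  -- 2022-12-28 00:00:00 (12:34:56 is lost)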

@alamb alamb changed the title The coerced type for comparison between timestamp with date Change coerced type for comparison between timestamp with date to timestamp Dec 28, 2022
@comphead
Contributor

@alamb @liukun4515 I can take this ticket.
Regarding the conversion Date --> Timestamp --> Date, I assume the timezone is the same, say UTC by default or the machine's timezone, but it has to be the same on both casts to prevent the following:
Date --> Timestamp --> Date: 2022-12-28 --> 2022-12-28 00:00:00-08:00 --> 2022-12-27
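
A Postgres sketch of that pitfall (the timestamptz-to-date cast uses the session TimeZone, so the two SET statements here are only to make the mismatch explicit):

SET TIME ZONE 'UTC';
SELECT '2022-12-28'::date::timestamptz;              -- 2022-12-28 00:00:00+00
SET TIME ZONE 'America/Los_Angeles';
SELECT '2022-12-28 00:00:00+00'::timestamptz::date;  -- 2022-12-27, the day shifted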

@alamb
Contributor

alamb commented Dec 28, 2022

I assume the timezone

Yes, I think that is a good assumption.

@liukun4515
Contributor Author

We have the datafusion.execution.time_zone option for the default timezone during execution.
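
For example (assuming a DataFusion build where config options can be set from SQL; '+00:00' is the documented default):

SET datafusion.execution.time_zone = '+00:00';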

@liukun4515
Contributor Author

@alamb @liukun4515 I can take this ticket. Regarding the conversion Date --> Timestamp --> Date, I assume the timezone is the same, say UTC by default or the machine's timezone, but it has to be the same on both casts to prevent the following: Date --> Timestamp --> Date: 2022-12-28 --> 2022-12-28 00:00:00-08:00 --> 2022-12-27

It's better to do some investigation into date/timestamp and timezone behavior before coding.

@comphead
Contributor

comphead commented Dec 29, 2022

@liukun4515 Regarding Postgres, the conversion happens through timestamp, opposite to Spark:

explain verbose SELECT now() = '2020-1-1'

Output: (now() = '2020-01-01 00:00:00+00'::timestamp with time zone)

Edit: this is for a string literal; for a date literal it is

explain verbose SELECT now() = '2020-1-1'::date

Output: (now() = '2020-01-01'::date)

@liukun4515
Contributor Author

@liukun4515 Regarding Postgres, the conversion happens through timestamp, opposite to Spark:

explain verbose SELECT now() = '2020-1-1'

Output: (now() = '2020-01-01 00:00:00+00'::timestamp with time zone)

Edit: this is for a string literal; for a date literal it is

explain verbose SELECT now() = '2020-1-1'::date

Output: (now() = '2020-01-01'::date)

Thanks for your effort on that, but I need some time to investigate date/timestamp behavior in PG and Spark.

I will reply with a summary of their behavior in a few days.

@comphead
Contributor

comphead commented Dec 30, 2022

Thanks @liukun4515, I can take on part of the investigation if you share your plan.
From my side, I tried to investigate what the result type is between timestamptz and date in PG, but that's not so obvious:
the verbose plan doesn't show the cast.

There is an internal cast table which registers type_from, type_to, and cast_function, and casts are registered both for timestamptz -> date and the opposite, date -> timestamptz.
What I was not able to find is which exact cast is picked for a query like select '2021-1-1'::date = now().

Some helpful queries below:

-- 1184 timestamptz
-- 1082 date 

--SELECT * from pg_cast where castsource = 1184
--SELECT oid, typname FROM pg_type WHERE oid = pg_typeof('2020-1-1'::date);

explain verbose select '2021-1-1'::date = now()
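
Another query against the same catalogs that lists the registered casts between the two types directly (a sketch; the OIDs 1082/1184 are the ones noted above):

SELECT castsource::regtype AS from_type,
       casttarget::regtype AS to_type,
       castcontext
FROM pg_cast
WHERE castsource IN (1082, 1184)
  AND casttarget IN (1082, 1184);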

@comphead
Contributor

comphead commented Jan 3, 2023

@liukun4515 Another query in PG that I believe gives some clue:

select pg_typeof(least(now(), '2021-1-1'::date))

pg_typeof
--
timestamp with time zone

That suggests the cast happens through timestamp, as you and @alamb deduced. The same behaviour in Spark:

scala> spark.sql("select least(current_timestamp, date '2021-1-1')").printSchema
root
 |-- least(current_timestamp(), DATE '2021-01-01'): timestamp (nullable = false)

Let me know if you think it's enough to make a PR.

@comphead
Contributor

Waiting on apache/arrow-rs#3584

@liukun4515
Contributor Author

Waiting on apache/arrow-rs#3584

Can you point out the issue or PR it depends on in arrow-rs?

@comphead
Contributor

comphead commented Feb 1, 2023

Waiting on apache/arrow-rs#3584

Can you point out the issue or PR it depends on in arrow-rs?

#5140 is a local PR that was waiting for my changes in arrow-rs to support the date-to-timestamp cast. Now that's done, and timestamp and date can be coerced with timestamp as the common type, the same as PG and Spark do.
