Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] A collection of Date/Time related open issues #3148

Closed
79 of 85 tasks
waitingkuo opened this issue Aug 15, 2022 · 18 comments
Closed
79 of 85 tasks

[EPIC] A collection of Date/Time related open issues #3148

waitingkuo opened this issue Aug 15, 2022 · 18 comments

Comments

@waitingkuo
Copy link
Contributor

waitingkuo commented Aug 15, 2022

This is followed on the Proposal for Date/Time enhancement #3100
Please edit the list or comment bellow if you find something not listed here.

Date/Time/Timestamp Types & Casting

TimestampTz Date Types & Casting & Coercion

Intervals

See #5753

Arithmetic (including intervals)

Functions

parquet/csv/json

others

@waitingkuo
Copy link
Contributor Author

@alamb @avantgardnerio i created a separated ticket for tracking timestamp related open issues (from both datafusion and arrow-rs)

@waitingkuo waitingkuo changed the title A collection of Date/Time related open issues [EPIC] A collection of Date/Time related open issues Aug 15, 2022
@alamb
Copy link
Contributor

alamb commented Aug 15, 2022

This is amazing @waitingkuo -- thank you

@liukun4515
Copy link
Contributor

@waitingkuo you can @ me to review your pr about the times.
I care about this part

@avantgardnerio
Copy link
Contributor

Proposal:

We attack this in multiple steps parts:

  1. Add timestamp but not timestamp with timezone
  2. Hard code the server "local" time zone to always be UTC, such that show time zone always returns UTC and set time zone 'UTC' is the only set command that will not produce an error
  3. Add timestamp with timezone, but it will always be UTC due to the above
  4. Add support for set time zone 'UTC+8'
  5. Add support for set timezone 'US/Mountain

I think the above plan will allow us to tackle the problem in stages, without ever technically breaking ANSI compliance and reaching full compliance at the end state. Caveat: I'm not sure if we will ever reach the end state, and I'm okay with that personally.

@alamb
Copy link
Contributor

alamb commented Aug 31, 2022

Thank you @avantgardnerio and @waitingkuo for driving this issue. It is awesome to see

@waitingkuo
Copy link
Contributor Author

waitingkuo commented Sep 14, 2022

@alamb @avantgardnerio @liukun4515 i just finished the first 2 from @avantgardnerio's proposal

  1. Add timestamp but not timestamp with timezone
  2. Hard code the server "local" time zone to always be UTC, such that show time zone always returns UTC and set time zone 'UTC' is the only set command that will not produce an error

now we have

DataFusion CLI v12.0.0
❯ show time zone;
+--------------------------------+---------+
| name                           | setting |
+--------------------------------+---------+
| datafusion.execution.time_zone | UTC     |
+--------------------------------+---------+
1 row in set. Query took 0.059 seconds.

i'm just starting the work for timestamp with timezone, things become trickier now

postgresql has this
https://www.postgresql.org/docs/current/functions-formatting.html

to_timestamp ( text, text ) → timestamp with time zone

Converts string to time stamp according to the given format. (See also to_timestamp(double precision) in Table 9.32.)

to_timestamp('05 Dec 2000', 'DD Mon YYYY') → 2000-12-05 00:00:00-05

and this
https://www.postgresql.org/docs/current/functions-datetime.html

to_timestamp ( double precision ) → timestamp with time zone

Convert Unix epoch (seconds since 1970-01-01 00:00:00+00) to timestamp with time zone
to_timestamp(1284352323) → 2010-09-13 04:32:03+00

Looks like its to_timestamp always return timestamp with time zone

In general, do you think we should align these function signatures with postgresql?
There will be more things to change once we have timestamp with non-UTC timezone 💔

BTW
Datafusion's now() output timestamp with time zone which is same as what postgresql has 👍

@alamb
Copy link
Contributor

alamb commented Sep 15, 2022

In general, do you think we should align these function signatures with postgresql?
There will be more things to change once we have timestamp with non-UTC timezone 💔

I think it would be ok to align the functions with postgres as long as there is some clear way to get the current behavior as well (e.g. the query needs to be tweaked or something)

@alamb
Copy link
Contributor

alamb commented Oct 27, 2022

I filed #3980 and added it to this ticket -- I plan to add more date/time open issues here to help track them

This was referenced Nov 1, 2022
@mjvankampen
Copy link

I think #2785 belongs in this list as well!

@waitingkuo
Copy link
Contributor Author

added, thank you @mjvankampen

@alamb
Copy link
Contributor

alamb commented Mar 27, 2023

Filed See #5753 to track work for intervals

@watfordkcf
Copy link

We're ramping up our usage of delta-rs (and with it arrow-datafusion) and would like to see resolved much of the weirdness with dates/times (ignoring the absolute pain Spark brings upstream). Don't want to duplicate effort, so would help with #959 (and #6876) be welcomed?

@alamb
Copy link
Contributor

alamb commented Jul 10, 2023

Hi @watfordkcf -- thanks!

I believe @tustvold is in the middle of cleaning and making consistent the various timestamp arithmetic / kernels / intervals

The largest outstanding gap that I know of is correct timezone handling (basically anything other than UTC timestamps are like to hit corner cases / bugs) -- this is related to #959

In terms of #6876, that would be lovely I think.

@liukun4515
Copy link
Contributor

liukun4515 commented Jul 12, 2023

Do we have any method to convert or cast the integer type/integer expr with the timeunit to the interval?

@waitingkuo @alamb

I want to implement a app or function like this datetime + integer with timeunit, and the timeunit may be year,month,day.

I can't find the any method to convert the integer with timeunit to the interval, but in the PG which has a function make_interval() https://www.postgresql.org/docs/current/functions-datetime.html can do this.

I just can find a way to resolve the issue by using the concat method.

For example: I have a table like below, and want to convert the b with dayunit to interval

❯ \d test
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion    | public       | test       | a           | Date32    | NO          |
| datafusion    | public       | test       | b           | Int64     | NO          |
+---------------+--------------+------------+-------------+-----------+-------------+
2 rows in set. Query took 0.013 seconds.

Using the concat to get the utf8 expr which can be casted to interval.

❯   select cast(concat(b,' day') as interval), b from test;
+-------------------------------------------------------+---+
| concat(test.b,Utf8(" day"))                           | b |
+-------------------------------------------------------+---+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs | 2 |
+-------------------------------------------------------+---+

@alamb
Copy link
Contributor

alamb commented Jul 12, 2023

@liukun4515 there is the interval constant syntax:

DataFusion CLI v27.0.0
❯ select interval '5 days';
+-------------------------------------------------------+
| IntervalMonthDayNano("92233720368547758080")          |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.025 seconds.

Also casting strings to intervals handle units


❯ select '5 days'::interval;
+-------------------------------------------------------+
| Utf8("5 days")                                        |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.003 seconds.

Or creating a string via || (which is the same as the concat you mention above)

❯ select (5 || ' days')::interval;
+-------------------------------------------------------+
| Int64(5) || Utf8(" days")                             |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.004 seconds.

@liukun4515
Copy link
Contributor

@liukun4515 there is the interval constant syntax:

DataFusion CLI v27.0.0
❯ select interval '5 days';
+-------------------------------------------------------+
| IntervalMonthDayNano("92233720368547758080")          |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.025 seconds.

Also casting strings to intervals handle units


❯ select '5 days'::interval;
+-------------------------------------------------------+
| Utf8("5 days")                                        |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.003 seconds.

Or creating a string via || (which is the same as the concat you mention above)

❯ select (5 || ' days')::interval;
+-------------------------------------------------------+
| Int64(5) || Utf8(" days")                             |
+-------------------------------------------------------+
| 0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row in set. Query took 0.004 seconds.

5|| ' DAY' is same with concat( 5, ' day') ?

I use this method concat to complete my requirements, but the performance may be not better.

#6876 (comment) also provide the thoughts about support the make_interval function

cc @watfordkcf

@alamb
Copy link
Contributor

alamb commented Sep 19, 2023

Wow the list of tickets fixed is quite impressive: image

@alamb
Copy link
Contributor

alamb commented Nov 20, 2023

I have moved all the incomplete items and moved them to #8282 to help track what is remaining. Thanks everyone for the epic work here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants