Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Duration as the result of timestamp - timestamp rather than Interval #7068

Closed
alamb opened this issue Jul 24, 2023 · 1 comment · Fixed by #6832
Closed

Use Duration as the result of timestamp - timestamp rather than Interval #7068

alamb opened this issue Jul 24, 2023 · 1 comment · Fixed by #6832
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jul 24, 2023

Is your feature request related to a problem or challenge?

Intervals (e.g. 1 month, which is a different number of days depending on the relative dates) and Durations (e.g. 12321 seconds) and arithmetic have (subltely) different semantics

SQL exposes intervals (not Durations), as the INTERVAL type

DataFusion has historically used intervals inconsistently

For example, subtracting timestamps that are one year apart results in an interval of 365 days, not 1 year

select '2022-01-01T12:34:45'::timestamp - '2021-01-01T12:34:45'::timestamp;
+-----------------------------------------------------------+
| Utf8("2022-01-01T12:34:45") - Utf8("2021-01-01T12:34:45") |
+-----------------------------------------------------------+
| 0 years 0 mons 365 days 0 hours 0 mins 0.000000000 secs   |
+-----------------------------------------------------------+
1 row in set. Query took 0.002 seconds.

Subtracting 11 months results in an interval of 334 days, rather than 11 months.

❯ select '2022-01-01T12:34:45'::timestamp - '2021-02-01T12:34:45'::timestamp;
+-----------------------------------------------------------+
| Utf8("2022-01-01T12:34:45") - Utf8("2021-02-01T12:34:45") |
+-----------------------------------------------------------+
| 0 years 0 mons 334 days 0 hours 0 mins 0.000000000 secs   |
+-----------------------------------------------------------+
1 row in set. Query took 0.000 seconds.

It appears that the current code assumes a day is 24 hours (which is not always correct given leap seconds, and other date/time oddities):

❯ select '2022-01-01T12:34:45'::timestamp - '2021-01-01T12:34:45.0432'::timestamp;
+----------------------------------------------------------------+
| Utf8("2022-01-01T12:34:45") - Utf8("2021-01-01T12:34:45.0432") |
+----------------------------------------------------------------+
| 0 years 0 mons 364 days 23 hours 59 mins 59.956800000 secs     |
+----------------------------------------------------------------+
1 row in set. Query took 0.000 seconds.

Describe the solution you'd like

To avoid inconsistencies and make sure the semantics are clear, @tustvold proposes to change timestamp / timestamp arithmetic to use Durations consistently. This will mean subtracting two timestamps will result in an actual Duration array rather than an Interval which really encodes a duration

Describe alternatives you've considered

Leave the existing system as is

Additional context

No response

@jackwener
Copy link
Member

This issue is related with Type Corecion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants