Subtracting Timestamp from Timestamp should produce a Duration (not Timestamp) #3964
Comments
Should it be an interval, or should it be a duration? The latter is significantly easier to reason about, although it would appear Postgres lacks such a construct.
I guess I was thinking MonthDayNano with |
I would also be fine with |
I think this is a good first issue because the semantics are well defined and there are some existing examples.
Hi, I would like to take this task.
This is my first Rust-related PR, so I just want to outline my plan for implementing this: within subtract_dyn(), we currently use typed_dict_math_op() and math_op(). I would like to implement another function for timestamp types, called timestamp_math_op(), which handles timestamp-specific operations and always returns a Duration when subtracting one timestamp from another.
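The plan above can be sketched in plain Rust without depending on arrow-rs. This is only an illustration of the intended semantics, not the actual kernel: `TimestampColumn`, `DurationColumn`, and `subtract_timestamps` are hypothetical names, and the sketch assumes both inputs share the same time unit and that nulls propagate to the output.

```rust
/// Time unit shared by a timestamp column and the duration it produces.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical stand-in for a timestamp array: ticks since the epoch, `None` for null.
#[derive(Debug, Clone, PartialEq)]
struct TimestampColumn {
    unit: TimeUnit,
    values: Vec<Option<i64>>,
}

/// Hypothetical stand-in for a duration array: an elapsed number of ticks.
#[derive(Debug, Clone, PartialEq)]
struct DurationColumn {
    unit: TimeUnit,
    values: Vec<Option<i64>>,
}

/// Subtract `right` from `left` element-wise, producing a duration in the same unit.
/// Assumption: mismatched units are rejected rather than coerced.
fn subtract_timestamps(
    left: &TimestampColumn,
    right: &TimestampColumn,
) -> Result<DurationColumn, String> {
    if left.unit != right.unit {
        return Err(format!("unit mismatch: {:?} vs {:?}", left.unit, right.unit));
    }
    if left.values.len() != right.values.len() {
        return Err("length mismatch".to_string());
    }
    let values = left
        .values
        .iter()
        .zip(&right.values)
        .map(|(l, r)| match (l, r) {
            // Both present: checked subtraction so overflow becomes an error.
            (Some(l), Some(r)) => l
                .checked_sub(*r)
                .map(Some)
                .ok_or_else(|| "overflow in timestamp subtraction".to_string()),
            // Any null input yields a null output.
            _ => Ok(None),
        })
        .collect::<Result<Vec<_>, String>>()?;
    Ok(DurationColumn { unit: left.unit, values })
}
```

The output type carries only a unit and a tick count, which is what distinguishes it from a timestamp: it has no epoch and no time zone.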
Sounds good to me.
Thanks for working on this issue :)
Can we perhaps revisit this decision in DataFusion? Interval types are probably not the correct type to return for such operations on timestamps. A duration faithfully represents the difference between two timestamps in absolute time. An interval does not; it instead represents a logical quantity whose meaning depends on the timestamp it is applied to. I would expect timestamp arithmetic to return Duration, with query engines able to insert type coercions if they really need an interval for some reason. Crucially, a duration can be converted to an interval, but the reverse transformation is not generally possible.
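The asymmetry described here can be illustrated with a minimal sketch. `MonthDayNano` below is a hypothetical struct that only mirrors the month/day/nanosecond layout of the interval type; the two functions are illustrative names, not arrow-rs APIs. The sketch assumes that any non-zero month or day component makes an interval unconvertible, since the length of a month or day depends on the timestamp it is applied to.

```rust
/// Hypothetical mirror of a month/day/nanosecond interval value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

/// A duration in nanoseconds always maps to an interval with no calendar part.
fn duration_to_interval(duration_ns: i64) -> MonthDayNano {
    MonthDayNano { months: 0, days: 0, nanoseconds: duration_ns }
}

/// The reverse succeeds only when the interval has no months or days,
/// because those components have no fixed length in absolute time.
fn interval_to_duration(interval: MonthDayNano) -> Option<i64> {
    if interval.months == 0 && interval.days == 0 {
        Some(interval.nanoseconds)
    } else {
        None
    }
}
```

For example, `interval_to_duration(duration_to_interval(n))` round-trips for any `n`, while an interval of one month yields `None`.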
I think it would be fine to support converting to duration in arrow-rs (and we can convert to Interval in DataFusion as needed). The reason we are pushing ahead with interval (rather than interval and duration) in DataFusion is to get something working incrementally without having to sort out all the subtleties of Intervals, Durations, arithmetic, and conversions. Then I think over time we can and will add more sophistication to DataFusion (like making the distinction between Duration and Interval and coercing automatically between them). Thus, I suggest we get the kernels correct in arrow-rs (and provide the appropriate casting operations), and then we can upgrade DataFusion to use them. So in this case, having timestamp - timestamp produce a duration in arrow-rs sounds good. I will also file a ticket about casting to/from Duration and Interval.
I filed #3998 to track casting durations to/from intervals.
Yes, the issue is that most DBs (and SQL itself) simply use the timestamp - timestamp = interval pattern. However, from our perspective, using durations is fine as long as there is a casting/coercion mechanism to handle the transformation cheaply.
I think the core problem is that the clear distinction between "duration" and "interval" made in Arrow and Rust's standard library is not found in SQL (e.g. there is no SQL duration type). I think the coercion/casting logic in DataFusion is the right place to reconcile the various Arrow types (intervals with different time units, durations with different time units). In other words, even though I expect SQL/DataFusion users to work mostly with Intervals, the ideal approach would be for DataFusion to sort out how to call the appropriate interval kernels in arrow-rs, with coercion where needed.
SGTM
Describe the bug
Subtracting two Timestamp columns results in another Timestamp, which is not correct.
To Reproduce
Which produces
Expected behavior
I expect the output to be an interval of type Interval(MonthDayNano) (not Timestamp).
Updated (after discussion with @tustvold): I expect the output to be a duration of type Duration(unit), where unit is the same as the source Timestamp(unit) (not Timestamp).
Additional context
Here is what postgres does: