Support type coercion for timestamp and utf8 #4312
Thank you @andre-cc-natzka, here are my comments:
- Rebase onto master; your Time32/Time64 PR has been merged. Thanks again!
- I'd suggest aligning the text output with the pretty-print result (what you see in datafusion-cli), i.e. align the result of
❯ select now()::text;
+-----------------------------------+
| now() |
+-----------------------------------+
| 2022-11-21 21:42:52.196874 +00:00 |
+-----------------------------------+
to
❯ select now();
+----------------------------------+
| now() |
+----------------------------------+
| 2022-11-21T21:42:48.800795+00:00 |
+----------------------------------+
and
❯ select now()::timestamp::text;
+----------------------------+
| now() |
+----------------------------+
| 2022-11-21 21:44:48.386022 |
+----------------------------+
1 row in set. Query took 0.003 seconds.
to
❯ select now()::timestamp;
+----------------------------+
| now() |
+----------------------------+
| 2022-11-21T21:44:52.195395 |
+----------------------------+
- Delete the generated datafusion.rs in the proto folder; it's generated by the prost crate: https://github.com/tokio-rs/prost
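In the samples above, the two renderings the reviewer asks to align differ in exactly two ways: the text cast separates date and time with a space and puts a space before the UTC offset, while the datafusion-cli pretty printer uses `T` and no space. A minimal sketch of that mapping, assuming those are the only differences (`to_pretty_print_style` is a hypothetical name, not a DataFusion function):

```rust
/// Hypothetical helper: rewrites the `::text` rendering shown above
/// (`YYYY-MM-DD HH:MM:SS.ffffff [+HH:MM]`) into the datafusion-cli
/// pretty-print form (`YYYY-MM-DDTHH:MM:SS.ffffff[+HH:MM]`).
/// Assumes the separator and the offset spacing are the only differences.
fn to_pretty_print_style(text: &str) -> String {
    // The first space separates the date from the time; the pretty printer uses 'T'.
    let with_t = text.replacen(' ', "T", 1);
    // Any remaining space precedes the UTC offset; the pretty printer omits it.
    with_t.replace(" +", "+").replace(" -", "-")
}

fn main() {
    // Timestamp with time zone, as produced by `select now()::text;`.
    println!("{}", to_pretty_print_style("2022-11-21 21:42:52.196874 +00:00"));
    // Timestamp without time zone, as produced by `select now()::timestamp::text;`.
    println!("{}", to_pretty_print_style("2022-11-21 21:44:48.386022"));
}
```

Note that the sample values in the review come from different `now()` calls, so only their shape, not their digits, is expected to match.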
@@ -1555,7 +1555,7 @@ async fn cast_timestamp_to_timestamptz() -> Result<()> {
 #[tokio::test]
 async fn test_cast_to_time() -> Result<()> {
     let ctx = SessionContext::new();
-    let sql = "SELECT 0::TIME";
+    let sql = "SELECT 0::TIME64";
I'm not sure why we have this change, but select 0::time64 doesn't work for me.
    ScalarTime64Value time64_value = 30;
    IntervalMonthDayNanoValue interval_month_day_nano = 31;
    StructValue struct_value = 32;
    ScalarFixedSizeBinary fixed_size_binary_value = 34;
>>>>>>> upstream/master
>>>>>>> upstream/master
message ScalarTime32Value {
  oneof value {
    int32 time32_second_value = 1;
    int32 time32_millisecond_value = 2;
  };
}

message ScalarTime64Value {
  oneof value {
    int64 time64_microsecond_value = 1;
    int64 time64_nanosecond_value = 2;
  };
}
I think you should rebase onto master; the PR for Time32/Time64 has been merged.
datafusion/core/tests/sql/select.rs
// filtering with Time32 and Time64 types
Not quite sure what this is; I guess we need to delete this.
Marking this PR as a draft, as it still needs some work before it can pass CI.
Hello @waitingkuo, @alamb. Thanks for your feedback! It's been a while, and I apologize for that; I took a few days of vacation. I am also sorry for the state of this PR; I should indeed have rebased onto master in advance. I have been through that process now, and I believe the problems pointed out by @waitingkuo have been solved. The changes implemented here are actually very simple and are limited to datafusion/expr/src/type_coercion/binary.rs. They enable type coercion to a timestamp with nanosecond precision (the only unit supported by Arrow's Utf8-to-timestamp cast) when a string and a timestamp of arbitrary precision are compared. In my opinion this could be very useful. Thank you again,
The code looks good to me. Thank you @andre-cc-natzka.
I would recommend (perhaps as a follow-on) adding some SQL-level tests, but since there is unit-test coverage here, I think this is fine as is.
Hi @andre-cc-natzka, thank you, it looks good to me. I suggest adding some SQL-level tests, e.g.
❯ select (timestamp '2000-01-01T00:00:00') = '2000-01-01T00:00:00';
+-----------------------------------------------------------+
| Utf8("2000-01-01T00:00:00") = Utf8("2000-01-01T00:00:00") |
+-----------------------------------------------------------+
| true |
+-----------------------------------------------------------+
1 row in set. Query took 0.002 seconds.
And this should raise an error until we can cast Utf8 to TimestampTz:
❯ select (timestamptz '2000-01-01T00:00:00+00:00') = '2000-01-01T00:00:00+08:00';
NotImplemented("Unsupported CAST from Utf8 to Timestamp(Nanosecond, Some(\"+00:00\"))")
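The first query returns true because, once coercion has cast the Utf8 literal to Timestamp(Nanosecond), both operands are plain i64 counts of nanoseconds since the Unix epoch. The sketch below illustrates that arithmetic with a hypothetical, deliberately narrow parser (`parse_naive_ts_ns`: only the `YYYY-MM-DDTHH:MM:SS` shape, interpreted as UTC) and Howard Hinnant's `days_from_civil` algorithm; it is not DataFusion's or Arrow's actual string-to-timestamp parser.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // treat years as starting in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: accepts exactly `YYYY-MM-DDTHH:MM:SS` (or a space
/// instead of `T`), no fractional seconds, no UTC offset. Day-of-month is only
/// range-checked (1..=31). Returns None if the result overflows i64 nanoseconds.
fn parse_naive_ts_ns(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    let shape_ok = b.len() == 19
        && b[4] == b'-'
        && b[7] == b'-'
        && (b[10] == b'T' || b[10] == b' ')
        && b[13] == b':'
        && b[16] == b':';
    if !shape_ok {
        return None;
    }
    // Parse a fixed-width, digits-only field.
    let field = |start: usize, end: usize| -> Option<i64> {
        let t = s.get(start..end)?;
        if t.bytes().all(|c| c.is_ascii_digit()) {
            t.parse().ok()
        } else {
            None
        }
    };
    let (y, mo, d) = (field(0, 4)?, field(5, 7)?, field(8, 10)?);
    let (h, mi, sec) = (field(11, 13)?, field(14, 16)?, field(17, 19)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || sec > 59 {
        return None;
    }
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec;
    secs.checked_mul(1_000_000_000)
}

fn main() {
    // Left operand: timestamp '2000-01-01T00:00:00'.
    let lhs = parse_naive_ts_ns("2000-01-01T00:00:00");
    // Right operand: the Utf8 literal after coercion casts it to a timestamp.
    let rhs = parse_naive_ts_ns("2000-01-01T00:00:00");
    println!("{:?} == {:?}: {}", lhs, rhs, lhs == rhs);
}
```

Unlike DataFusion, this sketch simply rejects strings carrying an offset (such as the `+08:00` literal in the second query) by returning None.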
I plan to write some SQL-level tests for this feature, as they will help us in IOx.
Thanks again @andre-cc-natzka. Here are some SQL-level tests: #4545
Benchmark runs are scheduled for baseline = fbadebb and contender = 8547fd8. 8547fd8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Hello again @alamb, @waitingkuo. Thank you very much for the second review and for merging the PR! Thank you also, @alamb, for adding the SQL tests for this new functionality; they look really nice! All the best,
Thank you for the contribution @andre-cc-natzka
Thank you @andre-cc-natzka
Which issue does this PR close?
Closes #4311.
Rationale for this change
Currently DataFusion supports type coercion for (DataType::Time, DataType::Utf8) and (DataType::Date, DataType::Utf8) pairs, but is still missing type coercion for (DataType::Timestamp, DataType::Utf8).
What changes are included in this PR?
Two lines of code are added to the temporal_coercion function in datafusion/expr/src/type_coercion/binary.rs to handle type coercion for a (DataType::Timestamp, DataType::Utf8) pair. The output type is DataType::Timestamp(TimeUnit::Nanosecond, _), because nanoseconds are the only time unit supported by Arrow's Utf8-to-timestamp cast, which forces us to change the original time unit to nanoseconds. Although not ideal, this is not a problem: both the left- and right-hand sides of the expression are converted into DataType::Timestamp(TimeUnit::Nanosecond, _), so the expression can be evaluated correctly.
Are these changes tested?
Yes, corresponding test cases are added to the test function test_type_coercion in datafusion/expr/src/type_coercion/binary.rs.
Are there any user-facing changes?
No.
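For reference, the two temporal_coercion arms described under "What changes are included in this PR?" can be sketched as follows. This is a self-contained approximation, not the actual DataFusion source: `DataType` and `TimeUnit` are trimmed-down stand-ins for the arrow enums, the Date32 arm is only illustrative of the pre-existing date coercion, and keeping the timestamp's time zone in the result is an assumption consistent with the `Timestamp(Nanosecond, Some("+00:00"))` cast error quoted in the review.

```rust
/// Trimmed-down stand-in for arrow's `TimeUnit` (assumption, not the real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Trimmed-down stand-in for arrow's `DataType`: only the variants used here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Date32,
    Timestamp(TimeUnit, Option<String>),
}

/// Sketch of the coercion rule: a timestamp of any unit compared with a string
/// coerces to a nanosecond timestamp, in either operand order.
fn temporal_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        // Illustrative pre-existing arm: a date compared with a string.
        (Utf8, Date32) | (Date32, Utf8) => Some(Date32),
        // New arms: keep the timestamp's time zone, force nanosecond precision.
        (Timestamp(_, tz), Utf8) | (Utf8, Timestamp(_, tz)) => {
            Some(Timestamp(TimeUnit::Nanosecond, tz.clone()))
        }
        _ => None,
    }
}

fn main() {
    let ts_ms = DataType::Timestamp(TimeUnit::Millisecond, None);
    let ts_tz = DataType::Timestamp(TimeUnit::Second, Some("+00:00".to_string()));
    // Timestamp on the left, string on the right.
    println!("{:?}", temporal_coercion(&ts_ms, &DataType::Utf8));
    // String on the left; the time zone of the timestamp operand is carried over.
    println!("{:?}", temporal_coercion(&DataType::Utf8, &ts_tz));
}
```

Matching both operand orders in one or-pattern keeps the rule symmetric, so a comparison like `ts = '...'` and its mirror `'...' = ts` coerce to the same type.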