-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to_date
function
#9019
Add to_date
function
#9019
Conversation
@@ -1813,6 +1814,7 @@ pub fn parse_expr( | |||
ScalarFunction::StructFun => { | |||
Ok(struct_fun(parse_expr(&args[0], registry)?)) | |||
} | |||
ScalarFunction::ToDate => Ok(struct_fun(parse_expr(&args[0], registry)?)), //TODO: ensure how to fill it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I am not sure how to do with it, I need some suggestion
@Omega359 If it looks ok, I will write test code these days. Thanks very much |
I haven't had time to look it over yet but first thing I did see was the use of Date64 - I can't see any good reason to use that vs Date32 |
Date64 contains more number in my opinion. Maybe Date32 is better? |
@@ -1813,6 +1814,7 @@ pub fn parse_expr( | |||
ScalarFunction::StructFun => { | |||
Ok(struct_fun(parse_expr(&args[0], registry)?)) | |||
} | |||
ScalarFunction::ToDate => Ok(struct_fun(parse_expr(&args[0], registry)?)), //TODO: ensure how to fill it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be
ScalarFunction::ToDate => Ok(struct_fun(parse_expr(&args[0], registry)?)), //TODO: ensure how to fill it | |
ScalarFunction::ToDate => Ok(to_date(parse_expr(&args[0], registry)?)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// fn to_date_impl<T: ArrowTimestampType + ScalarType<i64>>( | ||
// args: &[ColumnarValue], | ||
// ) -> Result<ColumnarValue> { | ||
// } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// fn to_date_impl<T: ArrowTimestampType + ScalarType<i64>>( | |
// args: &[ColumnarValue], | |
// ) -> Result<ColumnarValue> { | |
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Date32 is the way. |
Okey! I will change it to Date32 |
474fb34
to
1a8ff01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Tangruilin I left some minor suggestions.
@@ -1308,6 +1349,40 @@ fn validate_to_timestamp_data_types( | |||
None | |||
} | |||
|
|||
// TODO: 实现这个函数 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: 实现这个函数 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/// to_date SQL function implementation | ||
pub fn to_date_invoke(args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
if args.is_empty() { | ||
return internal_err!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return internal_err!( | |
return exec_err!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that other functions are all using internal_err. If i need to change all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on #8781 (comment), I vote for exec_err
internal_err: something that wasn't expected/anticipated by the implementation and that is most likely a bug (the error message even encourages users to open a bug report). I user should not be able to trigger this under normal circumstances. Note that I/O errors (or any error that happens due to external systems) do NOT fall under this category (there's an external error variant for that)
exec_err: a processing error that happens during execution, e.g. the user passed malformed arguments to a SQL method, opened a CSV file that is broken, tried to divide an integer by zero. these errors shouldn't happen for a "good" query + "good" data, but since both the query and the data can easily be broken or misaligned, these are common occurrences in ETL systems / databases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So need I change internal_err
in to_timestamp_invoke
to exec_err
?
Or it should be done in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow on pr is better
DataType::Int32 | DataType::Int64 => { | ||
cast_column(&args[0], &DataType::Date32, None) | ||
} | ||
DataType::Null | DataType::Float64 => { | ||
cast_column(&args[0], &DataType::Date32, None) | ||
} | ||
DataType::Date32 | DataType::Date64 => { | ||
cast_column(&args[0], &DataType::Date32, None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataType::Int32 | DataType::Int64 => { | |
cast_column(&args[0], &DataType::Date32, None) | |
} | |
DataType::Null | DataType::Float64 => { | |
cast_column(&args[0], &DataType::Date32, None) | |
} | |
DataType::Date32 | DataType::Date64 => { | |
cast_column(&args[0], &DataType::Date32, None) | |
} | |
DataType::Int32 | |
| DataType::Int64 | |
| DataType::Null | |
| DataType::Float64 | |
| DataType::Date32 | |
| DataType::Date64 => cast_column(&args[0], &DataType::Date32, None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
SELECT to_date('01-14-2023 01:01:30+05:30', '%q', '%d-%m-%Y %H/%M/%S', '%+', '%m-%d-%Y %H:%M:%S%#z'); | ||
---- | ||
2023-01-13 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test for wrong input and return NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the wrong input will return error but not NULL like to_timestamp
|s, format| { | ||
string_to_timestamp_nanos_formatted(s, format) | ||
.map(|n| { | ||
println!("{n}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
println!("{n}"); |
}) | ||
.and_then(|v| { | ||
v.try_into().map_err(|_| { | ||
DataFusionError::NotImplemented("()".to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFusionError::NotImplemented("()".to_string()) | |
internal_datafusion_err!("Unable to cast to Date32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.map(|n| n / (1_000_000 * 24 * 60 * 60 * 1_000)) | ||
.and_then(|v| { | ||
v.try_into().map_err(|_| { | ||
DataFusionError::NotImplemented("()".to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFusionError::NotImplemented("()".to_string()) | |
internal_datafusion_err!("Unable to cast to Date32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
740e38c
to
da55c5d
Compare
2023-01-13 | ||
|
||
statement error DataFusion error: Internal error: to_date function unsupported data type at index 1: List | ||
SELECT to_date('2022-08-03T14:38:50+05:30', make_array('%s', '%q', '%d-%m-%Y %H:%M:%S%#z', '%+')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refer to this, this is a example for wrong input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there shjould be more tests of just dates without time or tz component. Need test to verify null, invalid formats, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also please add a test for invalid dates like
to_date('21311111')
I think the tests here could be copied and adapted to cover to_date
quite well:
d141650
to
dc8d355
Compare
datafusion/expr/src/expr_fn.rs
Outdated
nary_scalar_expr!( | ||
ToDate, | ||
to_date, | ||
"converts a string and optional formats to a `Date32`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"converts a string and optional formats to a `Date32`" | |
"converts string to date according to the given format" |
1 => handle::<Date32Type, _, Date32Type>( | ||
args, | ||
|s| { | ||
string_to_timestamp_nanos_shim(s) | ||
.map(|n| n / (1_000_000 * 24 * 60 * 60 * 1_000)) | ||
.and_then(|v| { | ||
v.try_into().map_err(|_| { | ||
internal_datafusion_err!("Unable to cast to Date32 for converting from i64 to i32 failed") | ||
}) | ||
}) | ||
}, | ||
name, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use to_timestamp_impl
to avoid code duplication.
1 => handle::<Date32Type, _, Date32Type>( | |
args, | |
|s| { | |
string_to_timestamp_nanos_shim(s) | |
.map(|n| n / (1_000_000 * 24 * 60 * 60 * 1_000)) | |
.and_then(|v| { | |
v.try_into().map_err(|_| { | |
internal_datafusion_err!("Unable to cast to Date32 for converting from i64 to i32 failed") | |
}) | |
}) | |
}, | |
name, | |
), | |
1 => to_timestamp_impl::<TimestampNanosecondType>(args, name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not do that
for the reason the Date32 need i32 type and time_stamp need i64 type.
to_timestamp_impl did not deal with the converting from i64 to i32
0eade48
to
bbf1b36
Compare
da4d780
to
7290549
Compare
You can review it now |
@Omega359 |
@Omega359 once you have reviewed this PR please let me know and I'll give it a final look too and hopefully merge it Thank you for your work @Tangruilin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you work on this @Tangruilin! I left a few comments, mostly minor but 1 issue around the rustdoc where it'll unfortunately have to have 'ignore' added. You have some repeated tests still in dates.slt that should be cleaned up then I believe it's ready to go.
use datafusion::error::Result; | ||
use datafusion::prelude::*; | ||
|
||
/// This example demonstrates how to use the to_timestamp series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_timestamp -> to_date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This example demonstrates how to use the to_timestamp series | |
/// This example demonstrates how to use the to_date series |
ctx.register_batch("t", batch)?; | ||
let df = ctx.table("t").await?; | ||
|
||
// use to_timestamp function to convert col 'a' to timestamp type using the default parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_timestamp -> to_date, convert 'a to date type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// use to_timestamp function to convert col 'a' to timestamp type using the default parsing | |
// use to_date function to convert col 'a' to timestamp type using the default parsing |
/// use datafusion::arrow::datatypes::{DataType, Field, Schema}; | ||
/// use datafusion::arrow::record_batch::RecordBatch; | ||
/// use datafusion::error::Result; | ||
/// use datafusion::prelude::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we'll have to add an ignore for this because of #9277 See https://github.com/apache/arrow-datafusion/pull/9279/files#diff-67ae8808785b2e651767d7ff67dd7c53be04ca098857b52c82ed19927e071cdaR514 for an example. I believe once the functions are move to the new dataframe-functions crate we should be able to re-enable.
/// ctx.register_batch("t", batch)?; | ||
/// let df = ctx.table("t").await?; | ||
|
||
/// // use to_timestamp function to convert col 'a' to timestamp type using the default parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timestamp => date
|n| n, | ||
"to_date", | ||
), | ||
_ => internal_err!("Unsupported 0 argument count for function to_date"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an exec_err (See #9241 for details)
| DataType::Date64 => cast_column(&args[0], &DataType::Date32, None), | ||
DataType::Utf8 => to_date(args), | ||
other => { | ||
internal_err!("Unsupported data type {:?} for function to_date", other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exec_err
e5adc84
to
4262b5c
Compare
@Omega359 The |
I think it's because you are attempting to include |
4262b5c
to
b65125a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR looks good to me -- thank you @Tangruilin and @Omega359
There are several cleanup suggestions left outstanding from @Omega359 related to fixing up repeated tests / comments. Is this something you have time to fix @Tangruilin before we merge the PR? Otherwise we can do it as a follow on PR
use datafusion::error::Result; | ||
use datafusion::prelude::*; | ||
|
||
/// This example demonstrates how to use the to_timestamp series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This example demonstrates how to use the to_timestamp series | |
/// This example demonstrates how to use the to_date series |
ctx.register_batch("t", batch)?; | ||
let df = ctx.table("t").await?; | ||
|
||
// use to_timestamp function to convert col 'a' to timestamp type using the default parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// use to_timestamp function to convert col 'a' to timestamp type using the default parsing | |
// use to_date function to convert col 'a' to timestamp type using the default parsing |
I think i can fix it in this PR |
b7e202f
to
da8c3b7
Compare
Done |
# to_date with invalid formatting | ||
query error input contains invalid characters | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | ||
|
||
# to_date with invalid formatting | ||
query error input contains invalid characters | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | ||
|
||
# to_date with invalid formatting | ||
query error input contains invalid characters | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | ||
|
||
# to_date with invalid formatting | ||
query error input contains invalid characters | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | ||
|
||
# to_date with broken formatting | ||
query error bad or unsupported format string | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# to_date with invalid formatting | |
query error input contains invalid characters | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | |
# to_date with invalid formatting | |
query error input contains invalid characters | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | |
# to_date with invalid formatting | |
query error input contains invalid characters | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | |
# to_date with invalid formatting | |
query error input contains invalid characters | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%c', '%+') | |
# to_date with broken formatting | |
query error bad or unsupported format string | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') |
# to_date with broken formatting | ||
query error bad or unsupported format string | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') | ||
|
||
# to_date with broken formatting | ||
query error bad or unsupported format string | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') | ||
|
||
# to_date with broken formatting | ||
query error bad or unsupported format string | ||
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# to_date with broken formatting | |
query error bad or unsupported format string | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') | |
# to_date with broken formatting | |
query error bad or unsupported format string | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') | |
# to_date with broken formatting | |
query error bad or unsupported format string | |
SELECT to_date('2020-09-08 12/00/00+00:00', '%q') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done,you can review
da8c3b7
to
4f484c9
Compare
Signed-off-by: tangruilin <tang.ruilin@foxmail.com>
4f484c9
to
8ec5100
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @Tangruilin and @Omega359 -- the feature and the review is really appreciated. Nice work
Which issue does this PR close?
Closes #8987 .
Rationale for this change
What changes are included in this PR?
Add to_date function, and change the protobuf file
Are these changes tested?
Not now, but it should be added in this PR.
Are there any user-facing changes?
Add a function named
to_date
, users can useselect to_date(1243124);
to try it.