20224: fix: Fix scalar broadcast for to_timestamp() #229
martin-augment wants to merge 1 commit into main
Conversation
When to_timestamp() was invoked with a scalar Float64 and an array of
strings, the previous code failed to broadcast the scalar across the rows
of the batch when producing the return value. As a result, a query like
`SELECT to_timestamp(123.5, t.x) FROM t` would fail with:

Internal error: UDF to_timestamp returned a different number of rows
than expected
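For context, a minimal, hypothetical reproduction of the failing scenario, assuming the upstream datafusion crate API and a tokio runtime; the table contents and the "%s" format strings are placeholders and are not taken from the PR:

use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, StringArray};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Two rows in t.x, so the scalar 123.5 must be broadcast to the batch
    // size instead of being materialized as a single-row array.
    let x: ArrayRef = Arc::new(StringArray::from(vec!["%s", "%s"]));
    let batch = RecordBatch::try_from_iter([("x", x)])?;

    let ctx = SessionContext::new();
    ctx.register_batch("t", batch)?;

    // Before this fix, this query returned:
    // "Internal error: UDF to_timestamp returned a different number of rows than expected"
    ctx.sql("SELECT to_timestamp(123.5, t.x) FROM t")
        .await?
        .show()
        .await?;
    Ok(())
}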
Code Review
This pull request fixes an issue with scalar broadcast for to_timestamp() when using float types. The changes correctly handle ColumnarValue::Scalar for Float16, Float32, and Float64 inputs, and new tests are added to cover these cases. The logic for these float types is quite similar, and I've suggested a refactoring to reduce code duplication. Additionally, several test files have had trailing newlines removed, which is a minor formatting improvement.
Float16 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float16(value)) => {
        let timestamp_nanos =
            value.map(|v| (v.to_f64() * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f16_arr = downcast_arg!(arr, Float16Array);
        let result: TimestampNanosecondArray =
            f16_arr.unary(|x| (x.to_f64() * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float16 value for to_timestamp"),
},
Float32 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float32(value)) => {
        let timestamp_nanos =
            value.map(|v| (v as f64 * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f32_arr = downcast_arg!(arr, Float32Array);
        let result: TimestampNanosecondArray =
            f32_arr.unary(|x| (x as f64 * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float32 value for to_timestamp"),
},
Float64 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float64(value)) => {
        let timestamp_nanos = value.map(|v| (v * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f64_arr = downcast_arg!(arr, Float64Array);
        let result: TimestampNanosecondArray =
            f64_arr.unary(|x| (x * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float64 value for to_timestamp"),
},
The logic for handling Float16, Float32, and Float64 is very similar and contains a lot of duplicated code. This can be refactored to improve maintainability. I suggest unifying the logic by casting all float types to Float64 first.
Float16 | Float32 | Float64 => {
    // Cast to Float64 to unify handling
    let arg = if args[0].data_type() != &DataType::Float64 {
        args[0].cast_to(&DataType::Float64, None)?
    } else {
        args[0].clone()
    };
    match &arg {
        ColumnarValue::Scalar(ScalarValue::Float64(value)) => {
            let timestamp_nanos = value.map(|v| (v * 1_000_000_000.0) as i64);
            Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
                timestamp_nanos,
                tz,
            )))
        }
        ColumnarValue::Array(arr) => {
            let f64_arr = downcast_arg!(arr, Float64Array);
            let result: TimestampNanosecondArray =
                f64_arr.unary(|x| (x * 1_000_000_000.0) as i64);
            Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
        }
        _ => exec_err!("Invalid float value for to_timestamp"),
    }
},
value:annoying; category:bug; feedback: The suggestion by the Gemini AI reviewer is not optimal. The casting might be cheap, but it is not zero-cost. It would be better to use a macro instead and pass the types as arguments.
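As a rough illustration of that alternative, here is a minimal sketch of a macro-based version; the macro name, its parameter shape, and the call sites are hypothetical, and it assumes the same items are in scope as in the reviewed file (ColumnarValue, ScalarValue, TimestampNanosecondArray, Arc, downcast_arg!, exec_err!, args, tz):

// Hypothetical macro, not part of this PR: expands to per-type handling
// without the intermediate cast to Float64.
macro_rules! float_to_timestamp_nanos {
    ($args:expr, $tz:expr, $variant:ident, $array_ty:ident, $to_f64:expr) => {
        match &$args[0] {
            // Scalar input: keep the result a scalar so DataFusion can broadcast it.
            ColumnarValue::Scalar(ScalarValue::$variant(value)) => {
                let timestamp_nanos =
                    value.map(|v| (($to_f64)(v) * 1_000_000_000.0) as i64);
                Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
                    timestamp_nanos,
                    $tz,
                )))
            }
            // Array input: element-wise conversion, as in the existing arms.
            ColumnarValue::Array(arr) => {
                let typed = downcast_arg!(arr, $array_ty);
                let result: TimestampNanosecondArray =
                    typed.unary(|x| (($to_f64)(x) * 1_000_000_000.0) as i64);
                Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt($tz))))
            }
            _ => exec_err!("Invalid float value for to_timestamp"),
        }
    };
}

// Illustrative call sites replacing the three duplicated match arms:
// Float16 => float_to_timestamp_nanos!(args, tz, Float16, Float16Array, |v: f16| v.to_f64()),
// Float32 => float_to_timestamp_nanos!(args, tz, Float32, Float32Array, |v: f32| v as f64),
// Float64 => float_to_timestamp_nanos!(args, tz, Float64, Float64Array, |v: f64| v),

Whether this reads better than three explicit arms is a judgment call, but it removes both the duplication and the runtime cast.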
🤖 Augment PR Summary
Summary: Fixes the scalar broadcast bug in to_timestamp() so that a scalar float first argument combined with array arguments no longer produces a row-count mismatch.
Changes: Float16, Float32, and Float64 scalar inputs now return ColumnarValue::Scalar instead of a one-row array; tests were added to cover these cases.
Technical Notes: The updated float handling avoids constructing a 1-row array for scalars, allowing DataFusion's normal scalar-to-batch expansion to produce correctly-sized results.
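To make that scalar-to-batch expansion concrete, a small sketch assuming a recent upstream DataFusion API in which ColumnarValue::into_array takes the target row count and returns a Result:

use datafusion::arrow::array::Array;
use datafusion::common::ScalarValue;
use datafusion::error::Result;
use datafusion::logical_expr::ColumnarValue;

fn main() -> Result<()> {
    // What the fixed UDF returns for a scalar 123.5 input: a scalar timestamp
    // (123.5 s expressed as nanoseconds), not a one-row array.
    let result = ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
        Some(123_500_000_000),
        None,
    ));
    // DataFusion expands the scalar to the batch length when it materializes
    // the output column, so a 4-row batch yields a 4-row array.
    let expanded = result.into_array(4)?;
    assert_eq!(expanded.len(), 4);
    Ok(())
}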