
20224: fix: Fix scalar broadcast for to_timestamp() #229

Open

martin-augment wants to merge 1 commit into main from pr-20224-2026-02-09-07-41-56

Conversation

@martin-augment
Owner

20224: To review by AI

When to_timestamp() was invoked with a scalar Float64 and an array of
strings, the previous code failed to broadcast the scalar across the
array when producing the return value. As a result, a query like
`SELECT to_timestamp(123.5, t.x) FROM t` would fail with:

    Internal error: UDF to_timestamp returned a different number of rows
    than expected
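
For reference, the scalar branch added in this PR boils down to a seconds-to-nanoseconds conversion (mirroring the Float64 arm in the diff). A standalone sketch of that arithmetic, not the actual DataFusion source:

    // Standalone sketch of the seconds-to-nanoseconds conversion used for
    // scalar floats; simplified, not the actual DataFusion code.
    fn float_secs_to_nanos(secs: f64) -> i64 {
        (secs * 1_000_000_000.0) as i64
    }

    fn main() {
        // 123.5 seconds after the epoch => 1970-01-01T00:02:03.500
        assert_eq!(float_secs_to_nanos(123.5), 123_500_000_000);
    }
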
@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the to_timestamp function where it previously failed to correctly process scalar floating-point values. The changes introduce explicit handling for Float16, Float32, and Float64 scalar inputs, ensuring they are properly converted to TimestampNanosecond. This enhancement improves the robustness of the to_timestamp function and is validated by new SQL logic tests.

Highlights

  • Scalar Float Handling in to_timestamp: The to_timestamp function now correctly processes scalar Float16, Float32, and Float64 inputs, ensuring proper conversion to TimestampNanosecond.
  • New SQL Logic Tests: Added new SQL logic tests to validate the to_timestamp function's behavior with various scalar floating-point inputs, confirming the fix.
  • Minor File Cleanup: Removed trailing newlines from several SQL logic test files as a small code-hygiene cleanup.
Changelog
  • datafusion/functions/src/datetime/to_timestamp.rs
    • Refactored the Float16, Float32, and Float64 branches to use match statements, distinguishing between ColumnarValue::Scalar and ColumnarValue::Array inputs.
    • Added logic to convert scalar float values directly to TimestampNanosecond using a map operation.
  • datafusion/sqllogictest/test_files/datetime/arith_date_time.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/datetime/arith_timestamp_duration.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/datetime/timestamps.slt
    • Introduced new test cases for to_timestamp with scalar Float16, Float32, and Float64 values to confirm the fix.
  • datafusion/sqllogictest/test_files/limit.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/limit_single_row_batches.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/spark/collection/size.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/spark/datetime/time_trunc.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/spark/datetime/trunc.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/struct.slt
    • Removed a trailing newline at the end of the file.
  • datafusion/sqllogictest/test_files/truncate.slt
    • Removed a trailing newline at the end of the file.
Activity
  • The pull request was created by martin-augment with the title "20224: fix: Fix scalar broadcast for to_timestamp()".
  • The author explicitly requested review by AI in the description: "To review by AI".

@coderabbitai

coderabbitai bot commented Feb 9, 2026

Walkthrough

This pull request refactors the to_timestamp function in DataFusion to support both scalar and array inputs for floating-point types (Float16, Float32, Float64). Previously, these types were only processed as dedicated array branches. The changes introduce per-argument logic that converts scalar floating-point values to nanoseconds and returns TimestampNanosecond scalars, while preserving array-based conversion for array inputs. The pull request also includes updates to test files: adding new test coverage for scalar float inputs in timestamps tests, removing an invalid modulo operation test, and adjusting whitespace formatting across multiple test files.


No actionable comments were generated in the recent review. 🎉



@claude

claude bot commented Feb 9, 2026

Pull Request Review: Fix scalar broadcast for to_timestamp()

Summary

This PR fixes a bug where to_timestamp() mishandled scalar float inputs (Float16, Float32, Float64) by always converting them to one-element arrays via to_array(1); scalars should instead remain scalars so that normal scalar broadcasting applies and no per-batch allocation is needed.
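
To make the failure mode concrete, here is a minimal sketch (not DataFusion code; names are illustrative) of the row-count check that the forced 1-element array tripped:

    // The engine expects a UDF's output to have one value per input row; a
    // scalar satisfies this implicitly, but a forced 1-element array does not.
    fn check_udf_output(batch_rows: usize, output_rows: usize) -> Result<(), String> {
        if output_rows != batch_rows {
            return Err(format!(
                "UDF to_timestamp returned a different number of rows than expected: \
                 got {output_rows}, expected {batch_rows}"
            ));
        }
        Ok(())
    }

    fn main() {
        // Pre-fix: the scalar was expanded to a 1-row array regardless of batch size.
        assert!(check_udf_output(8192, 1).is_err());
        // Post-fix: the scalar is preserved and broadcast, so the counts match.
        assert!(check_udf_output(8192, 8192).is_ok());
    }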

Code Quality: ✅ Good

Strengths:

  • The fix correctly implements the scalar/array pattern used throughout DataFusion
  • Code follows existing conventions seen in other functions (e.g., advanced_udf.rs examples)
  • Proper use of match expressions to handle both ColumnarValue::Scalar and ColumnarValue::Array cases
  • Consistent pattern applied across all three float types (Float16, Float32, Float64)

Style Observations:

  • The implementation aligns with DataFusion's performance optimization strategy of keeping scalars as scalars
  • Error messages are clear and descriptive

Potential Issues: ⚠️ Minor Concerns

1. Unreachable Error Branch

The _ => exec_err!("Invalid Float16/32/64 value for to_timestamp") branches are likely unreachable:

Location: datafusion/functions/src/datetime/to_timestamp.rs:448, 465, 481

Since the outer match args[0].data_type() already ensures we're dealing with the correct float type, the inner match should only see:

  • ColumnarValue::Scalar(ScalarValue::Float16(_))
  • ColumnarValue::Array(_)

The wildcard _ pattern would only match if there's a type mismatch between data_type() and the actual ColumnarValue variant, which shouldn't happen in normal operation.

Recommendation: Consider whether these error branches should use unreachable!() or internal_err!() instead, to signal that they represent impossible states.
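
A hedged sketch of what that might look like for the Float64 arm, pulled out into a hypothetical helper (`float64_arm` is not a function in the PR; the array path is elided and the timezone is fixed to None for brevity):

    use datafusion_common::{internal_err, Result, ScalarValue};
    use datafusion_expr::ColumnarValue;

    fn float64_arm(arg: &ColumnarValue) -> Result<ColumnarValue> {
        match arg {
            ColumnarValue::Scalar(ScalarValue::Float64(value)) => {
                let nanos = value.map(|v| (v * 1_000_000_000.0) as i64);
                Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(nanos, None)))
            }
            ColumnarValue::Array(_) => {
                unimplemented!("array path unchanged from the PR, omitted in this sketch")
            }
            // The outer match on data_type() already guarantees Float64, so hitting
            // this arm would mean an engine invariant was violated, not a user error.
            _ => internal_err!("Float64 data type paired with a non-Float64 ColumnarValue"),
        }
    }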

2. Missing Test Coverage for Edge Cases

The new tests in timestamps.slt are good but could be expanded:

Current coverage:

  • ✅ Scalar float with array query (the main bug fix)
  • ✅ Float16, Float32, Float64 types

Missing coverage:

  • NULL scalar floats: to_timestamp(NULL::float)
  • NaN values: to_timestamp('NaN'::float)
  • Infinity values: to_timestamp('Infinity'::float)
  • Negative timestamps: to_timestamp(-123.5)

These edge cases could expose surprising results from the float-to-i64 cast (which saturates rather than overflowing).
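
For what it's worth, Rust's float-to-integer as cast is saturating, so a quick standalone check of the edge cases above (independent of DataFusion) behaves as follows; SQL-level tests would still be needed to pin the observable results:

    fn to_nanos(secs: f64) -> i64 {
        (secs * 1_000_000_000.0) as i64
    }

    fn main() {
        assert_eq!(to_nanos(f64::NAN), 0);               // NaN saturates to 0
        assert_eq!(to_nanos(f64::INFINITY), i64::MAX);   // +inf saturates to i64::MAX
        assert_eq!(to_nanos(-123.5), -123_500_000_000);  // negative => pre-epoch timestamp
    }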

3. Floating Point Precision Loss

Location: datafusion/functions/src/datetime/to_timestamp.rs:453, 462, 469

The conversions (v as f64 * 1_000_000_000.0) as i64 can lose precision for:

  • Large timestamps (dates far in future/past)
  • Values originating as Float32, since the Float32 → Float64 widening carries the Float32 rounding error into the multiplication

Example:

// Float32 precision is ~7 decimal digits
let f32_val = 456.789f32;
let result = (f32_val as f64 * 1_000_000_000.0) as i64;
// May not precisely equal 456789000000

The test output shows this: 1970-01-01T00:07:36.789001464 instead of an exact ...789000000

Recommendation: Add a comment documenting this known limitation, or consider whether Float32 should use a different conversion strategy.
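
A small standalone demonstration of the effect (the numbers below are just the IEEE-754 rounding of 456.789 to Float32, matching the test output quoted above):

    fn main() {
        let f32_val = 456.789_f32;
        let widened = f32_val as f64;        // 456.78900146484375: the nearest f32 to 456.789
        let nanos = (widened * 1_000_000_000.0) as i64;
        assert_eq!(nanos, 456_789_001_464);  // hence ...789001464 in the test output
    }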

Performance: ✅ Excellent

This fix is a significant performance improvement:

Before:

  • Scalar floats were always expanded to 1-element arrays
  • Required allocation and copying even for constant values
  • Query optimizer couldn't leverage scalar simplification

After:

  • Scalars remain as scalars throughout execution
  • Zero-copy for constant propagation
  • Enables expression simplification during planning

Benchmark scenario:

SELECT to_timestamp(123.5, format_col) FROM large_table;
  • Before: Creates millions of 1-element arrays
  • After: Single scalar value reused for all rows

Security: ✅ No concerns

  • No unsafe code introduced
  • Float-to-i64 casts are safe (saturating behavior)
  • No untrusted input parsing changes
  • Timezone handling remains unchanged

Test Coverage: ✅ Good (with minor gaps)

Strengths:

  • Tests the actual bug: scalar float with varying format column
  • Covers all three float types
  • Validates correct timestamp output
  • Tests integrate with existing sqllogictest suite

Gaps:

  • No edge case coverage (NULL, NaN, Infinity, negative values)
  • No explicit performance/memory test (though this is typical for DataFusion)
  • No test for the error branches (acceptable if unreachable)

Additional Observations

Whitespace-only changes

Several test files have trailing newline changes:

  • arith_date_time.slt, arith_timestamp_duration.slt, etc.

These are minor cleanup changes and improve consistency, but are unrelated to the main fix.

Recommendations

  1. Consider adding edge case tests for NULL, NaN, Infinity, and negative timestamps
  2. Document floating point precision limitations with a code comment
  3. Review error branch reachability - consider using unreachable!() if they truly can't be hit
  4. Optional: Add an integration test that verifies scalar values remain as scalars (not converted to arrays)

Verdict: ✅ Approve with minor suggestions

This is a solid bug fix that:

  • Correctly implements the scalar broadcast pattern
  • Improves performance significantly
  • Has good test coverage for the main use case
  • Follows DataFusion coding conventions

The minor concerns are non-blocking and could be addressed in follow-up PRs if desired.


Reviewed by Claude Code


gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue with scalar broadcast for to_timestamp() when using float types. The changes correctly handle ColumnarValue::Scalar for Float16, Float32, and Float64 inputs, and new tests are added to cover these cases. The logic for these float types is quite similar, and I've suggested a refactoring to reduce code duplication. Additionally, several test files have had trailing newlines removed, which is a minor formatting improvement.

Comment on lines +433 to +482
Float16 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float16(value)) => {
        let timestamp_nanos =
            value.map(|v| (v.to_f64() * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f16_arr = downcast_arg!(arr, Float16Array);
        let result: TimestampNanosecondArray =
            f16_arr.unary(|x| (x.to_f64() * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float16 value for to_timestamp"),
},
Float32 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float32(value)) => {
        let timestamp_nanos =
            value.map(|v| (v as f64 * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f32_arr = downcast_arg!(arr, Float32Array);
        let result: TimestampNanosecondArray =
            f32_arr.unary(|x| (x as f64 * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float32 value for to_timestamp"),
},
Float64 => match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Float64(value)) => {
        let timestamp_nanos = value.map(|v| (v * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
            timestamp_nanos,
            tz,
        )))
    }
    ColumnarValue::Array(arr) => {
        let f64_arr = downcast_arg!(arr, Float64Array);
        let result: TimestampNanosecondArray =
            f64_arr.unary(|x| (x * 1_000_000_000.0) as i64);
        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
    }
    _ => exec_err!("Invalid Float64 value for to_timestamp"),
},


Severity: medium

The logic for handling Float16, Float32, and Float64 is very similar and contains a lot of duplicated code. This can be refactored to improve maintainability. I suggest unifying the logic by casting all float types to Float64 first.

            Float16 | Float32 | Float64 => {
                // Cast to Float64 to unify handling
                let arg = if args[0].data_type() != &DataType::Float64 {
                    args[0].cast_to(&DataType::Float64, None)?
                } else {
                    args[0].clone()
                };

                match &arg {
                    ColumnarValue::Scalar(ScalarValue::Float64(value)) => {
                        let timestamp_nanos = value.map(|v| (v * 1_000_000_000.0) as i64);
                        Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
                            timestamp_nanos,
                            tz,
                        )))
                    }
                    ColumnarValue::Array(arr) => {
                        let f64_arr = downcast_arg!(arr, Float64Array);
                        let result: TimestampNanosecondArray =
                            f64_arr.unary(|x| (x * 1_000_000_000.0) as i64);
                        Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt(tz))))
                    }
                    _ => exec_err!("Invalid float value for to_timestamp"),
                }
            },

Owner Author


value:annoying; category:bug; feedback: The suggestion by the Gemini AI reviewer is not optimal. The casting might be cheap, but it is not zero-cost. It would be better to use a macro instead and pass the types as arguments.
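
For illustration only, one possible shape of such a macro. This is a sketch meant to sit inside to_timestamp.rs next to the existing imports and the downcast_arg! macro; the names and call sites are illustrative, not the actual implementation:

    // The float type's ScalarValue variant, Arrow array type, and widening
    // expression are passed as macro arguments, so no runtime cast is needed.
    macro_rules! float_to_timestamp_nanos {
        ($arg:expr, $tz:expr, $scalar_variant:ident, $array_ty:ident, $to_f64:expr) => {
            match $arg {
                ColumnarValue::Scalar(ScalarValue::$scalar_variant(value)) => {
                    let nanos = value.map(|v| ($to_f64(v) * 1_000_000_000.0) as i64);
                    Ok(ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(nanos, $tz)))
                }
                ColumnarValue::Array(arr) => {
                    let arr = downcast_arg!(arr, $array_ty);
                    let result: TimestampNanosecondArray =
                        arr.unary(|x| ($to_f64(x) * 1_000_000_000.0) as i64);
                    Ok(ColumnarValue::Array(Arc::new(result.with_timezone_opt($tz))))
                }
                _ => exec_err!("Invalid float value for to_timestamp"),
            }
        };
    }

    // Possible call sites mirroring the three arms in the diff:
    // Float32 => float_to_timestamp_nanos!(&args[0], tz, Float32, Float32Array, |v: f32| v as f64),
    // Float64 => float_to_timestamp_nanos!(&args[0], tz, Float64, Float64Array, |v: f64| v),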

@augmentcode

augmentcode bot commented Feb 9, 2026

🤖 Augment PR Summary

Summary: Fixes to_timestamp() behavior when the input is a scalar float but the expression is evaluated over multi-row batches.

Changes:

  • Handle Float16/Float32/Float64 scalar inputs by returning a scalar Timestamp(Nanosecond) (instead of forcing a length-1 array), enabling correct scalar broadcasting.
  • Preserve existing array-path behavior for float arrays (vectorized conversion to nanoseconds).
  • Add sqllogictest coverage for scalar float constants/casts used alongside a column argument.
  • Normalize/newline-only adjustments in several .slt files.

Technical Notes: The updated float handling avoids constructing a 1-row array for scalars, allowing DataFusion’s normal scalar-to-batch expansion to produce correctly-sized results.



augmentcode bot left a comment


Review completed. No suggestions at this time.

