18519: Refactor log() signature to use coercion API + fixes#9
18519: Refactor log() signature to use coercion API + fixes#9martin-augment wants to merge 4 commits intomainfrom
log() signature to use coercion API + fixes#9Conversation
WalkthroughThis PR extends DataFusion's expression framework with coercible type signatures and Float16 support, refactors decimal handling in scalar construction and type classification, expands TypeSignatureClass variants for decimal/float/numeric types, and reworks the log function to use coercible signatures with enhanced decimal and float support across multiple data type precisions. Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| select log(2.0, 100000000000000000000000000000000000::decimal(38,0)); | ||
| ---- | ||
| 116.267483321058 | ||
| 116 |
There was a problem hiding this comment.
Bug: Precision loss in decimal log regression detected
The test expectation for log(2.0, large_decimal) was changed from the correct value 116.267483321058 to the incorrect value 116. While a TODO comment acknowledges the bug, accepting incorrect output masks a regression in the native decimal log implementation that loses precision when the base is a float.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct that this Pull Request introduces a regression. The author of the PR acknowledges the regression and filled an issue to track it but probably this PR should not be merged as is due to this regression.
Code Review - PR #9: Refactor log() signature to use coercion API + fixesSummaryThis PR refactors the log() function to use the new coercion API and fixes several issues related to decimal handling. Overall, this is a solid refactoring that improves type handling and adds better support for various numeric types. Strengths
Issues and Concerns1. Critical: Loss of Precision in Decimal Log (Acknowledged TODO)The native decimal log implementation uses ilog() which returns only integer results, causing significant precision loss compared to floating-point log. Impact:
Recommendation:
2. Inconsistent Casting in calculate_binary_mathThe casting is now done inside calculate_binary_math (utils.rs:143), but this might not be obvious to callers. The previous approach where callers handled casting was more explicit. Recommendation: Add a comment explaining why casting happens here and what the expected input types are. 3. Potential Type Confusion with Decimal BaseWhen the value is Decimal32/64, it is cast to Decimal128 but the base must be Float64. The log_decimal128 function requires the base to be an integer (base.trunc() != base check at line 108), but the type signature allows Float64 base. Recommendation:
4. Missing Test for Decimal32/64 with Base ParameterWhile there are tests for Decimal32/64 with default base (10), there are no tests for custom bases with these types. Recommendation: Add test cases like log(2.0, arrow_cast(64, 'Decimal32(9, 0)')) 5. Code Duplication in SignatureThe as_float.clone() is repeated multiple times in the signature definition. While this is fine, consider extracting to a variable if more signatures are added in the future. Code QualityGood Practices Observed:
Minor Style Issues:
Security ConcernsNo security issues identified. The function properly validates inputs and handles edge cases like negative values, invalid bases, and overflow conditions. Performance ConsiderationsPositive:
Concerns:
Test CoverageExcellent test coverage with tests for:
Missing tests:
Documentation
RecommendationsHigh Priority:
Medium Priority:
Low Priority:
Approval RecommendationOverall: LGTM with minor concerns This is a solid refactoring that improves the codebase. The identified issues are either:
The code is well-tested, properly handles edge cases, and the new coercion API usage is a good architectural improvement. Suggested action: Merge after addressing the documentation concern about decimal precision loss (either in code comments or function documentation). Review generated by Claude Code |
| | DataType::Decimal64(precision, scale) => { | ||
| calculate_binary_math::<Decimal128Type, Float64Type, Float64Type, _>( | ||
| value, | ||
| &cast(&value, &DataType::Decimal128(*precision, *scale))?, |
There was a problem hiding this comment.
use arrow::compute::kernels::cast; imports the module, but this call uses cast(...) as a function; this won’t resolve and will fail to compile. Consider importing the function or fully qualifying the call to the cast function here (also applies to other locations in the file if any).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:annoying; category:bug; feedback:The Augment AI reviewer is not correct! arrow::compute::kernels::cast is an alias of arrow_cast::cast (https://github.com/apache/arrow-rs/blob/d379b981768215b16182f78aebddf940a4b45c99/arrow/src/compute/kernels.rs#L21C9-L21C25) which is defined at https://github.com/apache/arrow-rs/blob/d379b981768215b16182f78aebddf940a4b45c99/arrow-cast/src/cast/mod.rs#L345 as pub fn cast(array: &dyn Array, to_type: &DataType) -> Result<ArrayRef, ArrowError>. It is a function and the types of the parameters match.
| F: Fn(L::Native, R::Native) -> Result<O::Native, ArrowError>, | ||
| R::Native: TryFrom<ScalarValue>, | ||
| { | ||
| let right = right.cast_to(&R::DATA_TYPE, None)?; |
There was a problem hiding this comment.
If the RHS is a scalar NULL, this path will attempt to convert it to R::Native and error instead of yielding a NULL result per SQL semantics (including when the NULL is typed as the target numeric type). Consider short‑circuiting scalar‑NULL RHS to return an all‑NULL output array.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:incorrect-but-reasonable; category:bug; feedback:The Augment AI reviewer is not correct! Casting to NULL should return an error. All databases do that. Here is the Postgres behavior:
postgres=# SELECT CAST('3' AS NULL);
ERROR: syntax error at or near "NULL"
LINE 1: SELECT CAST('3' AS NULL);
^
postgres=# SELECT CAST(3 AS NULL);
ERROR: syntax error at or near "NULL"
LINE 1: SELECT CAST(3 AS NULL);
^
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
datafusion/common/src/scalar/mod.rs (1)
4566-4568: NewTryFrom<ScalarValue> for f16is useful; consider also addingFrom<f16>for symmetry.This completes the extraction path. For ergonomic construction (parity with f32/f64), also add
From<f16>andFrom<Option<f16>>via the existing macro.Apply near the other impl_scalar! calls:
impl_scalar!(u64, UInt64); +// Convenience constructors for Float16 +impl_scalar!(f16, Float16);Optionally, add a non‑consuming conversion to avoid moves:
impl TryFrom<&ScalarValue> for f16 { type Error = DataFusionError; fn try_from(value: &ScalarValue) -> Result<Self> { match value { ScalarValue::Float16(Some(v)) => Ok(*v), _ => _internal_err!("Cannot convert {:?} to {}", value, std::any::type_name::<Self>()), } } }datafusion/expr-common/src/signature.rs (1)
382-384: Consider adding integer example for Numeric type.The
Numerictype class includes both integers and floats (Int8 through UInt64, Float32, Float64), butget_example_types()only returnsFloat64. While acceptable for a single example, consider returning a more representative set likevec![DataType::Int64, DataType::Float64]to better illustrate the range of numeric types.datafusion/functions/src/math/log.rs (1)
360-370: Document precision characteristics for users.The test demonstrates that decimal logarithms lose fractional precision compared to floating-point calculations. While this is tracked in issue apache#18524, consider documenting this behavior limitation in user-facing documentation or the function's doc comment to set appropriate expectations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
datafusion/common/src/scalar/mod.rs(3 hunks)datafusion/common/src/types/native.rs(2 hunks)datafusion/expr-common/src/signature.rs(5 hunks)datafusion/functions/src/math/log.rs(7 hunks)datafusion/functions/src/utils.rs(2 hunks)datafusion/sqllogictest/test_files/decimal.slt(2 hunks)datafusion/sqllogictest/test_files/math.slt(1 hunks)datafusion/sqllogictest/test_files/order.slt(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
datafusion/functions/src/utils.rs (1)
datafusion/common/src/scalar/mod.rs (1)
right_array(5581-5581)
datafusion/common/src/types/native.rs (2)
datafusion/expr-common/src/type_coercion/binary.rs (1)
is_decimal(330-338)datafusion/expr/src/type_coercion/mod.rs (1)
is_decimal(93-101)
datafusion/functions/src/math/log.rs (3)
datafusion/expr-common/src/signature.rs (3)
types(744-747)new_implicit(918-930)new_exact(910-912)datafusion/common/src/scalar/mod.rs (1)
new_ten(1718-1801)datafusion/functions/src/utils.rs (1)
args(100-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: codex
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
🔇 Additional comments (24)
datafusion/functions/src/utils.rs (2)
143-143: LGTM: Early cast consolidation.The early cast of
righttoR::DATA_TYPEconsolidates type conversion logic before branching, eliminating redundant casting in each branch.
159-172: Type parameter change is correct and compatible with all call sites.Verification confirms that the change from
as_primitive::<T>()toas_primitive::<R>()is correct. TheTparameter represents the left type andRrepresents the right type. All six call sites inlog.rspass the right array with a type matching theRparameter, including mixed-type scenarios likecalculate_binary_math::<Decimal128Type, Float64Type, Float64Type, _>where the right array is correctly Float64Type. No compatibility issues exist.datafusion/common/src/types/native.rs (3)
432-434: LGTM: Improved maintainability through helper methods.Refactoring
is_numeric()to use the new helper methods improves code clarity and maintainability.
485-488: LGTM: Comprehensive float type checking.The
is_float()method correctly includes all float types, including the newly supported Float16.
480-483: The review comment is incorrect; these implementations serve different purposes and cannot be consolidated.The apparent duplication is misleading: the two
is_decimalfunctions in type_coercion modules operate onDataType(checking forDecimal32/64/128/256variants), while the newNativeType::is_decimal()operates on a different type (NativeType) with a singleDecimal(_, _)variant.Critically, the binary.rs implementation uses
std::mem::discriminantcomparisons to distinguish between decimal variants for type coercion logic—functionality that would be lost by switching toNativeType::is_decimal(). These functions are not redundant; they solve problems in different type systems.Likely an incorrect or invalid review comment.
datafusion/sqllogictest/test_files/math.slt (2)
709-719: LGTM: Comprehensive null propagation testing.The EXPLAIN test correctly validates that
log(NULL, c2)is optimized to a constantFloat64(NULL)at planning time, demonstrating proper null propagation handling.
720-734: LGTM: Thorough float precision testing.The tests validate
logfunction behavior across Float16, Float32, and Float64 types, correctly demonstrating precision differences in the results (2.6074219 vs 2.606992 vs 2.606992198152).datafusion/sqllogictest/test_files/order.slt (2)
672-684: LGTM: Coercion API successfully eliminates explicit casts.The removal of explicit
CASToperations in theLOGfunction projections confirms that the new coercible type signatures are working as intended, simplifying query plans and improving usability.
686-700: LGTM: Consistent coercion behavior.The changes consistently demonstrate that the coercion API works correctly regardless of argument order, eliminating the need for explicit casts in both
log(c12, c11)andlog(c11, c12).datafusion/sqllogictest/test_files/decimal.slt (4)
797-840: LGTM: Comprehensive decimal type coverage.The tests thoroughly validate
log(100)across all decimal type variants (Decimal32, Decimal64, Decimal128, Decimal256) with different scales, correctly expecting a result of 2 for all cases.
842-899: TODO comment indicates known discrepancy in decimal log implementation.The TODO at lines 893-894 notes a known issue where
log(2.0, 10^35)returns 116 instead of the expected 116.267483321058. This suggests a precision issue with the native decimal log implementation that should be tracked separately.Additionally, verify that the overflow error message at line 855 is appropriate and user-friendly. The test expects an error for
log(10^38)which exceeds Decimal128 precision.Referenced issue: apache#18524
922-943: LGTM: Tests correctly demonstrate parse_float_as_decimal impact.The tests appropriately show how the
parse_float_as_decimalsetting affects precision, demonstrating that float parsing causes rounding (35 → 34) while decimal parsing maintains precision. This is important for users to understand.
901-920: LGTM: Thorough null propagation testing.The tests comprehensively validate null handling for both single-argument
log(null)and two-argumentlog(base, value)forms, correctly expecting NULL results in all cases.datafusion/common/src/scalar/mod.rs (1)
1737-1744: Approve the code change; request verification that tests for scale=0 are added or will be added in this PR.The change is correct—all four decimal types now consistently check
if *scale < 0(Decimal128/256 already used this check). Since scale=0 produces 10^(0+1)=10, it is mathematically valid and should be supported.However, the existing tests for Decimal128/256 do not explicitly validate scale=0. Verify that this PR includes tests for both Decimal32/64 with scale=0 and scale<0, to confirm the behavior matches the 128/256 implementations and that no regressions are introduced.
datafusion/expr-common/src/signature.rs (3)
24-24: LGTM: Import additions support new decimal functionality.The new imports are necessary for the Decimal type signature class implementation.
337-339: LGTM: New type signature classes enable granular coercion.The Float, Decimal, and Numeric variants appropriately extend the type classification system for coercible signatures.
402-404: LGTM: Type matching and casting logic follow established patterns.The implementations for the new type classes correctly mirror the existing patterns. The decision to preserve the origin type in
default_casted_type()when it already matches the target class avoids unnecessary casts.Also applies to: 438-446
datafusion/functions/src/math/log.rs (7)
26-45: LGTM: Import additions support coercion API and extended type coverage.The new imports are necessary for the coercible signature implementation and expanded decimal/Float16 support.
76-102: LGTM: Coercible signature design enables decimal precedence.The signature construction correctly uses the coercion API to accept decimals natively while supporting implicit conversion from numeric types to float. The ordering within
OneOfensures decimal variants are matched first, as intended.
152-159: LGTM: Float16 support properly integrated.The return type logic correctly extends to Float16, preserving input precision across all float types.
188-191: LGTM: Early null propagation improves efficiency.The null check avoids unnecessary computation when inputs have null types, which is consistent with DataFusion's null handling patterns.
193-201: LGTM: Default base-10 handling is correct.The logic properly defaults to base-10 for single-argument calls while preserving explicit bases for two-argument calls. The use of
ScalarValue::new_ten()correctly supports all numeric types including decimals.
204-256: LGTM: Extended type support is well-structured.The implementation correctly handles Float16 and decimal types. The Decimal32/64 upcast limitation and integer logarithm precision loss are both documented with TODO comments and issue tracking.
Note: Lines 226-228 reference issue apache#17555 for native Decimal32/64 log support, and the test at lines 360-370 notes precision loss tracked in issue apache#18524.
272-283: LGTM: Null propagation in simplify ensures consistency.Adding null handling to the simplify path maintains consistency with the invoke path and enables early optimization of null expressions.
|
Findings
|
value:useful; category:bug; feedback:The Codex AI reviewer is correct that this Pull Request introduces a regression. The author of the PR acknowledges the regression and filled an issue to track it but probably this PR should not be merged as is due to this regression. |
value:useful; category:bug; feedback:The Claude AI reviewer is correct that this Pull Request introduces a regression. The author of the PR acknowledges the regression and filled an issue to track it but probably this PR should not be merged as is due to this regression. |
value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct that for symmetry the From conversion should be added too. It might not be needed at the moment but it would make the API more consistent. |
value:useful; category:bug; feedback:The Claude AI reviewer is correct that the example would be more clear and useful if the items in the vector are different than the one for TypeSignatureClass::Float. This would increase the code coverage. |
value:useful; category:bug; feedback:The Claude AI reviewer is correct that this Pull Request introduces a regression. The author of the PR acknowledges the regression and filled an issue to track it but probably this PR should not be merged as is due to this regression. |
18519: To review by AI
Note
Refactors
log()to use coercible signatures with Float16 and improved decimal handling, introduces float/decimal/numeric type classes, tweaks scalar/utility casting, and updates tests and ordering plans.log):TypeSignature::Coerciblewith newTypeSignatureClass::{Float, Decimal, Numeric}.ScalarValue::new_ten.NativeTypehelpers:is_float,is_decimal; refactoris_numericto use helpers.0innew_ten(reject only negative scale).TryFrom<ScalarValue> for f16.calculate_binary_mathnow casts RHS upfront and avoids redundant casts for array RHS.Written by Cursor Bugbot for commit bfb9e92. This will update automatically on new commits. Configure here.