19065: refactor: Refactor spark width bucket signature away from user defined#97
martin-augment wants to merge 1 commit into main
Walkthrough
This pull request extends the logical type system in DataFusion and refactors the width_bucket function implementation. Two new logical type singleton variants are introduced: LOGICAL_INTERVAL_YEAR_MONTH and LOGICAL_DURATION_MICROSECOND, along with corresponding getter functions to provide globally accessible type references. Separately, the width_bucket function's type coercion and error handling are restructured. A new helper constructs a composite type signature supporting numeric, duration, and interval types. The UDF invocation flow is updated to use take_function_args for argument extraction and casting, and error messages for unsupported data types are made more explicit. The public API surface remains unchanged.
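The singleton-plus-getter pattern mentioned in the walkthrough can be sketched in plain Rust. This is only an illustration under stated assumptions: `LogicalKind` and `LogicalKindRef` are hypothetical stand-ins, not DataFusion's actual logical type definitions, and the real code may well use a macro or a different lazy-initialization primitive.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a logical type (not DataFusion's real type).
#[derive(Debug, PartialEq)]
pub enum LogicalKind {
    IntervalYearMonth,
    DurationMicrosecond,
}

/// Shared, reference-counted handle to a logical kind.
pub type LogicalKindRef = Arc<LogicalKind>;

/// Returns a handle to the single shared year-month interval instance.
pub fn logical_interval_year_month() -> LogicalKindRef {
    // Initialized on first use, then reused by every later call.
    static INSTANCE: OnceLock<LogicalKindRef> = OnceLock::new();
    Arc::clone(INSTANCE.get_or_init(|| Arc::new(LogicalKind::IntervalYearMonth)))
}

/// Returns a handle to the single shared microsecond duration instance.
pub fn logical_duration_microsecond() -> LogicalKindRef {
    static INSTANCE: OnceLock<LogicalKindRef> = OnceLock::new();
    Arc::clone(INSTANCE.get_or_init(|| Arc::new(LogicalKind::DurationMicrosecond)))
}
```

Each getter hands out a clone of the same `Arc`, so callers share one allocation per logical kind rather than constructing a new value on every call.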
🧬 Code graph analysis (1)datafusion/spark/src/function/math/width_bucket.rs (2)
Code Review for PR #97: Refactor Spark width_bucket signature

Summary
This PR refactors the

✅ Positive Aspects

🔍 Issues & Concerns

1. Missing import cleanup (Minor)
Location:
The old import for
use crate::function::error_utils::unsupported_data_types_exec_err;
This line is already removed, so this is good. ✅

2. Import organization (Style - Minor)
Location:
The Arrow imports are split into two groups:
use arrow::array::{
    Array, ArrayRef, DurationMicrosecondArray, Float64Array, IntervalMonthDayNanoArray,
    IntervalYearMonthArray,
};
// ... other imports ...
use arrow::array::{Int32Array, Int32Builder};
use arrow::datatypes::TimeUnit::Microsecond;

Suggestion: Consolidate all
use arrow::array::{
    Array, ArrayRef, DurationMicrosecondArray, Float64Array, Int32Array, Int32Builder,
    IntervalMonthDayNanoArray, IntervalYearMonthArray,
};

3. Error message consistency (Minor)
Location:
The new error uses
other => internal_err!(
    "width_bucket received unexpected data types: {:?}, {:?}, {:?}, {:?}",
    other,
    minv.data_type(),
    maxv.data_type(),
    nb.data_type()
),

Suggestion: Consider adding context that this is a post-coercion error to help debugging:
other => internal_err!(
    "width_bucket: unexpected data type after coercion: {:?} (expected Float64, Duration(Microsecond), Interval(YearMonth), or Interval(MonthDayNano))",
    other
),

4. Potential type signature issue (Needs verification)
Location:
The numeric coercion accepts:
let numeric = Coercion::new_implicit(
    TypeSignatureClass::Native(logical_float64()),
    vec![
        TypeSignatureClass::Integer,
        TypeSignatureClass::Float,
        TypeSignatureClass::Decimal,
    ],
    NativeType::Float64,
);

The old implementation used
Question: Is this intentional? The old
(a, b, c, &(Int8 | Int16 | Int32 | Int64))
    if is_signed_numeric(a) && is_signed_numeric(b) && is_signed_numeric(c)

If unsigned integers should not be accepted, you need to be more explicit with the coercion classes.

5. Documentation opportunity (Enhancement)
Consider adding a doc comment to the
/// Creates a type signature for width_bucket with three arguments of the same type
/// (value, min, max) and a bucket count argument.
fn width_bucket_signature(arg: &Coercion, bucket: &Coercion) -> TypeSignature {
    TypeSignature::Coercible(vec![arg.clone(), arg.clone(), arg.clone(), bucket.clone()])
}

🔒 Security Considerations
No security concerns identified. The function maintains the same validation logic for:
⚡ Performance Considerations
🧪 Test Coverage
The existing test suite is comprehensive and covers:
Recommendation: The tests pass as-is, which confirms backward compatibility. ✅

📋 Recommendations Summary
Must Fix:
Should Fix:
Nice to Have:
Overall Assessment
Status: ✅ Approve with minor suggestions
This is a solid refactoring that modernizes the code to use the current type signature system. The logic remains functionally equivalent, and the comprehensive test suite validates the behavior. The main concern is verifying whether the change in numeric type handling (accepting unsigned integers) is intentional.
Code Quality: 8.5/10
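The open question raised in the review (signed-only acceptance versus class-based acceptance) can be illustrated with a reduced model. Everything here is an assumption made for illustration: `NumType` is a hypothetical six-variant enum rather than Arrow's `DataType`, and both predicates only approximate the behavior the review describes, not the actual DataFusion implementation.

```rust
/// Hypothetical, reduced model of column types (not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumType {
    Int32,
    Int64,
    UInt32,
    UInt64,
    Float64,
    Utf8,
}

/// Assumed shape of a signed-only check: signed integers and floats pass.
pub fn is_signed_numeric_model(t: NumType) -> bool {
    matches!(t, NumType::Int32 | NumType::Int64 | NumType::Float64)
}

/// Assumed shape of a class-based check: any integer or float passes
/// and would subsequently be coerced to Float64.
pub fn accepted_by_numeric_class_model(t: NumType) -> bool {
    matches!(
        t,
        NumType::Int32 | NumType::Int64 | NumType::UInt32 | NumType::UInt64 | NumType::Float64
    )
}

/// Lists the types that the class-based check accepts but the signed-only check rejects.
pub fn newly_accepted(all: &[NumType]) -> Vec<NumType> {
    all.iter()
        .copied()
        .filter(|t| accepted_by_numeric_class_model(*t) && !is_signed_numeric_model(*t))
        .collect()
}
```

In this model the two checks disagree only on the unsigned variants, which is the behavioral widening the review asks the author to confirm.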
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! The imports could be optimized and combined into one group. This will make it easier to locate them if/when a developer needs to confirm the module a given item comes from.
value:annoying; category:bug; feedback:The Claude AI reviewer is not correct! The approach proposed in the Pull Request gives the same information but is future-proof: a newly supported type will be listed automatically, without the need for an update by the developer.
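A minimal sketch of the trade-off discussed in this feedback, assuming a hypothetical `ArgType` enum and `dispatch` function (neither exists in DataFusion): the fallback arm reports the types it actually received through their `Debug` output, so the message text does not hard-code a list of expected types that would need maintenance.

```rust
/// Hypothetical stand-in for the argument data types (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum ArgType {
    Float64,
    DurationMicrosecond,
    IntervalYearMonth,
    Utf8,
}

/// Picks a code path from the first argument's type; the fallback arm
/// reports every received type without naming the supported ones.
pub fn dispatch(
    value: &ArgType,
    min: &ArgType,
    max: &ArgType,
    buckets: &ArgType,
) -> Result<&'static str, String> {
    match value {
        ArgType::Float64 => Ok("float path"),
        ArgType::DurationMicrosecond => Ok("duration path"),
        ArgType::IntervalYearMonth => Ok("interval path"),
        other => Err(format!(
            "width_bucket received unexpected data types: {:?}, {:?}, {:?}, {:?}",
            other, min, max, buckets
        )),
    }
}
```

Adding a new supported variant here only requires a new match arm; the error text stays valid because it never enumerates the supported set.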
value:good-but-wont-fix; category:bug; feedback:The Claude AI reviewer is correct! The new signature is wider and also accepts unsigned integers (u8, u16, u32, etc.). Since all of them can be cast to f64, there is no real problem.
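The cast mentioned in this feedback can be checked directly in Rust. The helper below is a hypothetical illustration, not part of the PR: it tests whether a `u64` survives conversion to `f64` unchanged. Every `u8`, `u16`, and `u32` value is exactly representable in `f64`, while `u64` values above 2^53 may be rounded; whether that rounding matters for bucket computation is not addressed in the discussion above.

```rust
/// Returns true when `v` is represented exactly after casting to f64.
/// The comparison is done in u128 because a float-to-u64 cast saturates
/// at u64::MAX and would hide the rounding of very large values.
pub fn is_exact_in_f64(v: u64) -> bool {
    (v as f64) as u128 == v as u128
}

/// Widens any u32 to f64; this conversion is always exact.
pub fn u32_to_f64(v: u32) -> f64 {
    f64::from(v)
}
```

For example, `is_exact_in_f64(u32::MAX as u64)` holds, while `is_exact_in_f64((1u64 << 53) + 1)` does not, because `f64` has a 53-bit significand.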
19065: To review by AI