
Conversation

@Jefffrey (Contributor)

Which issue does this PR close?

Part of #12725

Rationale for this change

Prefer to avoid user_defined signatures, for consistency across function definitions.

What changes are included in this PR?

Refactor the signatures of the Spark bit shift functions (left, right, right unsigned) to use the coercion API instead of being user defined.

Also refactor the bit shift code to have a common base struct.

Move the Rust unit tests to SLTs.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

No.

@github-actions bot added the sqllogictest (SQL Logic Tests (.slt)), functions (Changes to functions implementation), and spark labels on Nov 12, 2025.
Comment on lines -36 to +47
-/// Performs a bitwise left shift on each element of the `value` array by the corresponding amount in the `shift` array.
-/// The shift amount is normalized to the bit width of the type, matching Spark/Java semantics for negative and large shifts.
-///
-/// # Arguments
-/// * `value` - The array of values to shift.
-/// * `shift` - The array of shift amounts (must be Int32).
-///
-/// # Returns
-/// A new array with the shifted values.
-fn shift_left<T: ArrowPrimitiveType>(
+/// Bitwise left shift on elements in `value` by corresponding `shift` amount.
+/// The shift amount is normalized to the bit width of the type, matching Spark/Java
+/// semantics for negative and large shifts.
+fn shift_left<T>(
     value: &PrimitiveArray<T>,
-    shift: &PrimitiveArray<Int32Type>,
+    shift: &Int32Array,
 ) -> Result<PrimitiveArray<T>>
 where
-    T::Native: ArrowNativeType + std::ops::Shl<i32, Output = T::Native>,
+    T: ArrowPrimitiveType,
+    T::Native: std::ops::Shl<i32, Output = T::Native>,
@Jefffrey (Contributor, Author):

Rewrote some comments to be more succinct, and also cleaned up some function signatures (removed some unused bounds, used type aliases, etc.)
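
For reference, the normalization the doc comment describes can be sketched as follows (my illustration of the Java shift semantics, not the PR's exact body):

    fn normalize_shift(shift: i32, bit_width: i32) -> i32 {
        // Java's shift operators use only the low bits of the shift amount
        // (shift & 31 for 32-bit values, shift & 63 for 64-bit); rem_euclid
        // computes the same wraparound for power-of-two bit widths.
        shift.rem_euclid(bit_width)
    }

    fn main() {
        assert_eq!(normalize_shift(33, 32), 1); // shifting an i32 by 33 == by 1
        assert_eq!(normalize_shift(-1, 32), 31); // negative shifts wrap around
    }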

    }

    #[derive(Debug, Hash, Eq, PartialEq)]
    pub struct SparkShiftLeft {
@Jefffrey (Contributor, Author):

Folded these structs into a single common SparkBitShift struct.
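
A plausible shape for the folded struct (a sketch; the BitShiftKind enum and field names here are my assumption, not necessarily the PR's exact code):

    #[derive(Debug, Clone, Copy, Hash, Eq, PartialEq)]
    enum BitShiftKind {
        Left,
        Right,
        RightUnsigned,
    }

    #[derive(Debug, Hash, Eq, PartialEq)]
    pub struct SparkBitShift {
        signature: Signature,
        kind: BitShiftKind,
    }

The three functions would then differ only in the kind they are constructed with.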

 );
 Self {
-    signature: Signature::user_defined(Volatility::Immutable),
+    signature: Signature::one_of(
@Jefffrey (Contributor, Author):

The new signature is defined here.
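
Roughly speaking, the one_of signature enumerates the accepted (value, shift) type pairs explicitly. A sketch along these lines (the exact pairs are inferred from the imported types and are my assumption):

    Signature::one_of(
        vec![
            TypeSignature::Exact(vec![DataType::Int32, DataType::Int32]),
            TypeSignature::Exact(vec![DataType::Int64, DataType::Int32]),
            TypeSignature::Exact(vec![DataType::UInt32, DataType::Int32]),
            TypeSignature::Exact(vec![DataType::UInt64, DataType::Int32]),
        ],
        Volatility::Immutable,
    )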

    }

    #[cfg(test)]
    mod tests {
@Jefffrey (Contributor, Author):

Moved these all to SLTs.
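
In sqllogictest form the moved tests would look roughly like this (a hypothetical sketch; the actual .slt file and cases may differ):

    # hypothetical examples of the migrated bit shift tests
    query I
    SELECT shiftleft(1, 2);
    ----
    4

    query I
    SELECT shiftright(8, 2);
    ----
    2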

            std::sync::Arc::clone(&INSTANCE)
        }
    };
    ($UDF:ty, $NAME:ident, $CTOR:path) => {
@Jefffrey (Contributor, Author):

This is to accommodate using a single struct (e.g. SparkBitShift) for multiple different functions, similar to what we already allow for window functions:

    macro_rules! get_or_init_udwf {
        ($UDWF:ident, $OUT_FN_NAME:ident, $DOC:expr) => {
            get_or_init_udwf!($UDWF, $OUT_FN_NAME, $DOC, $UDWF::default);
        };
        ($UDWF:ident, $OUT_FN_NAME:ident, $DOC:expr, $CTOR:path) => {
            paste::paste! {
                #[doc = concat!(" Returns a [`WindowUDF`](datafusion_expr::WindowUDF) for [`", stringify!($OUT_FN_NAME), "`].")]
                #[doc = ""]
                #[doc = concat!(" ", $DOC)]
                pub fn [<$OUT_FN_NAME _udwf>]() -> std::sync::Arc<datafusion_expr::WindowUDF> {
                    // Singleton instance of UDWF, ensures it is only created once.
                    static INSTANCE: std::sync::LazyLock<std::sync::Arc<datafusion_expr::WindowUDF>> =
                        std::sync::LazyLock::new(|| {
                            std::sync::Arc::new(datafusion_expr::WindowUDF::from($CTOR()))
                        });
                    std::sync::Arc::clone(&INSTANCE)
                }
            }
        };
    }

I had a bit of difficulty trying to reduce it to something simpler like:

macro_rules! make_udf_function {
    ($UDF:ty, $NAME:ident) => {
        make_udf_function!($UDF, $NAME, $UDF::new); // Error on :: token
    };
    ($UDF:ty, $NAME:ident, $CTOR:path) => {
        #[allow(rustdoc::redundant_explicit_links)]
        #[doc = concat!("Return a [`ScalarUDF`](datafusion_expr::ScalarUDF) implementation of ", stringify!($NAME))]
        pub fn $NAME() -> std::sync::Arc<datafusion_expr::ScalarUDF> {
            // Singleton instance of the function
            static INSTANCE: std::sync::LazyLock<
                std::sync::Arc<datafusion_expr::ScalarUDF>,
            > = std::sync::LazyLock::new(|| {
                std::sync::Arc::new(datafusion_expr::ScalarUDF::new_from_impl(
                    $CTOR(),
                ))
            });
            std::sync::Arc::clone(&INSTANCE)
        }
    };
}

to reduce duplication (the error on the `::` token is presumably because a fragment captured as `ty` cannot be recombined with an appended `::new` into something matching `$CTOR:path`). For now I've just kept the minor duplication between the arms.

@Jefffrey marked this pull request as ready for review on November 12, 2025, 15:16.
@comphead (Contributor) left a comment:

Thanks @Jefffrey

    use arrow::array::{ArrayRef, ArrowPrimitiveType, AsArray, Int32Array, PrimitiveArray};
    use arrow::compute;
    use arrow::datatypes::{
        ArrowNativeType, DataType, Int32Type, Int64Type, UInt32Type, UInt64Type,
Member:

Suggested change:

-    ArrowNativeType, DataType, Int32Type, Int64Type, UInt32Type, UInt64Type,
+    DataType, Int32Type, Int64Type, UInt32Type, UInt64Type,

@Jefffrey (Contributor, Author):

The trait is still needed for calling get_byte_width in the shift functions:

    let bit_num = (T::Native::get_byte_width() * 8) as i32;

        &arg_types[1],
    ));
    if value_array.data_type().is_null() || shift_array.data_type().is_null() {
        return Ok(Arc::new(Int32Array::new_null(value_array.len())));
Member:

Why always Int32Array? If shift_array.data_type().is_null(), then I think you need to use the type returned by value_array.data_type(), which could be Int64, for example. If value_array.data_type().is_null(), then falling back to Int32Array is fine.

@Jefffrey (Contributor, Author):

That's a good point, I'll fix it.

@Jefffrey (Contributor, Author):

I removed the explicit null handling, as it seems coercion will coerce null arguments to int types for us; added tests to confirm this.
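
A hypothetical sketch of what such a case looks like in .slt form (the exact test is my assumption, not necessarily the one added):

    # a NULL argument is coerced to an integer type and the result is NULL
    query I
    SELECT shiftleft(NULL, 2);
    ----
    NULL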


@Jefffrey added this pull request to the merge queue on Nov 13, 2025.
Merged via the queue into apache:main with commit 35b5363 on Nov 13, 2025.
28 checks passed.
@Jefffrey deleted the refactor-spark-bitshift branch on November 13, 2025, 07:05.
