Skip to content

19240: fix: #10020 make concat function support array concatenation#113

Open
martin-augment wants to merge 1 commit intomainfrom
pr-19240-2025-12-10-15-00-32
Open

19240: fix: #10020 make concat function support array concatenation#113
martin-augment wants to merge 1 commit intomainfrom
pr-19240-2025-12-10-15-00-32

Conversation

@martin-augment
Copy link
Owner

19240: To review by AI

@coderabbitai
Copy link

coderabbitai bot commented Dec 10, 2025

Walkthrough

The concat function in datafusion/functions/src/string/concat.rs has been modified to support array-aware concatenation. The signature initialization changed from Signature::variadic to Signature::variadic_any. Two new private helper methods were added: has_array_args to detect array arguments and to_array_concat to construct ArrayConcat expressions. The return_type method now handles array inputs by returning array types when List arguments are present. The invoke_with_args method was extended to detect and route array cases through the new array concatenation path while maintaining backward compatibility with scalar concatenation logic. Related imports were updated to support these changes.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-19240-2025-12-10-15-00-32

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 4 suggestions posted.

Comment augment review to trigger a new review at any time.

use arrow::datatypes::DataType::*;
args.iter().any(|arg| match arg {
ColumnarValue::Array(arr) => matches!(arr.data_type(), List(_)),
ColumnarValue::Scalar(scalar) => matches!(scalar, ScalarValue::List(_, _)),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern won’t match: ScalarValue::List takes a single Arc<ListArray> (and there are separate LargeList/FixedSizeList variants). As written, scalar list arguments won’t be detected.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The proposed changes check only for the ScalarValue::List type but it should also check the other List types - LargeList and FixedSizeList, otherwise the solution won't be complete.

fn has_array_args(args: &[ColumnarValue]) -> bool {
use arrow::datatypes::DataType::*;
args.iter().any(|arg| match arg {
ColumnarValue::Array(arr) => matches!(arr.data_type(), List(_)),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only DataType::List(_) is checked; consider including LargeList(_) and FixedSizeList(_) so all list-like arrays are recognized.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

@martin-augment martin-augment Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The proposed changes check only for the DataType::List type but it should also check the other List types - LargeList and FixedSizeList, otherwise the solution won't be complete.

let args = args.into_iter()
.map(|arg| {
// If the argument is not already an array, wrap it in an array
if !matches!(arg.get_type(&DataType::Null).unwrap_or_default(), DataType::List(_)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using expr.get_type without a schema (&DataType::Null) won’t determine if arg is a list and risks wrapping list args in MakeArray, producing nested lists. You likely need schema‑aware type resolution or a different check here.

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The proposed code does not even compile. The expected parameter is schema: &dyn ExprSchema, not &DataType.

use DataType::*;

// If any argument is an array, return the array type
if arg_types.iter().any(|t| matches!(t, List(_))) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array detection in return_type only checks List(_); it should likely also handle LargeList(_) and FixedSizeList(_) to return the correct array type. (Also applies to the if let Some(DataType::List(...)) below.)

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The proposed changes check only for the DataType::List type but it should also check the other List types - LargeList and FixedSizeList, otherwise the solution won't be complete.

@claude
Copy link

claude bot commented Dec 10, 2025

Code Review for PR #113: fix: apache#10020 make concat function support array concatenation

Thank you for working on this feature! I've reviewed the changes and identified several critical issues that need to be addressed before this can be merged.


🔴 Critical Issues

1. Compilation Error - Missing closing brace (Line 72)

Self {
    signature: Signature::variadic_any(Volatility::Immutable)
    ),  // ❌ Missing opening brace, extra closing paren
}

Should be:

Self {
    signature: Signature::variadic_any(Volatility::Immutable),
}

2. Missing closing brace for impl block (Line 107)

The to_array_concat helper function is defined inside the impl ConcatFunc block, but the impl block is never closed before starting impl ScalarUDFImpl for ConcatFunc on line 108. You need to add a closing brace } after line 106.

3. Non-existent Methods Used (Lines 151-153)

let exprs: Vec<Expr> = args.args.into_iter().map(|a| a.into_expr()).collect();
let expr = Self::to_array_concat(exprs)?;
return expr.eval(args.context);

Problems:

  • ColumnarValue does not have an into_expr() method in the DataFusion codebase
  • Expr does not have an eval() method that takes a context parameter
  • This approach fundamentally won't work - you cannot convert runtime ColumnarValue back to Expr and evaluate it

4. Incorrect Use of BuiltinScalarFunction (Lines 87, 92-94)

let array_concat = BuiltinScalarFunction::ArrayConcat;
// ...
Ok(Expr::ScalarFunction(ScalarFunction::new(
    BuiltinScalarFunction::MakeArray,
    vec\![arg],
)))

Problems:

  • BuiltinScalarFunction enum no longer exists in modern DataFusion (removed in v27+)
  • Array functions like array_concat are now implemented as UDFs (User Defined Functions), not built-in scalar functions
  • You should be calling the actual array_concat UDF from datafusion-functions-nested crate

🟡 Design & Architecture Concerns

5. Fundamental Architectural Issue

The approach of converting ColumnarValueExpr → evaluate is not possible in DataFusion's architecture:

  • During execution (invoke_with_args), you have runtime values (ColumnarValue)
  • You cannot convert these back to expressions (Expr) which are planning-time constructs
  • You need to call the array_concat function's implementation directly, not build and evaluate an expression

Suggested approach:

// Get the array_concat UDF
let array_concat_udf = datafusion_functions_nested::concat::array_concat();

// Call it directly with the args
return array_concat_udf.invoke_with_args(args);

6. Signature Change Too Permissive (Line 71)

Changing from:

Signature::variadic(vec\![Utf8View, Utf8, LargeUtf8], Volatility::Immutable)

to:

Signature::variadic_any(Volatility::Immutable)

This allows any types, which could cause issues. Consider using a signature that explicitly allows both string types AND list types.

7. Missing Type Coercion Logic

When mixing scalar values with arrays (e.g., concat('a', [1, 2])), how should type coercion work? The current implementation doesn't handle this properly.


🟠 Other Issues

8. Unused Imports (Lines 18-20, 30-31, 33)

use arrow::array::{Array, as_largestring_array, ArrayRef};
use arrow::datatypes::{DataType, Field, TimeUnit};  // TimeUnit unused
use arrow_array::types::*;  // Wildcard import, nothing used
use datafusion_common::{Result, ScalarValue, internal_err, plan_err, exec_err};  // exec_err unused
use datafusion_expr::expr::{ScalarFunction, ScalarFunctionExpr};  // ScalarFunctionExpr unused
use datafusion_expr::{ColumnarValue, Documentation, Expr, Volatility, lit, BuiltinScalarFunction};  // BuiltinScalarFunction doesn't exist

9. Return Type Logic May Be Incorrect (Lines 124-130)

When any argument is a List, you return the first List type found. But what if there are multiple List types with different element types? This could cause type mismatches.

10. No Tests Added

This is a significant behavioral change, but I don't see any new tests for:

  • Array concatenation: concat([1,2], [3,4])
  • Mixed scalar and array: concat('hello', ['world'])
  • Edge cases: empty arrays, null arrays, etc.

📋 Recommendations

  1. Fix compilation errors first (issues 18481: feat: Add selectivity metric to NestedLoopJoinExec for EXPLAIN ANALYZE #1 and 18457: Support Arrow IPC Stream Files #2)
  2. Rework the implementation approach:
    • Instead of trying to convert to expressions, directly invoke the array_concat UDF
    • Look at how other functions delegate to different implementations based on types
    • Consider whether concat should support arrays at all, or if users should use array_concat directly
  3. Add comprehensive tests covering all new functionality
  4. Consider backward compatibility: Does this change break existing behavior?
  5. Update documentation to explain array concatenation behavior

📚 References

  • Array concat implementation: datafusion/functions-nested/src/concat.rs
  • Example of UDF invocation patterns in DataFusion
  • Consider looking at how other dual-purpose functions handle different types

Please let me know if you need help with any of these issues. The concept of supporting array concatenation is good, but the implementation needs significant rework to align with DataFusion's architecture.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
datafusion/functions/src/string/concat.rs (1)

67-74: Syntax error: Mismatched parenthesis.

Line 72 contains a stray ) that doesn't match any opening parenthesis. This will cause a compilation error.

     pub fn new() -> Self {
         use DataType::*;
         Self {
-            signature: Signature::variadic_any(Volatility::Immutable)
-            ),
+            signature: Signature::variadic_any(Volatility::Immutable),
         }
     }
🧹 Nitpick comments (1)
datafusion/functions/src/string/concat.rs (1)

18-35: Unused imports detected.

Several imports appear to be unused in this file:

  • TimeUnit (line 19) - not referenced anywhere
  • exec_err (line 30) - not referenced anywhere
  • ScalarFunctionExpr (line 31) - not referenced anywhere

Consider removing these to keep the imports clean.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d704ed3 and 1f46e7a.

📒 Files selected for processing (1)
  • datafusion/functions/src/string/concat.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.

Applied to files:

  • datafusion/functions/src/string/concat.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)

Comment on lines +76 to +83
// Check if any argument is an array type
fn has_array_args(args: &[ColumnarValue]) -> bool {
use arrow::datatypes::DataType::*;
args.iter().any(|arg| match arg {
ColumnarValue::Array(arr) => matches!(arr.data_type(), List(_)),
ColumnarValue::Scalar(scalar) => matches!(scalar, ScalarValue::List(_, _)),
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the ScalarValue enum definition
rg -n "enum ScalarValue" --type rust -A 50 | head -100

Repository: martin-augment/datafusion

Length of output: 3784


🏁 Script executed:

#!/bin/bash
# Search for ScalarValue::List pattern usage in the codebase
rg "ScalarValue::List\(" --type rust -B2 -A2 | head -50

Repository: martin-augment/datafusion

Length of output: 3735


🏁 Script executed:

#!/bin/bash
# Look specifically in arrow crate or where ScalarValue is defined
fd -e rs | xargs rg -l "enum ScalarValue" | head -5

Repository: martin-augment/datafusion

Length of output: 104


🏁 Script executed:

#!/bin/bash
# Get the full ScalarValue enum definition including List variant
sed -n '324,450p' datafusion/common/src/scalar/mod.rs

Repository: martin-augment/datafusion

Length of output: 5239


Fix incorrect pattern matching for ScalarValue::List.

The pattern ScalarValue::List(_, _) on line 81 is incorrect. The List variant takes a single Arc<ListArray> parameter, not two. Change it to ScalarValue::List(_).

🤖 Prompt for AI Agents
In datafusion/functions/src/string/concat.rs around lines 76 to 83, the pattern
match for ScalarValue is incorrect: it uses ScalarValue::List(_, _) but the List
variant takes a single argument; change the match arm to ScalarValue::List(_) so
it matches the single Arc<ListArray> parameter (leave the Array case as-is).

Comment on lines +85 to +106
// Convert arguments to array_concat function
fn to_array_concat(args: Vec<Expr>) -> Result<Expr> {
let array_concat = BuiltinScalarFunction::ArrayConcat;
let args = args.into_iter()
.map(|arg| {
// If the argument is not already an array, wrap it in an array
if !matches!(arg.get_type(&DataType::Null).unwrap_or_default(), DataType::List(_)) {
Ok(Expr::ScalarFunction(ScalarFunction::new(
BuiltinScalarFunction::MakeArray,
vec![arg],
)))
} else {
Ok(arg)
}
})
.collect::<Result<Vec<_>>>()?;

Ok(Expr::ScalarFunction(ScalarFunction::new(
array_concat,
args,
)))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual file and understand the context
cd datafusion/functions/src/string
wc -l concat.rs

Repository: martin-augment/datafusion

Length of output: 82


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '85,106p' concat.rs

Repository: martin-augment/datafusion

Length of output: 121


🏁 Script executed:

# Check if BuiltinScalarFunction exists and how it's used
rg -n "BuiltinScalarFunction" --type rust -C 2 datafusion/ | head -150

Repository: martin-augment/datafusion

Length of output: 1597


🏁 Script executed:

# Check Expr::get_type signature
rg -n "fn get_type" --type rust -A 3 datafusion/ | head -100

Repository: martin-augment/datafusion

Length of output: 2945


🏁 Script executed:

# Check ScalarFunction::new signature and usage
rg -n "ScalarFunction::new" --type rust -C 2 datafusion/ | head -100

Repository: martin-augment/datafusion

Length of output: 8058


🏁 Script executed:

# Check how array_concat is currently implemented
rg -n "array_concat" --type rust -C 3 datafusion/ | head -80

Repository: martin-augment/datafusion

Length of output: 6547


🏁 Script executed:

# Check for MakeArray implementation
rg -n "MakeArray" --type rust -C 3 datafusion/ | head -100

Repository: martin-augment/datafusion

Length of output: 2933


🏁 Script executed:

# Check the ScalarFunction::new signature
rg -n "fn new\(" --type rust datafusion/expr/src/expr/ -A 2 | grep -A 2 "ScalarFunction"

Repository: martin-augment/datafusion

Length of output: 134


🏁 Script executed:

# Verify if ScalarFunction::new still exists with BuiltinScalarFunction
rg -n "impl ScalarFunction" --type rust datafusion/expr/src/expr/ -A 10

Repository: martin-augment/datafusion

Length of output: 134


🏁 Script executed:

# Check what the correct approach should be - look at array_concat and make_array helper functions
rg -n "pub fn array_concat" --type rust datafusion/functions-nested/src/ -A 5

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check make_array helper function
rg -n "pub fn make_array" --type rust datafusion/functions-nested/src/ -A 5

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Find ScalarFunction definition
rg -n "struct ScalarFunction" --type rust -A 5 datafusion/

Repository: martin-augment/datafusion

Length of output: 1965


🏁 Script executed:

# Find the public helper functions in functions-nested
find datafusion/functions-nested/src -name "*.rs" -type f | head -20

Repository: martin-augment/datafusion

Length of output: 939


🏁 Script executed:

# Check expr_fn module for public helpers
rg -n "pub fn array_concat\|pub fn make_array" --type rust datafusion/functions-nested/

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check what's in expr_fn.rs
cat datafusion/functions-nested/src/expr_fn.rs | head -100

Repository: martin-augment/datafusion

Length of output: 143


🏁 Script executed:

# Find ScalarFunction impl and its methods
rg -n "impl ScalarFunction" --type rust -A 30 datafusion/expr/src/expr.rs

Repository: martin-augment/datafusion

Length of output: 1287


🏁 Script executed:

# Check functions-nested lib.rs for public exports
cat datafusion/functions-nested/src/lib.rs | head -150

Repository: martin-augment/datafusion

Length of output: 5356


🏁 Script executed:

# Check the macros to understand make_udf_expr_and_func
rg -n "make_udf_expr_and_func" --type rust -A 20 datafusion/functions-nested/src/macros.rs | head -80

Repository: martin-augment/datafusion

Length of output: 1936


🏁 Script executed:

# Double-check: search for any usage of ScalarFunction::new with non-new_udf pattern
rg -n "ScalarFunction::new\(" --type rust datafusion/expr/src/expr.rs

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Verify the old BuiltinScalarFunction is still in codebase but unused in new patterns
rg -n "enum BuiltinScalarFunction" --type rust datafusion/ -A 5

Repository: martin-augment/datafusion

Length of output: 51


This code will not compile—multiple critical API mismatches.

  1. Line 91: arg.get_type(&DataType::Null) has wrong signature. Expr::get_type() requires a &dyn ExprSchema parameter, not a &DataType.

  2. Lines 92-95, 102-105: ScalarFunction::new(BuiltinScalarFunction, args) does not exist. The only public constructor is ScalarFunction::new_udf(udf: Arc<ScalarUDF>, args).

  3. Lines 87, 93: BuiltinScalarFunction::ArrayConcat and BuiltinScalarFunction::MakeArray are not the correct types to use. Both functions are implemented as proper ScalarUDF instances in datafusion-functions-nested.

Fix: Use the public helper functions from datafusion_functions_nested::expr_fn:

  • Replace the entire to_array_concat implementation with calls to array_concat() and make_array() instead of manually constructing expressions with BuiltinScalarFunction and incorrect API calls.
🤖 Prompt for AI Agents
In datafusion/functions/src/string/concat.rs around lines 85 to 106, the current
to_array_concat implementation uses incorrect APIs (Expr::get_type signature,
ScalarFunction::new constructor, and BuiltinScalarFunction variants) and won't
compile; replace the entire function body to use the public helpers from
datafusion_functions_nested::expr_fn: call make_array(expr) to wrap non-list
arguments and then call array_concat(vec_of_exprs) to build the concatenation
expression (import the two helper functions), removing any direct use of
BuiltinScalarFunction or ScalarFunction::new and any incorrect get_type calls.

Comment on lines +106 to 108
}

impl ScalarUDFImpl for ConcatFunc {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing closing brace for impl ConcatFunc block.

The impl ConcatFunc block starting at line 67 is not closed before the impl ScalarUDFImpl block begins at line 108. A closing } is needed after the to_array_concat function.

         Ok(Expr::ScalarFunction(ScalarFunction::new(
             array_concat,
             args,
         )))
     }
+}
 
 impl ScalarUDFImpl for ConcatFunc {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
}
impl ScalarUDFImpl for ConcatFunc {
}
}
impl ScalarUDFImpl for ConcatFunc {
🤖 Prompt for AI Agents
In datafusion/functions/src/string/concat.rs around lines 106 to 108, the impl
ConcatFunc block opened earlier (around line 67) is missing its closing brace;
add a single closing curly brace `}` immediately after the to_array_concat
function to terminate the impl ConcatFunc block before the subsequent impl
ScalarUDFImpl begins so the file compiles and the impl blocks are properly
separated.

Comment on lines 121 to 143
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
use DataType::*;

// If any argument is an array, return the array type
if arg_types.iter().any(|t| matches!(t, List(_))) {
// Find the first array type to use as the base
if let Some(DataType::List(field)) = arg_types.iter().find(|t| matches!(t, List(_))) {
return Ok(DataType::List(field.clone()));
}
}

// Otherwise, use the existing string type logic
let mut dt = &Utf8;
arg_types.iter().for_each(|data_type| {
for data_type in arg_types {
if data_type == &Utf8View {
dt = data_type;
}
if data_type == &LargeUtf8 && dt != &Utf8View {
dt = data_type;
}
});

Ok(dt.to_owned())
}
Ok(dt.clone())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the concat.rs file more closely to understand the context
cd datafusion/functions/src/string && wc -l concat.rs

Repository: martin-augment/datafusion

Length of output: 82


🏁 Script executed:

# Check the full concat function implementation to understand the context
cd datafusion/functions/src/string && head -160 concat.rs | tail -60

Repository: martin-augment/datafusion

Length of output: 2014


🏁 Script executed:

# Search for tests related to array concatenation
find . -name "*.rs" -type f -exec grep -l "concat.*List\|array.*concat" {} \; | head -20

Repository: martin-augment/datafusion

Length of output: 726


🏁 Script executed:

# Look for type coercion logic in DataFusion
rg "coerce.*type|type.*coerce" --type rs -A 3 -B 3 | head -100

Repository: martin-augment/datafusion

Length of output: 95


🌐 Web query:

DataFusion concat array type coercion behavior

💡 Result:

Short answer: array_concat (and the array/list concat UDFs) require the input arrays to have element types that can be coerced to a single common element type; DataFusion will pick a common promoted/compatible Arrow DataType and (implicitly) cast elements as needed using DataFusion/Arrow cast rules. If types are not coercible the planner will error. Note also the SQL concat operator (string concat) is different — when given list/array arguments it may stringify them instead of performing array concatenation; use array_concat / list_concat for real array merge. [1][2][3][4]

Sources:

  • array/list function docs (array_concat signature and description). [1]
  • type-coercion / function-signature notes (arrays must be coercible to same inner type). [2]
  • bug/discussion showing concat vs array_concat behavior (arrays being stringified). [3]
  • Arrow/DataFusion cast behavior (how casts handle Utf8⇄numeric/boolean and list casts). [4]

🏁 Script executed:

# Check the proper array_concat implementation in functions-nested
cat datafusion/functions-nested/src/concat.rs | head -200

Repository: martin-augment/datafusion

Length of output: 5941


🏁 Script executed:

# Look for type coercion logic in the planner
rg "coerce_types_for_function|type_coerce" --type rs -A 5 | head -150

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Search for tests related to array concatenation
rg "array_concat|concat.*List" --type rs tests/ | head -30

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Check if there are integration tests for concat with arrays
find . -name "*.rs" -path "*/tests/*" -exec grep -l "array_concat\|concat.*\[" {} \; | head -10

Repository: martin-augment/datafusion

Length of output: 363


🏁 Script executed:

# Get the array_concat implementation in functions-nested
sed -n '200,400p' datafusion/functions-nested/src/concat.rs

Repository: martin-augment/datafusion

Length of output: 5908


🏁 Script executed:

# Search for ArrayConcat struct and return_type implementation
rg "struct ArrayConcat|impl.*ArrayConcat" --type rs -A 30 datafusion/functions-nested/src/concat.rs | head -80

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Get the return_type method from array_concat
rg "fn return_type" --type rs -B 5 -A 20 datafusion/functions-nested/src/concat.rs

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Look at how type coercion is done for arrays
rg "ListCoercion|coerce_types_for_function" --type rs -B 3 -A 10 | head -100

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Get the full string concat implementation
cat datafusion/functions/src/string/concat.rs | head -250

Repository: martin-augment/datafusion

Length of output: 9541


🏁 Script executed:

# Look at the to_array_concat method and how it's used
rg "to_array_concat|has_array_args" --type rs -B 5 -A 15 datafusion/functions/src/string/concat.rs

Repository: martin-augment/datafusion

Length of output: 95


🏁 Script executed:

# Check if there's type coercion happening in the planner for this
rg "concat.*coerce|string.*concat.*type" --type rs datafusion/ | head -20

Repository: martin-augment/datafusion

Length of output: 95


Type signature mismatch: return_type() doesn't match array_concat's actual type coercion behavior.

The return_type() method returns the first List field found without type unification. When delegating to array_concat() in invoke_with_args(), the actual return type may differ due to array_concat's proper type promotion (via type_union_resolution()). For example, concat(List<Int32>, List<Int64>) would declare return type List<Int32> but actually return List<Int64>, causing type signature mismatches. Use type_union_resolution() in return_type() to match array_concat's behavior, similar to the proper implementation in functions-nested.

🤖 Prompt for AI Agents
In datafusion/functions/src/string/concat.rs around lines 121 to 143, the
return_type() currently picks the first List field found which can disagree with
array_concat()'s actual promoted return type; instead collect the element
DataType of every List(_) argument, call type_union_resolution() (the same
promotion used by array_concat) on those element types to produce a unified
field, and return DataType::List(unified_field); if resolution fails propagate
the error. Keep the existing non-list string promotion logic unchanged for the
Utf8/LargeUtf8 path.

Comment on lines 147 to 157
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
// Check if any argument is an array
if Self::has_array_args(&args.args) {
// Convert to array_concat expression and evaluate it
let exprs: Vec<Expr> = args.args.into_iter().map(|a| a.into_expr()).collect();
let expr = Self::to_array_concat(exprs)?;
return expr.eval(args.context);
}

// Original string concatenation logic
let ScalarFunctionArgs { args, .. } = args;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fundamental design issue: Cannot convert ColumnarValue to Expr for evaluation.

This approach has critical flaws:

  1. Line 151: ColumnarValue::into_expr() does not exist. ColumnarValue represents runtime columnar data (arrays/scalars), while Expr represents logical expressions. They cannot be directly converted.

  2. Line 153: Expr::eval(args.context) is not a valid API. Expressions are evaluated through the execution engine, not directly.

  3. Conceptual issue: invoke_with_args receives already-evaluated values. The correct approach is to call the array concatenation logic directly on the ColumnarValue arguments, not attempt to rebuild and re-evaluate expressions.

Consider instead:

  • Import and call the array_concat kernel/function directly on the ColumnarValue arguments
  • Or delegate to the existing ArrayConcat UDF's invoke method
#!/bin/bash
# Find how ArrayConcat is implemented for direct invocation
rg -n "impl.*ScalarUDFImpl.*for.*ArrayConcat" --type rust -A 20

# Find the array_concat kernel or function
rg -n "fn array_concat" --type rust -C5
🤖 Prompt for AI Agents
In datafusion/functions/src/string/concat.rs around lines 147 to 157, do not try
to turn ColumnarValue into Expr or call Expr::eval; instead detect array
arguments at runtime and call the array-concatenation implementation directly on
the ColumnarValue inputs (or delegate to the existing ArrayConcat UDF
implementation). Concretely: remove the into_expr() map and
to_array_concat()/eval flow, match the incoming ScalarFunctionArgs.args for
ColumnarValue::Array/Scalar variants, call the array_concat kernel or
ArrayConcat.invoke_with_args (importing the module if needed) with the same
context/args, and return its ColumnarValue result; ensure errors are propagated
and types are validated before calling the kernel.

@martin-augment
Copy link
Owner Author

1. Compilation Error - Missing closing brace (Line 72)

Self {
    signature: Signature::variadic_any(Volatility::Immutable)
    ),  // ❌ Missing opening brace, extra closing paren
}

Should be:

Self {
    signature: Signature::variadic_any(Volatility::Immutable),
}

value:useful; category:bug; feedback:The Claude AI reviewer is correct! There is one more closing brace ()) that has left from the old implementation. Prevents compilation error.

@martin-augment
Copy link
Owner Author

2. Missing closing brace for impl block (Line 107)

The to_array_concat helper function is defined inside the impl ConcatFunc block, but the impl block is never closed before starting impl ScalarUDFImpl for ConcatFunc on line 108. You need to add a closing brace } after line 106.

value:useful; category:bug; feedback:The Claude AI reviewer is correct! There is a missing closing curly bracket (}) to close the previous impl block. Prevents compilation error.

@martin-augment
Copy link
Owner Author

3. Non-existent Methods Used (Lines 151-153)

let exprs: Vec<Expr> = args.args.into_iter().map(|a| a.into_expr()).collect();
let expr = Self::to_array_concat(exprs)?;
return expr.eval(args.context);

Problems:

  • ColumnarValue does not have an into_expr() method in the DataFusion codebase
  • Expr does not have an eval() method that takes a context parameter
  • This approach fundamentally won't work - you cannot convert runtime ColumnarValue back to Expr and evaluate it

value:useful; category:bug; feedback:The Claude AI reviewer is correct! It seems this code is AI-generated and not reviewed by the PR author. The proposed changes are full with compilation errors.

@martin-augment
Copy link
Owner Author

There is no point in reviewing this PR! It is AI-generated and not reviewed/polished by the PR author.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments