Implement method to apply scalar or aggregate function to Array elements #15882

timsaucer · 2025-04-28T17:19:41Z

Is your feature request related to a problem or challenge?

Suppose I have a DataFrame in which one column contains arrays. I would like to apply any scalar expression to each element of those arrays and get an array back. For example, applying abs() should convert data such as this:

DataFrame()
+--------------+-------------+
| a            | abs(a)      |
+--------------+-------------+
| [-10, 5, 13] | [10, 5, 13] |
| [2]          | [2]         |
| [-3, 1]      | [3, 1]      |
+--------------+-------------+
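A generic element-wise transform can be cheap because of how Arrow stores list columns: a flat child values array plus an offsets array that marks where each row's sub-array starts and ends. A one-to-one scalar function can therefore run once over the flat values, and the offsets can be reused unchanged. The sketch below models that layout in plain Python. `ListColumn` and `map_elements` are hypothetical names for illustration, not DataFusion or Arrow APIs, and null handling is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ListColumn:
    # Simplified, hypothetical model of an Arrow list column (no nulls):
    # row i holds values[offsets[i]:offsets[i + 1]].
    offsets: List[int]
    values: List[int]

    @classmethod
    def from_rows(cls, rows: List[List[int]]) -> "ListColumn":
        offsets, values = [0], []
        for row in rows:
            values.extend(row)
            offsets.append(len(values))
        return cls(offsets, values)

    def to_rows(self) -> List[List[int]]:
        return [self.values[s:e] for s, e in zip(self.offsets, self.offsets[1:])]


def map_elements(col: ListColumn, func: Callable[[int], int]) -> ListColumn:
    # Apply a one-to-one scalar function to the flat values buffer.
    # Row boundaries do not change, so the offsets are copied as-is.
    return ListColumn(list(col.offsets), [func(v) for v in col.values])


a = ListColumn.from_rows([[-10, 5, 13], [2], [-3, 1]])
print(a.offsets)                        # [0, 3, 4, 6]
print(map_elements(a, abs).to_rows())  # [[10, 5, 13], [2], [3, 1]]
```

Functions that change the number of elements per row (filters, for example) could not reuse the offsets this way; they would need to rebuild them.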

Additionally, it would be amazing to be able to apply any aggregate function to each array, producing one value per row:

DataFrame()
+--------------+--------+
| a            | sum(a) |
+--------------+--------+
| [-10, 5, 13] | 8      |
| [2]          | 2      |
| [-3, 1]      | -2     |
+--------------+--------+
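With the same offsets/values layout, applying an aggregate to each array becomes a segmented reduction: one output per row, computed over `values[offsets[i]:offsets[i + 1]]`. This is a plain-Python sketch with hypothetical helper names. One design question it surfaces is the result for an empty array. SQL aggregates such as SUM return NULL for empty input, whereas Python's `sum` returns 0, so the sketch assumes SQL semantics and returns `None`.

```python
from typing import Callable, List, Optional, Tuple


def to_offsets(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    # Flatten rows into an Arrow-style (offsets, values) pair (simplified, no nulls).
    offsets, values = [0], []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def aggregate_segments(
    offsets: List[int],
    values: List[int],
    agg: Callable[[List[int]], int],
) -> List[Optional[int]]:
    # One result per row: reduce each segment of the flat values buffer.
    out: List[Optional[int]] = []
    for start, end in zip(offsets, offsets[1:]):
        segment = values[start:end]
        # Mimic SQL: an aggregate over no input yields NULL (None here).
        out.append(agg(segment) if segment else None)
    return out


offsets, values = to_offsets([[-10, 5, 13], [2], [-3, 1], []])
print(aggregate_segments(offsets, values, sum))  # [8, 2, -2, None]
print(aggregate_segments(offsets, values, max))  # [13, 2, 1, None]
```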

Describe the solution you'd like

This is similar to Spark's transform operation, which is very powerful for highly structured data. I don't know the best form such a function would take, but it would be even more powerful if we could do element-by-element operations across more than one column in the DataFrame. There are many use cases with several array columns whose arrays have the same length in each row.
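The multi-column case can be sketched as a `zip_with`-style helper that pairs up elements of two array columns row by row. `zip_with` here is a hypothetical plain-Python function, and the column names are made up for illustration. The sketch assumes mismatched lengths are an error; Spark's `zip_with` makes a different choice and pads the shorter array with nulls before applying the function.

```python
from typing import Callable, List


def zip_with(
    a: List[List[float]],
    b: List[List[float]],
    func: Callable[[float, float], float],
) -> List[List[float]]:
    # Element-by-element combination of two array columns.
    # Both columns must have the same number of rows, and each pair of
    # sub-arrays must have the same length; otherwise raise.
    if len(a) != len(b):
        raise ValueError("columns have different row counts")
    out = []
    for i, (xs, ys) in enumerate(zip(a, b)):
        if len(xs) != len(ys):
            raise ValueError(f"row {i}: arrays have lengths {len(xs)} and {len(ys)}")
        out.append([func(x, y) for x, y in zip(xs, ys)])
    return out


prices = [[1.5, 2.0], [3.0]]
quantities = [[2, 4], [10]]
print(zip_with(prices, quantities, lambda p, q: p * q))  # [[3.0, 8.0], [30.0]]
```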

Describe alternatives you've considered

The status quo is either to write a UDF for each case or to unnest the arrays and then group by again. The unnest-and-group-by approach can be expensive.
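A plain-Python model of the unnest-and-group-by workaround shows where the cost comes from. It materializes one intermediate row per array element, tagged with a row identifier, and then rebuilds per-row results with a hash aggregation keyed on that identifier. The function names are hypothetical, and the code models the query shape rather than calling DataFusion. In this model, a row holding an empty array produces no intermediate rows, so it disappears from the grouped result unless it is joined back.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def unnest_with_row_id(rows: List[List[int]]) -> List[Tuple[int, int]]:
    # UNNEST step: one output row per array element, tagged with its source row id.
    return [(row_id, v) for row_id, row in enumerate(rows) for v in row]


def group_sum(pairs: List[Tuple[int, int]]) -> Dict[int, int]:
    # GROUP BY row_id with SUM, using a hash table keyed by row id.
    sums: Dict[int, int] = defaultdict(int)
    for row_id, v in pairs:
        sums[row_id] += v
    return dict(sums)


rows = [[-10, 5, 13], [2], [-3, 1], []]
pairs = unnest_with_row_id(rows)
print(len(pairs))        # 6 intermediate rows
print(group_sum(pairs))  # {0: 8, 1: 2, 2: -2}; row 3 (empty array) is missing
```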

Additional context

No response


alamb · 2025-04-28T20:37:11Z

I bet something like this already exists in the datafusion-functions-array crate; figuring out how to make it general would be very sweet.

KR-bluejay · 2025-05-08T10:08:16Z

@alamb

Following your comment about making array functions more general, I suggest we create a common directory for array operations to reduce code duplication:

datafusion/common/src/array/

We could implement each operation in a separate file:

array_sum.rs
array_abs.rs
array_min.rs
array_max.rs

This would provide a consistent interface for both element-wise operations and aggregations on arrays, making the codebase more maintainable and easier to extend with new array functions in the future.

I'd like to work on implementing this approach. What do you think about organizing array operations this way? If this approach seems reasonable, I'm happy to start working on it.
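The proposal's claim that shared code reduces duplication can be illustrated with two shared base types. One iterates rows for element-wise operations and the other for reductions, so each per-operation file would only supply its kernel. This is modeled in Python rather than Rust, and every class name is hypothetical. It is not the current structure of the DataFusion array functions.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ElementwiseArrayOp(ABC):
    # Shared base: iterates rows; subclasses only define the per-element kernel.
    @abstractmethod
    def kernel(self, value: int) -> int: ...

    def evaluate(self, rows: List[List[int]]) -> List[List[int]]:
        return [[self.kernel(v) for v in row] for row in rows]


class AggregateArrayOp(ABC):
    # Shared base for reductions: one value per row, None for empty arrays.
    @abstractmethod
    def kernel(self, values: List[int]) -> int: ...

    def evaluate(self, rows: List[List[int]]) -> List[Optional[int]]:
        return [self.kernel(row) if row else None for row in rows]


# Roughly what array_abs.rs and array_sum.rs would contain, in miniature.
class ArrayAbs(ElementwiseArrayOp):
    def kernel(self, value: int) -> int:
        return abs(value)


class ArraySum(AggregateArrayOp):
    def kernel(self, values: List[int]) -> int:
        return sum(values)


rows = [[-10, 5, 13], [2], [-3, 1]]
print(ArrayAbs().evaluate(rows))  # [[10, 5, 13], [2], [3, 1]]
print(ArraySum().evaluate(rows))  # [8, 2, -2]
```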

alamb · 2025-05-09T18:37:50Z

> I'd like to work on implementing this approach. What do you think about organizing array operations this way? If this approach seems reasonable, I'm happy to start working on it.

It seems like a reasonable idea to me. We would have to see how the code looks; I am not familiar enough with its current structure to know how big a change this is.

However, I think @timsaucer is asking for something different: not a particular array function like array_sum, but something like array_apply(array, func), which would take an array and a function and apply func to each distinct sub-array.
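The difference between the two ideas can be sketched with two hypothetical higher-order functions. `array_transform` applies a scalar function to every element, echoing Spark's `transform`, while `array_apply` applies a function once to each whole sub-array. With these in place, a specific function such as `array_sum` reduces to a one-line specialization. Both are plain-Python models with assumed names and signatures, not an existing DataFusion API. They assume null rows (`None`) simply pass through as null.

```python
from functools import partial
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def array_transform(
    rows: List[Optional[List[T]]], func: Callable[[T], R]
) -> List[Optional[List[R]]]:
    # Scalar case: apply func to every element and keep each row's shape.
    return [None if row is None else [func(v) for v in row] for row in rows]


def array_apply(
    rows: List[Optional[List[T]]], func: Callable[[List[T]], R]
) -> List[Optional[R]]:
    # Aggregate case: apply func once to each whole sub-array.
    return [None if row is None else func(row) for row in rows]


# A specific function becomes a specialization of the generic one.
array_sum = partial(array_apply, func=sum)

rows = [[-10, 5, 13], [2], None, [-3, 1]]
print(array_transform(rows, abs))  # [[10, 5, 13], [2], None, [3, 1]]
print(array_apply(rows, len))      # [3, 1, None, 2]
print(array_sum(rows))             # [8, 2, None, -2]
```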

timsaucer added the enhancement label (New feature or request) on Apr 28, 2025.

timsaucer mentioned this issue on Apr 28, 2025 in rerun-io/opensource#1, "Rerun Issues" (open, 4 tasks).
