Hard to use PrimitiveArray::unary_mut, PrimitiveArray::try_unary_mut, etc. #8808

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This came up in the context of this PR in DataFusion:

In that case we are applying some operations to a PrimitiveArray and would like to reuse the allocation if possible.

However, the current API of PrimitiveArray::unary_mut and similar functions makes this awkward, because the caller must handle the case where the allocation cannot be reused:

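// want to apply an operation to arr, reusing the allocation if possible
let arr: PrimitiveArray<UInt64Type> = ...;
// to do so we call unary_mut, but must also handle the case where the allocation is shared
let new_arr = match arr.unary_mut(|a| a + 1) {
  // the buffer was not shared, so the values were updated in place
  Ok(arr) => arr,
  // the buffer was shared, so fall back to allocating a new array
  Err(old_arr) => old_arr.unary(|a| a + 1),
};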

This can be done, but it is hard to use.
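To make the failure case concrete, here is a minimal std-only sketch of why an in-place update can be refused. It is not arrow-rs code: it assumes, as a simplification, that an array's values live in a reference-counted buffer, modeled here as `Arc<Vec<u64>>`, and `add_one_in_place` is a hypothetical stand-in for a kernel like `unary_mut`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an in-place kernel such as `unary_mut`:
/// it succeeds only when `buf` is the sole owner of its allocation.
fn add_one_in_place(mut buf: Arc<Vec<u64>>) -> Result<Arc<Vec<u64>>, Arc<Vec<u64>>> {
    match Arc::get_mut(&mut buf) {
        Some(values) => {
            // Unique owner: update the existing allocation.
            values.iter_mut().for_each(|v| *v += 1);
            Ok(buf)
        }
        // Another handle to the buffer exists, so hand the input back
        // untouched and let the caller decide how to proceed.
        None => Err(buf),
    }
}
```

Every caller of such a function has to write the `Err` fallback themselves, which is the boilerplate this issue is about.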

I proposed the following function in DataFusion

/// Applies the unary operation in place if possible, or creates a new array if not
fn try_unary_mut_or_clone<F>(
    array: PrimitiveArray<Int64Type>,
    op: F,
) -> Result<PrimitiveArray<Int64Type>>
where
    F: Fn(i64) -> Result<i64>,
{
    match array.try_unary_mut(&op) {
        Ok(result) => result,
        // on error, make a new array
        Err(array) => array.try_unary(op),
    }
}

but quoting @findepi on https://github.com/apache/datafusion/pull/18360/files#r2475557450:

can this be made more flexible with a more generous use of generics?
perhaps it could even be in arrow-rs. it makes try_unary_mut significantly more approachable

Describe the solution you'd like

I would like it to be easier to apply unary and binary operations on PrimitiveArrays and reuse the allocation if possible.

Describe alternatives you've considered

One alternative would be to follow the API of Arc::unwrap_or_clone.

That would mean adding functions such as:

  • PrimitiveArray::unary_mut_or_clone
  • PrimitiveArray::try_unary_mut_or_clone
  • PrimitiveArray::binary_mut_or_clone
  • PrimitiveArray::try_binary_mut_or_clone

These would be implemented like the function above.

I think this would make it much easier to use these APIs.
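As a rough illustration of the shape such a method could take, here is a self-contained sketch using only the standard library. `ToyArray` is a hypothetical stand-in for `PrimitiveArray<T>`, not the arrow-rs type, and its three methods only imitate the return shapes of `try_unary`, `try_unary_mut`, and the proposed `try_unary_mut_or_clone`. The real signatures and sharing rules in arrow-rs may differ.

```rust
use std::sync::Arc;

/// Toy stand-in for `PrimitiveArray<T>` (NOT the arrow-rs type): the values
/// live in a reference-counted buffer that other arrays may share.
#[derive(Clone, Debug)]
struct ToyArray<T> {
    values: Arc<Vec<T>>,
}

impl<T: Copy> ToyArray<T> {
    fn new(values: Vec<T>) -> Self {
        Self { values: Arc::new(values) }
    }

    /// Allocating map, in the spirit of `try_unary`.
    fn try_unary<E>(&self, op: impl Fn(T) -> Result<T, E>) -> Result<Self, E> {
        let values: Result<Vec<T>, E> = self.values.iter().map(|v| op(*v)).collect();
        Ok(Self::new(values?))
    }

    /// In-place map, in the spirit of `try_unary_mut`. The outer `Err(self)`
    /// means "the buffer is shared"; the inner `Result` carries `op`'s error.
    fn try_unary_mut<E>(
        mut self,
        op: impl Fn(T) -> Result<T, E>,
    ) -> Result<Result<Self, E>, Self> {
        match Arc::get_mut(&mut self.values) {
            Some(values) => {
                for v in values.iter_mut() {
                    match op(*v) {
                        Ok(new) => *v = new,
                        Err(e) => return Ok(Err(e)),
                    }
                }
                Ok(Ok(self))
            }
            None => Err(self),
        }
    }

    /// Hypothetical convenience method following the naming proposed above:
    /// reuse the allocation when possible, otherwise allocate a new array.
    fn try_unary_mut_or_clone<E>(self, op: impl Fn(T) -> Result<T, E>) -> Result<Self, E> {
        match self.try_unary_mut(&op) {
            Ok(result) => result,
            Err(shared) => shared.try_unary(op),
        }
    }
}
```

With a helper of this shape, a call site reduces to a single call such as `array.try_unary_mut_or_clone(|v| Ok::<i64, String>(v + 1))`, and the caller only deals with the error type of `op`.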

Labels: enhancement (any new improvement worthy of an entry in the changelog)
