Unsound MutableArrayData Constructor #5091

Closed
Veeupup opened this issue Nov 17, 2023 · 5 comments · Fixed by #5092
Labels
arrow Changes to the arrow crate bug

Comments

@Veeupup
Contributor

Veeupup commented Nov 17, 2023

Describe the bug

I want to use MutableArrayData to construct an array, but the result depends on the order in which the input arrays are passed.

This is because MutableArrayData::new_with_capacities uses the data type of the first array.

https://github.com/apache/arrow-rs/blob/master/arrow-data/src/transform/mod.rs#L350-L356

To Reproduce

For example:

use arrow::array::{ArrayRef, Int64Array, MutableArrayData, NullArray, Capacities};
use std::sync::Arc;

fn main() {
    let x = Arc::new(Int64Array::from(vec![1, 2, 3])) as ArrayRef;
    let x_data = x.to_data();
    let y = Arc::new(NullArray::new(3)) as ArrayRef;
    let y_data = y.to_data();

    let arr1 = vec![&x_data, &y_data];
    let mut m1 = MutableArrayData::new(arr1, true, 1000);
    m1.extend(0, 0, 3);
    let ret = Int64Array::from(m1.freeze()); // works: the first array is Int64, so the output type is Int64

    let arr2 = vec![&y_data, &x_data];
    let mut m2 = MutableArrayData::new(arr2, true, 100);
    m2.extend(1, 0, 3);
    let ret = Int64Array::from(m2.freeze()); // panics: the first array is a NullArray, so the output ArrayData has data type Null
}

Expected behavior

Maybe we need a way to specify the DataType of the resulting ArrayData, so that we get the expected result regardless of the order of the input arrays.

For example, a method like:

pub fn with_capacities(
    arrays: Vec<&'a ArrayData>,
    use_nulls: bool,
    capacities: Capacities,
    data_type: DataType,
) -> Self {

@tustvold @alamb what do you think?

Additional context

@Veeupup Veeupup added the bug label Nov 17, 2023
@tustvold
Contributor

tustvold commented Nov 17, 2023

I think it is probably a bug that MutableArrayData lets you extend with a mixture of types. This is not what it is intended to do, and it is highly unlikely to behave correctly... I was sure it checked this.

Edit: it would appear this is not only incorrect but also unsound. I will prioritize fixing this.
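
To illustrate the kind of check described above, here is a minimal, dependency-free sketch. It deliberately does not use arrow's types: `Kind`, `Column`, `first_kind`, and `checked_kind` are hypothetical stand-ins for `DataType`, `ArrayData`, the unchecked behavior, and a validating constructor. The actual fix in #5092 may look different.

```rust
/// Hypothetical stand-in for arrow's `DataType` (not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Null,
    Int64,
}

/// Hypothetical stand-in for `ArrayData`: only the type tag is modeled.
#[derive(Debug)]
struct Column {
    kind: Kind,
}

/// Unchecked behavior described in the issue: the output type is simply
/// taken from the first input, so the result depends on input order.
fn first_kind(inputs: &[&Column]) -> Option<Kind> {
    inputs.first().map(|c| c.kind)
}

/// Checked variant: refuse to build anything unless every input has the
/// same type as the first one.
fn checked_kind(inputs: &[&Column]) -> Result<Kind, String> {
    let first = inputs
        .first()
        .ok_or_else(|| "no input arrays".to_string())?;
    match inputs.iter().find(|c| c.kind != first.kind) {
        Some(other) => Err(format!(
            "inconsistent types: expected {:?}, found {:?}",
            first.kind, other.kind
        )),
        None => Ok(first.kind),
    }
}
```

With an Int64 column and a Null column, `first_kind` returns a different type depending on which one comes first, while `checked_kind` rejects both orderings.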

@tustvold tustvold changed the title from "MutableArrayData may new with specified DataType" to "Unsound MutableArrayData Constructor" Nov 17, 2023
@Veeupup
Contributor Author

Veeupup commented Nov 17, 2023

Thanks for your answer!

I'm refactoring array_array in arrow-datafusion and want to simplify the macro by using MutableArrayData, so I use it like this:

https://github.com/apache/arrow-datafusion/pull/8252/files#diff-48cc9cf1bfdb0214a9f625b384d1c4fd5967a9da61e8f22a5dc1c4c5800563b4R395-R403

It can handle make_array(1, NULL, 2), but make_array(NULL, 1) fails to construct, because in the latter case the first array is a NullArray.

@tustvold
Contributor

> It can handle make_array(1, NULL, 2), but make_array(NULL, 1) fails to construct, because in the latter case the first array is a NullArray.

It only handles the first case by accident, and incorrectly. You need to coerce the types passed to make_array to a consistent type signature.

@Veeupup
Copy link
Contributor Author

Veeupup commented Nov 17, 2023

Thanks again! I have fixed it. @tustvold

@tustvold
Contributor

tustvold commented Jan 5, 2024

label_issue.py automatically added labels {'arrow'} from #5092
