Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) #2874

Closed
alamb opened this issue Jul 11, 2022 · 1 comment · Fixed by #2891
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Jul 11, 2022

Describe the bug

Various parts of the DataFusion codebase assume that the transformation between ScalarValue <--> Array have the same datatype. This would seem to be a reasonable assumption, however it does not hold for at least for DictionaryArrays

For example, a ScalarVaule that is converted to an array, casted to a DictionaryArray<_> due to coertion rules, and then converted back to a ScalarVaule. When that supposedly cast ScalarValue is converted back to an Array, it does not maintain its Dictionary encoding, instead it results in a DataType::Utf8

To Reproduce

fn bad_cast() {
    // here is a problem with round trip casting to/from a dictionary
    // array. It is desired to cast this ScalarValue to a Dictionary
    // (for coertion, for example)
    let scalar = ScalarValue::Utf8(Some("foo".to_string()));

    let desired_type = DataType::Dictionary(
        // key type
        Box::new(DataType::Int32),
        // value type
        Box::new(DataType::Utf8)
    );

    // convert from scalar --> Array to call cast
    let scalar_array = scalar.to_array();
    // cast the actual value
    let cast_array = kernels::cast::cast(&scalar_array, &desired_type).unwrap();
    // turn it back to a scalar
    let cast_scalar = ScalarValue::try_from_array(&cast_array, 0).unwrap();

    // Some time later the "cast" scalar is turned back into an array:
    let array = cast_scalar.to_array_of_size(10);

    // The datatype should be "Dictionary" but is actually Utf8!!!
    assert_eq!(array.data_type(), &desired_type)
}

Running this function results in

  left: `Utf8`,
 right: `Dictionary(Int32, Utf8)`', src/main.rs:80:5

Expected behavior
Test case should pass

Additional context
I am not sure if it makes sense to add a ScalarValue::Dictionary type variant, or perhaps add a is_dictionary flag or something else, or maybe even just not assume a ScalarValue can be round tripped and maintain its data type

This is the root cause of #2873 -- I added a patch for that particular case but this problem can occur elsewhere

@alamb alamb added the bug Something isn't working label Jul 11, 2022
@alamb alamb changed the title Round trips through ScalarValue's change types from DictionaryArray Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) Jul 11, 2022
@alamb
Copy link
Contributor Author

alamb commented Jul 11, 2022

This is not dissimilar to the issue @comphead is working on in #2840

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant