Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic in cmp kernels with DictionaryArrays: Option::unwrap() on a None value' #4788

Closed
alamb opened this issue Sep 7, 2023 · 3 comments · Fixed by #4789
Closed

panic in cmp kernels with DictionaryArrays: Option::unwrap() on a None value' #4788

alamb opened this issue Sep 7, 2023 · 3 comments · Fixed by #4789
Labels
arrow Changes to the arrow crate bug

Comments

@alamb
Copy link
Contributor

alamb commented Sep 7, 2023

Describe the bug
The cmp kernels do not ignore null key values of DictionaryArrays as they should

To Reproduce
Run this program:

use std::sync::Arc;

use arrow::array::{Int32Array, StringArray, BooleanBufferBuilder};
use arrow::buffer::{ScalarBuffer, NullBuffer};
use arrow::{array::DictionaryArray};
use arrow::util::pretty::pretty_format_columns;

fn main() {
    // Logically like this:
    // keys: PrimitiveArray<Int32>
    //                                                                                                          [
    //   null,
    //   1,
    //   0,
    // ]
    // values: StringArray
    // [
    //   "us-west",
    //   "us-east",
    // ]}
    let values = StringArray::from(vec![Some("us-west"), Some("us-east")]);

    let mut nulls = BooleanBufferBuilder::new(3);
    nulls.append(false); // null
    nulls.append(true);
    nulls.append(true);
    let nulls: NullBuffer = nulls.finish().into();

    // key values
    //
    // since element 0 is NULL, the index value of 100 should be
    // ignored
    let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);
    let keys = Int32Array::new(key_values, Some(nulls));

    let col = DictionaryArray::try_new(keys, Arc::new(values)).unwrap();

    println!("Input col: {col:?}");

    let comparison  = arrow::compute::kernels::cmp::neq(
        &col.slice(0, col.len() - 1),
        &col.slice(1, col.len() - 1),
    )
        .expect("cmp");

    println!("comparison: {}", pretty_format_columns("neq", &[Arc::new(comparison) as _]).unwrap());
}

Results in

Input col: DictionaryArray {keys: PrimitiveArray<Int32>
[
  null,
  1,
  0,
] values: StringArray
[
  "us-west",
  "us-east",
]}

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:38
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
   3: core::option::Option<T>::unwrap
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/option.rs:935:21
   4: arrow_array::array::byte_array::GenericByteArray<T>::value_unchecked
             at /Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:13
   5: <&arrow_array::array::byte_array::GenericByteArray<T> as arrow_ord::cmp::ArrayOrd>::value_unchecked
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:522:9
   6: arrow_ord::cmp::apply_op_vectored::{{closure}}
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:450:12
   7: arrow_ord::cmp::collect_bool
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:388:24
   8: arrow_ord::cmp::apply_op_vectored
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:447:5
   9: arrow_ord::cmp::apply
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:326:17
  10: arrow_ord::cmp::compare_op::{{closure}}
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:219:29
  11: arrow_ord::cmp::compare_op
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:286:44
  12: arrow_ord::cmp::neq
             at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:86:5
  13: rust_arrow_playground::main
             at ./src/main.rs:40:23
  14: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Expected behavior
The code should not panic

Specifically, if you change the key values from

    let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);

To

    let key_values = ScalarBuffer::from(vec![0i32, 1i32, 0i32]);

Then the reproducer completes without error:

comparison: +------+
| neq  |
+------+
|      |
| true |
+------+

Additional context

Found while updating https://github.com/influxdata/influxdb_iox/pull/8577

@alamb alamb added arrow Changes to the arrow crate bug labels Sep 7, 2023
@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

Something else I noticed while reviewing the tests was that while the cmp kernels have special cases for more than 64 length elements I didn't find any tests for that code.

I did find these tests (which are in terms of the old interface)
https://github.com/apache/arrow-rs/blob/master/arrow-ord/src/comparison.rs#L1471

@tustvold
Copy link
Contributor

tustvold commented Sep 7, 2023

The normalized keys process is supposed to handle this, I will do some digging to see what is going wrong here

@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

Here is a proposal for better test coverage: #4733 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants