Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_null_array creates invalid struct arrays #734

Closed
bjchambers opened this issue Aug 31, 2021 · 3 comments · Fixed by #736
Closed

new_null_array creates invalid struct arrays #734

bjchambers opened this issue Aug 31, 2021 · 3 comments · Fixed by #736
Labels

Comments

@bjchambers
Copy link
Contributor

Describe the bug

I believe (based on how struct arrays are created) that all of the field arrays should have the same length, and that length corresponds to the length of the struct array.

The method new_null_array violates this: it creates a null struct array of length N with each of the field arrays being empty. This leads to problems when accessing the fields of the struct and expecting the number of rows to be correct.

https://github.com/apache/arrow-rs/blob/master/arrow/src/array/array.rs#L444

To Reproduce

Expected behavior

Additional context

@bjchambers bjchambers added the bug label Aug 31, 2021
@bjchambers
Copy link
Contributor Author

#[test]
    fn test_null_struct() {
        let struct_type =
            DataType::Struct(vec![Field::new("data", DataType::Int64, true)]);
        let array = new_null_array(&struct_type, 9);

        let a = array.as_any().downcast_ref::<StructArray>().unwrap();
        assert_eq!(a.len(), 9);
        for i in 0..9 {
            assert!(a.is_null(i));
        }

        // assert_eq!(a.column(0).len(), 9); // fails (length is 0)
        // a.slice(0, 5); // panics (because the field arrays are empty)
    }

I see two ways of fixing this:

  1. I'm guessing the right way is to have the new_null_array create null columns of the given length for each of the fields.
  2. An alternative, could be to allow empty columns in the case of an all null-array. This would allow avoiding the allocation. But... I don't think this is supported elsewhere, and seems like an extensive change. Hence, I'm guessing the first is the right solution.

@jorgecarleitao
Copy link
Member

I agree that 1. is the way to go; the second can lead to invalid memory assesses if someone consumes this data and does not perform bound checks (afaik the spec requires all fields to have the length of the validity).

@alamb
Copy link
Contributor

alamb commented Sep 9, 2021

Closed in #736

@alamb alamb closed this as completed Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants