
Parquet with wide nested schema panic when loaded #3892

Closed
andrei-ionescu opened this issue Jul 4, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@andrei-ionescu
Contributor

What language are you using?

Rust

Which feature gates did you use?

"polars-io", "parquet", "lazy", "dtype-struct"

Have you tried latest version of polars?

  • [yes]

What version of polars are you using?

0.22.8

What operating system are you using polars on?

macOS Monterey 12.3.1

What language version are you using

$ rustc --version
rustc 1.64.0-nightly (495b21669 2022-07-03)

$ cargo --version
cargo 1.64.0-nightly (dbff32b27 2022-06-24)

Describe your bug.

Loading a Parquet file with a wide, nested schema panics.

What are the steps to reproduce the behavior?

Using the attached Parquet file: nested_dataset_1row_fewcols.snappy.parquet.zip

The following code:

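use polars::prelude::*;

fn main() -> std::result::Result<(), PolarsError> {
    let df = LazyFrame::scan_parquet(
        "nested_dataset_1row_fewcols.snappy.parquet".into(),
        ScanArgsParquet::default(),
    )?
    .select([all()])
    .collect()?; // panics here on polars 0.22.8
    println!("{}", df);
    Ok(())
}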

panics and the process exits.

What is the actual behavior?

It panics with:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The validity length of a StructArray must match its number of elements")', /.../.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/array/struct_/mod.rs:118:52
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The validity length of a StructArray must match its number of elements")', /.../.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/array/struct_/mod.rs:118:52
stack backtrace:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The validity length of a StructArray must match its number of elements")', /.../.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/array/struct_/mod.rs:118:52
   0: rust_begin_unwind
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/core/src/result.rs:1805:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/core/src/result.rs:1098:23
   4: arrow2::array::struct_::StructArray::new
             at /Users/aionescu/.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/array/struct_/mod.rs:118:9
   5: arrow2::array::struct_::StructArray::from_data
             at /Users/aionescu/.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/array/struct_/mod.rs:127:9
   6: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next
             at /Users/aionescu/.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/io/parquet/read/deserialize/struct_.rs:50:22
   7: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/alloc/src/boxed.rs:1868:9
   8: <arrow2::io::parquet::read::deserialize::struct_::StructIterator as core::iter::traits::iterator::Iterator>::next::{{closure}}
             at /Users/aionescu/.cargo/git/checkouts/arrow2-8a2ad61d97265680/f5f6b7e/src/io/parquet/read/deserialize/struct_.rs:26:25
   9: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/core/src/iter/adapters/map.rs:84:28
  10: core::iter::traits::iterator::Iterator::fold
             at /rustc/495b216696ccbc27c73d6bdc486bf4621d610f4b/library/core/src/iter/traits/iterator.rs:2414:21
...
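The panic message names the invariant being violated: a struct array's validity (null) mask must have exactly one entry per row, and every child field must have that same number of rows. The std-only sketch below models that check with a hypothetical `MiniStructArray` type (it is not arrow2's `StructArray`, and its names are illustrative assumptions). It contrasts a fallible `try_new` with a `new` that simply unwraps it, which matches the `StructArray::new` → `Result::unwrap` pattern visible in frames 2 to 4 of the backtrace. It does not reproduce the Parquet deserialization bug itself.

```rust
// Hypothetical, simplified model of a struct column; NOT the arrow2 type.
#[derive(Debug)]
struct MiniStructArray {
    // One Vec per child field; all fields must have the same length (the row count).
    fields: Vec<Vec<i64>>,
    // Optional null mask: one entry per row (true = valid, false = null).
    validity: Option<Vec<bool>>,
}

#[derive(Debug, PartialEq)]
enum SpecError {
    OutOfSpec(String),
}

impl MiniStructArray {
    /// Fallible constructor: reports a spec violation instead of panicking.
    fn try_new(fields: Vec<Vec<i64>>, validity: Option<Vec<bool>>) -> Result<Self, SpecError> {
        let len = fields.first().map(|f| f.len()).unwrap_or(0);
        if fields.iter().any(|f| f.len() != len) {
            return Err(SpecError::OutOfSpec(
                "all fields of a struct must have the same length".into(),
            ));
        }
        if let Some(v) = &validity {
            if v.len() != len {
                return Err(SpecError::OutOfSpec(
                    "The validity length of a StructArray must match its number of elements".into(),
                ));
            }
        }
        Ok(Self { fields, validity })
    }

    /// Infallible constructor: unwraps `try_new`, so a mismatch becomes a panic.
    fn new(fields: Vec<Vec<i64>>, validity: Option<Vec<bool>>) -> Self {
        Self::try_new(fields, validity).unwrap()
    }

    /// Number of rows, taken from the first child field.
    fn len(&self) -> usize {
        self.fields.first().map(|f| f.len()).unwrap_or(0)
    }

    /// Number of null rows according to the validity mask.
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |v| v.iter().filter(|valid| !**valid).count())
    }
}

fn main() {
    // One row, two fields, one validity entry: satisfies the invariant.
    let ok = MiniStructArray::new(vec![vec![1], vec![2]], Some(vec![true]));
    println!("rows = {}, nulls = {}", ok.len(), ok.null_count());

    // One row of field data but two validity entries: the kind of mismatch in the report.
    let bad = MiniStructArray::try_new(vec![vec![1], vec![2]], Some(vec![true, false]));
    println!("{:?}", bad.map(|a| a.len()));
}
```

Calling `new` with the mismatched inputs instead of `try_new` would panic with the same kind of `OutOfSpec` message seen above; the fallible path lets a caller surface it as an error.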

What is the expected behavior?

The Parquet file loads successfully.

@andrei-ionescu andrei-ionescu added the bug Something isn't working label Jul 4, 2022
@jorgecarleitao
Collaborator

Same as #2448 and filed upstream at jorgecarleitao/arrow2#937. Sorry about that; I am working on a fix in arrow2.

@ritchie46
Member

Fixed by #3926.

3 participants