This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Deeply nested struct type panics on reading parquet statistics #1239

Closed
ritchie46 opened this issue Sep 1, 2022 · 0 comments · Fixed by #1240
Labels
bug: Something isn't working
no-changelog: Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@ritchie46
Collaborator

This is a continuation of pola-rs/polars#3942. Reading the file now succeeds, but the statistics code has not caught up: deserializing the statistics of a deeply nested struct column still panics.

MWE

use std::fs::File;
use polars::export::arrow;
use arrow::error::Error;
use arrow::io::parquet::read;

fn main() -> Result<(), Error> {
    let file_path = "/home/ritchie46/Downloads/part-00003-a422a23f-e65a-4cab-9bd0-6e877a8f7337-c000.snappy.parquet";
    let mut reader = File::open(file_path)?;

    let metadata = read::read_metadata(&mut reader)?;
    let schema = read::infer_schema(&metadata)?;
    let statistics = read::statistics::deserialize(&schema.fields[9], &metadata.row_groups)?; // panics here (main.rs:12:22 in the backtrace)
    Ok(())
}

Backtrace

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: OutOfSpec("The children must have an equal number of values.\n                         However, the values at index 1 have a length of 1, which is different from values at index 0, 2.")', /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/array/struct_/mod.rs:120:52
stack backtrace:
   0: rust_begin_unwind
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/core/src/result.rs:1814:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/core/src/result.rs:1107:23
   4: arrow2::array::struct_::StructArray::new
             at /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/array/struct_/mod.rs:120:9
   5: <arrow2::io::parquet::read::statistics::struct_::DynMutableStructArray as arrow2::array::MutableArray>::as_box
             at /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/io/parquet/read/statistics/struct_.rs:43:18
   6: <arrow2::io::parquet::read::statistics::list::DynMutableListArray as arrow2::array::MutableArray>::as_box
             at /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/io/parquet/read/statistics/list.rs:39:21
   7: <arrow2::io::parquet::read::statistics::Statistics as core::convert::From<arrow2::io::parquet::read::statistics::MutableStatistics>>::from
             at /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/io/parquet/read/statistics/mod.rs:80:13
   8: <T as core::convert::Into<U>>::into
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/core/src/convert/mod.rs:550:9
   9: arrow2::io::parquet::read::statistics::deserialize
             at /home/ritchie46/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0b345ae/src/io/parquet/read/statistics/mod.rs:554:8
  10: memcheck::main
             at ./src/main.rs:12:22
  11: core::ops::function::FnOnce::call_once
             at /rustc/93ffde6f04d3d24327a4e17a2a2bf4f63c246235/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

The file is available from the link in this comment: pola-rs/polars#3942 (comment)
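
For readers unfamiliar with the `OutOfSpec` message in the backtrace: a struct array requires all of its children to have the same number of values, and the panic comes from unwrapping the error raised when they do not. The sketch below is a standalone illustration of that invariant, not arrow2's actual code. `check_children` is a hypothetical helper, and the lengths `[2, 1, 2]` are made up to mirror the reported shape (index 1 differing from indices 0 and 2).

```rust
// Hypothetical stand-in for the equal-length check behind the `OutOfSpec`
// error. This is NOT arrow2's implementation, only an illustration.
fn check_children(lengths: &[usize]) -> Result<(), String> {
    // With no children there is nothing to compare.
    let expected = match lengths.first() {
        Some(&len) => len,
        None => return Ok(()),
    };
    // Report the first child whose length differs from child 0.
    match lengths.iter().position(|&len| len != expected) {
        Some(index) => Err(format!(
            "child {} has length {}, expected {}",
            index, lengths[index], expected
        )),
        None => Ok(()),
    }
}

fn main() {
    // Assumed lengths: children 0 and 2 agree, child 1 holds a single value.
    match check_children(&[2, 1, 2]) {
        Ok(()) => println!("children are consistent"),
        Err(msg) => println!("out of spec: {}", msg),
    }
}
```

Returning a `Result` here, rather than unwrapping, is what would let a caller such as `deserialize` surface the problem as an error instead of a panic.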

@jorgecarleitao jorgecarleitao added bug Something isn't working no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog labels Sep 2, 2022