This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

crash in parquet read #459

Closed
aptr322 opened this issue Sep 28, 2021 · 9 comments

@aptr322
Contributor

aptr322 commented Sep 28, 2021

15: 0x5595b6619e4d - core::panicking::panic::h344f23ad26057b48
at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/core/src/panicking.rs:50:5
16: 0x5595b6e21f57 - parquet2::encoding::hybrid_rle::read_next::heffbce2cbf271355
17: 0x5595b6e22002 - parquet2::encoding::hybrid_rle::HybridRleDecoder::new::h4bed53f1cf574f45
18: 0x5595b66bc788 - arrow2::io::parquet::read::binary::basic::iter_to_array::h904f7ce8e6ca0d40
19: 0x5595b667374a - arrow2::io::parquet::read::page_iter_to_array::hf6ec1a66c97269d8
20: 0x5595b667ebc4 - <core::iter::adapters::enumerate::Enumerate as core::iter::traits::iterator::Iterator>::try_fold::hc0a8230807788873
21: 0x5595b667f320 - <arrow2::io::parquet::read::record_batch::RecordReader as core::iter::traits::iterator::Iterator>::next::hde2d9e82a688b0f8
22: 0x5595b666f160 - polars_io::finish_reader::h3777ac1c190f89b8
23: 0x5595b667dc93 - <polars_io::parquet::ParquetReader as polars_io::SerReader>::finish::hc1e08eb39eb006fc
24: 0x5595b6669511 - core::ops::function::impls::<impl core::ops::function::FnMut for &F>::call_mut::h9969795e932f825a
25: 0x5595b6681e6e - rayon::iter::plumbing::Folder::consume_iter::h92fcd41992ba09a0
26: 0x5595b6685edf - rayon::iter::plumbing::bridge_producer_consumer::helper::h8ef2ea33830b94cf
27: 0x5595b666937f - std::panicking::try::hc5f2657442e028eb
28: 0x5595b66a710d - rayon_core::registry::in_worker::ha0a928d462944c85
29: 0x5595b6685fde - rayon::iter::plumbing::bridge_producer_consumer::helper::h8ef2ea33830b94cf
30: 0x5595b666937f - std::panicking::try::hc5f2657442e028eb
31: 0x5595b66a710d - rayon_core::registry::in_worker::ha0a928d462944c85
32: 0x5595b6685fde - rayon::iter::plumbing::bridge_producer_consumer::helper::h8ef2ea33830b94cf
33: 0x5595b667d91b - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hff5168451b9879a8
34: 0x5595b6611821 - rayon_core::registry::WorkerThread::wait_until_cold::he518e83f6505630f
35: 0x5595b6bbafbd - rayon_core::registry::ThreadBuilder::run::h5d56f209b153a9bd
36: 0x5595b6bbc8e5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h69cdd24543799db2
37: 0x5595b6bb8a9b - core::ops::function::FnOnce::call_once{{vtable.shim}}::hd2e6efae7943bf64
38: 0x5595b6fd5207 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once::h6bff7798948b1075
at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/alloc/src/boxed.rs:1572:9
39: 0x5595b6fd5207 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once::hc2d25ac38f6b2342
at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/alloc/src/boxed.rs:1572:9
40: 0x5595b6fd5207 - std::sys::unix::thread::Thread::new::thread_start::hbba5bc368baac205
at /rustc/c8dfcfe046a7680554bf4eb612bad840e7631c4b/library/std/src/sys/unix/thread.rs:74:17
41: 0x7f48289bb609 - start_thread

@aptr322
Contributor Author

aptr322 commented Sep 28, 2021

The application crashes on some parquet files. I use polars, but judging from the stack trace it seems to make sense to discuss it here.
I'm about to minimize my app and dataset so that the crash is reproducible.

@aptr322
Contributor Author

aptr322 commented Sep 28, 2021

pyspark and pandas read that parquet file without noticeable problems.

@ritchie46
Collaborator

A minimal working example would be really valuable here. Have you got a code snippet and the parquet file?

@aptr322
Contributor Author

aptr322 commented Sep 28, 2021

```rust
use polars::prelude::*;
use std::error::Error;
use std::fs::File;

fn read_parquet(fname: &str) -> std::result::Result<DataFrame, PolarsError> {
    let file = File::open(fname)?;
    ParquetReader::new(file).finish()
}

fn main() {
    let filename = std::env::args().nth(1).expect("no filename given");

    let df = read_parquet(&filename).unwrap();
    println!("{:?}", df);
}
```

@aptr322
Contributor Author

aptr322 commented Sep 28, 2021

@jorgecarleitao
Owner

Thanks! I am taking a look

@jorgecarleitao jorgecarleitao self-assigned this Sep 29, 2021
@jorgecarleitao jorgecarleitao added the bug Something isn't working label Sep 29, 2021
@jorgecarleitao
Owner

Found the root cause and fixed it upstream: jorgecarleitao/parquet2#53. I will try to release a patch soon so that we can benefit from it here and in polars.

@jorgecarleitao
Owner

I am closing this, as its root was in parquet2. Thanks a lot for the report, that led to the fix!

@jorgecarleitao
Owner

For reference, to check, I used

cargo run --features io_parquet,io_parquet_compression --example parquet_read -- part-00000.parquet 11 2

(column 11, row group 2)
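
Narrowing a panic like this down to a single column and row group can be done mechanically. The following is only a minimal sketch using the standard library, not the code used in this thread: `find_panicking_chunks` and the `read_chunk` closure are hypothetical names, and `read_chunk` stands in for whatever decodes one column chunk (for example, a call into a parquet reader). It assumes the binary is built with the default `panic = "unwind"` strategy, since `catch_unwind` cannot intercept aborts.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Calls `read_chunk(row_group, column)` for every pair and returns the
/// pairs for which the call panicked.
/// `read_chunk` is a hypothetical stand-in for the real decoding routine.
fn find_panicking_chunks<F>(row_groups: usize, columns: usize, read_chunk: F) -> Vec<(usize, usize)>
where
    F: Fn(usize, usize),
{
    let mut failures = Vec::new();
    for row_group in 0..row_groups {
        for column in 0..columns {
            // Catch the unwind so a single bad chunk does not stop the scan.
            let outcome = panic::catch_unwind(AssertUnwindSafe(|| read_chunk(row_group, column)));
            if outcome.is_err() {
                failures.push((row_group, column));
            }
        }
    }
    failures
}

fn main() {
    // Simulated decoder that fails only for row group 2, column 11.
    let failures = find_panicking_chunks(3, 12, |row_group, column| {
        if row_group == 2 && column == 11 {
            panic!("simulated decode failure");
        }
    });
    println!("{:?}", failures);
}
```

Each caught panic still prints its message to stderr through the default panic hook, which helps when matching a failing chunk against a backtrace.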
