This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Reading Parquet binary column panics during deserialization: 'attempt to subtract with overflow' #944

Closed
mdrach opened this issue Apr 13, 2022 · 1 comment · Fixed by #945
Labels
bug Something isn't working

Comments

@mdrach
Contributor

mdrach commented Apr 13, 2022

Reading a large Parquet file with a binary column panics with 'attempt to subtract with overflow'. You can reproduce the issue by running the unit test on this branch. The column in the dataset that triggers it is a nullable binary column called ORDER_AD_SHOWN.

Does anyone have any thoughts as to why this issue is occurring?

Full stack trace:

running 1 test
test io::parquet::read::binary_column_with_nulls_test ... FAILED

failures:

---- io::parquet::read::binary_column_with_nulls_test stdout ----
thread 'io::parquet::read::binary_column_with_nulls_test' panicked at 'attempt to subtract with overflow', src/io/parquet/read/deserialize/binary/basic.rs:223:17
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:48:5
   3: <arrow2::io::parquet::read::deserialize::binary::basic::BinaryDecoder<O> as arrow2::io::parquet::read::deserialize::utils::Decoder>::extend_from_state
             at ./src/io/parquet/read/deserialize/binary/basic.rs:223:17
   4: arrow2::io::parquet::read::deserialize::utils::extend_from_new_page
             at ./src/io/parquet/read/deserialize/utils.rs:275:5
   5: arrow2::io::parquet::read::deserialize::utils::next
             at ./src/io/parquet/read/deserialize/utils.rs:315:13
   6: <arrow2::io::parquet::read::deserialize::binary::basic::Iter<O,A,I> as core::iter::traits::iterator::Iterator>::next
             at ./src/io/parquet/read/deserialize/binary/basic.rs:301:27
   7: <arrow2::io::parquet::read::deserialize::binary::basic::Iter<O,A,I> as core::iter::traits::iterator::Iterator>::next
             at ./src/io/parquet/read/deserialize/binary/basic.rs:313:32
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/map.rs:103:9
   9: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/boxed.rs:1787:9
  10: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/map.rs:103:9
  11: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/boxed.rs:1787:9
  12: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/map.rs:103:9
  13: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/boxed.rs:1787:9
  14: <arrow2::io::parquet::read::row_group::RowGroupDeserializer as core::iter::traits::iterator::Iterator>::next::{{closure}}
             at ./src/io/parquet/read/row_group.rs:72:29
  15: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/map.rs:91:28
  16: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:2109:21
  17: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/map.rs:117:9
  18: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:182:9
  19: core::iter::traits::iterator::Iterator::try_for_each
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:2170:9
  20: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:165:9
  21: alloc::vec::Vec<T,A>::extend_desugared
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/mod.rs:2649:35
  22: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/spec_extend.rs:18:9
  23: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/spec_from_iter_nested.rs:43:9
  24: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/spec_from_iter.rs:33:9
  25: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/mod.rs:2552:9
  26: core::iter::traits::iterator::Iterator::collect
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:1778:9
  27: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/result.rs:2031:49
  28: core::iter::adapters::try_process
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:151:17
  29: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/result.rs:2031:9
  30: core::iter::traits::iterator::Iterator::collect
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:1778:9
  31: <arrow2::io::parquet::read::row_group::RowGroupDeserializer as core::iter::traits::iterator::Iterator>::next
             at ./src/io/parquet/read/row_group.rs:68:21
  32: <arrow2::io::parquet::read::file::FileReader<R> as core::iter::traits::iterator::Iterator>::next
             at ./src/io/parquet/read/file.rs:138:19
  33: <arrow2::io::parquet::read::file::FileReader<R> as core::iter::traits::iterator::Iterator>::next
             at ./src/io/parquet/read/file.rs:158:21
  34: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:2108:29
  35: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:182:9
  36: core::iter::traits::iterator::Iterator::try_for_each
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:2170:9
  37: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:165:9
  38: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
  39: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/spec_from_iter.rs:33:9
  40: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/vec/mod.rs:2552:9
  41: core::iter::traits::iterator::Iterator::collect
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:1778:9
  42: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/result.rs:2031:49
  43: core::iter::adapters::try_process
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/adapters/mod.rs:151:17
  44: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/result.rs:2031:9
  45: core::iter::traits::iterator::Iterator::collect
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/iter/traits/iterator.rs:1778:9
  46: it::io::parquet::read::binary_column_with_nulls_test
             at ./tests/it/io/parquet/read.rs:558:5
  47: it::io::parquet::read::binary_column_with_nulls_test::{{closure}}
             at ./tests/it/io/parquet/read.rs:552:1
  48: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
  49: core::ops::function::FnOnce::call_once
             at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    io::parquet::read::binary_column_with_nulls_test

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 998 filtered out; finished in 0.13s

error: test failed, to rerun pass '--test it'
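
For context on the panic message itself: Rust emits 'attempt to subtract with overflow' when overflow checks are enabled (the default in debug and test builds) and an integer subtraction would go below the type's minimum, for example `3usize - 5`. The sketch below is only an illustration of that class of error and of two standard-library ways to guard against it. It is not the code at `basic.rs:223` and not the fix from #945; the function names and the `total`/`consumed` parameters are hypothetical.

```rust
/// Hypothetical helper: how many items are left to read.
/// `total` and `consumed` are illustrative names, not arrow2 fields.
fn remaining_checked(total: usize, consumed: usize) -> Option<usize> {
    // `checked_sub` returns `None` instead of panicking when
    // `consumed > total`, so the caller can turn it into an error.
    total.checked_sub(consumed)
}

/// Hypothetical helper: same computation, clamped at zero.
fn remaining_saturating(total: usize, consumed: usize) -> usize {
    // `saturating_sub` stops at 0 rather than overflowing.
    total.saturating_sub(consumed)
}

fn main() {
    // A plain `total - consumed` with these runtime values would panic
    // with "attempt to subtract with overflow" when overflow checks are on.
    let (total, consumed) = (3usize, 5usize);

    assert_eq!(remaining_checked(5, 3), Some(2));
    assert_eq!(remaining_checked(total, consumed), None);
    assert_eq!(remaining_saturating(total, consumed), 0);
}
```

Whether `checked_sub` or `saturating_sub` is appropriate depends on whether a negative difference signals a real inconsistency (report an error) or can safely be treated as zero.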
jorgecarleitao added the bug label on Apr 13, 2022
@jorgecarleitao
Owner

Thanks for the report! It is a bug; fixed in #945. Sorry for that!
