Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return error instead of assert when meeting incompatitble type #4995

Closed
ZENOTME opened this issue Oct 26, 2023 · 3 comments · Fixed by #5341
Closed

Return error instead of assert when meeting incompatitble type #4995

ZENOTME opened this issue Oct 26, 2023 · 3 comments · Fixed by #5341
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted parquet Changes to the parquet crate

Comments

@ZENOTME
Copy link

ZENOTME commented Oct 26, 2023

assert_eq!(field.data_type(), array.data_type());

When I use the writer and pass incompatitble type, it will panic instead of return a error. Maybe it should return a error here or other place? 🧐

@tustvold
Copy link
Contributor

In general we use assertions for invariants that are not expected to be encountered in a well-formed program. I'm not sure that a well-formed program should be able to hit this as it would indicate some form of schema inconsistency, but then I'm also not opposed to making this an error.

@carols10cents
Copy link
Contributor

carols10cents commented Jan 29, 2024

I've just hit this panic as well, and it's definitely because I'm messing up the schemas somewhere, but it's hard to tell what the problem is from the panic message.

At the very least, I expected some Arrow method in the call stack of the backtrace to have documentation with a "Panics" section that details the invariant that I'm not upholding, but I don't see any (and I'm not sure exactly where it should go). I'd be happy to send in a documentation PR if given some guidance for where it should go!

Backtrace
hread \'main\' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/levels.rs:135:9:
assertion `left == right` failed
  left: Int64
 right: Float64
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:279:5
   4: parquet::arrow::arrow_writer::levels::LevelInfoBuilder::try_new
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/levels.rs:135:9
   5: parquet::arrow::arrow_writer::levels::calculate_array_levels
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/levels.rs:54:23
   6: parquet::arrow::arrow_writer::compute_leaves
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/mod.rs:355:18
   7: parquet::arrow::arrow_writer::ArrowRowGroupWriter::write
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/mod.rs:534:25
   8: parquet::arrow::arrow_writer::ArrowWriter<W>::write
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-50.0.0/src/arrow/arrow_writer/mod.rs:194:9
   9: bulk_ingester::fsm::create_intermediate_files::IntermediateFileManager::add_batch
             at ./src/fsm/create_intermediate_files.rs:253:12
  10: bulk_ingester::fsm::create_intermediate_files::partition_file::{{closure}}::{{closure}}
             at ./src/fsm/create_intermediate_files.rs:185:17
  11: core::option::Option<T>::map
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:1072:29
  12: bulk_ingester::fsm::create_intermediate_files::partition_file::{{closure}}
             at ./src/fsm/create_intermediate_files.rs:182:9
  13: bulk_ingester::fsm::create_intermediate_files::<impl bulk_ingester::fsm::Fsm<bulk_ingester::fsm::create_intermediate_files::CreateIntermediateFiles>>::create_intermediate_files::{{closure}}
             at ./src/fsm/create_intermediate_files.rs:116:18
  14: bulk_ingester::fsm::FsmDriver::try_transition::{{closure}}::{{closure}}::{{closure}}
             at ./src/fsm.rs:116:26
  15: state_machine_utils::AlwaysSome<T>::mutate_async::{{closure}}
             at /home/rust/project/state_machine_utils/src/lib.rs:55:73
  16: bulk_ingester::fsm::FsmDriver::try_transition::{{closure}}
             at ./src/fsm.rs:125:14
  17: bulk_ingester::bulk_ingest::{{closure}}
             at ./src/main.rs:273:37
  18: bulk_ingester::main::{{closure}}
             at ./src/main.rs:160:45
  19: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
  20: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
  21: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  22: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  23: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
  24: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
  25: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
  26: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  27: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
  28: tokio::runtime::runtime::Runtime::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:350:45
  29: bulk_ingester::main
             at ./src/main.rs:160:5
  30: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Places I looked for documentation:

  • parquet::arrow::arrow_writer::levels::LevelInfoBuilder::try_new - private
  • parquet::arrow::arrow_writer::levels::calculate_array_levels - private
  • parquet::arrow::arrow_writer::compute_leaves - no "Panics" section
  • parquet::arrow::arrow_writer::ArrowRowGroupWriter::write - private
  • parquet::arrow::arrow_writer::ArrowWriter - no "Panics" section, this is where I'd guess the invariants should be documented, as this is what I'm calling

@tustvold
Copy link
Contributor

tustvold commented Jan 29, 2024

Oh I see what has happened, at some point the schema check got removed from ArrowWriter, possibly #4871, you can see it in this PR - https://github.com/apache/arrow-rs/pull/4027/files#diff-dbb6b7806a48ce2b6fb40ce4a429599808a87671274ee6b2dc7c1f87429320b3R135

As such you can now hit this panic. We should change it to be an error with a more descriptive message.

Edit: especially as compute_leaves is now public and so this can also be hit that way

@tustvold tustvold added good first issue Good for newcomers parquet Changes to the parquet crate enhancement Any new improvement worthy of a entry in the changelog help wanted labels Jan 29, 2024
tustvold pushed a commit that referenced this issue Jan 30, 2024
* fix: Return an error on type mismatch rather than panic (#4995)

* test: ArrowWriter and batch schema mismatch is an error

* docs: Clarify that ArrowWriter expects the batch's schema to match
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants