Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get MIRI running against parquet crate #773

Closed
alamb opened this issue Jul 25, 2021 · 3 comments
Closed

Get MIRI running against parquet crate #773

alamb opened this issue Jul 25, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jul 25, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri

To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too.

Describe the solution you'd like

Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60

This means a command something like

cargo +nightly miri  test  -p parquet

Sadly, when I run this locally the very first test fails with an alignment issue

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri  test  -p parquet
WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping.
   Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
   Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
    Finished test [unoptimized + debuginfo] target(s) in 4.53s
     Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)

running 456 tests
test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
    --> parquet/src/util/bit_packing.rs:150:12
     |
150  |     *out = (*in_buf) % (1u32 << 2);
     |            ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
             
     = note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14
    --> parquet/src/util/bit_packing.rs:37:14
     |
37   |         2 => unpack2_32(in_ptr, out_ptr),
     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30
    --> parquet/src/util/bit_util.rs:568:30
     |
568  |                     in_ptr = unpack32(in_ptr, out_ptr, num_bits);
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30
    --> parquet/src/encodings/rle.rs:415:30
     |
415  |                   num_values = bit_reader.get_batch::<T>(
     |  ______________________________^
416  | |                     &mut buffer[values_read..values_read + num_values],
417  | |                     self.bit_width as usize,
418  | |                 );
     | |_________________^
note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35
    --> parquet/src/encodings/levels.rs:262:35
     |
262  |                 let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?;
     |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9
    --> parquet/src/column/reader.rs:459:9
     |
459  |         level_decoder.get(buffer)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38
    --> parquet/src/column/reader.rs:210:38
     |
210  |                       num_def_levels = self.read_def_levels(
     |  ______________________________________^
211  | |                         &mut levels[levels_read..levels_read + iter_batch_size],
212  | |                     )?;
     | |_____________________^
note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17
    --> parquet/src/arrow/array_reader.rs:490:17
     |
490  | /                 self.column_reader.as_mut().unwrap().read_batch(
491  | |                     num_to_read,
492  | |                     cur_def_levels_buf,
493  | |                     cur_rep_levels_buf,
494  | |                     cur_data_buf,
495  | |                 )?;
     | |_________________^
note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21
    --> parquet/src/arrow/array_reader.rs:2229:21
     |
2229 |         let array = array_reader.next_batch(values_per_page / 2).unwrap();
     |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
    --> parquet/src/arrow/array_reader.rs:2154:5
     |
2153 |       #[test]
     |       ------- in this procedural macro expansion
2154 | /     fn test_complex_array_reader_def_and_rep_levels() {
2155 | |         // Construct column schema
2156 | |         let message_type = "
2157 | |         message test_schema {
...    |
2276 | |         );
2277 | |     }
     | |_____^
     = note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
     = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
     = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
     = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
     = note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
     = note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
     = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
     = note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
     = note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
     = note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
     = note: inside `main`
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
     = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
     = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
     = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
     = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
     = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
     = note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
     = note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

Describe alternatives you've considered
The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run

Additional context
Thanks to help from @roee88 , @jhorstmann and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440

Which runs a command such as:

  cargo miri test -p arrow -- --skip csv --skip ipc --skip json
@alamb alamb added the enhancement New feature or request label Jul 25, 2021
@alamb
Copy link
Contributor Author

alamb commented Jul 25, 2021

🤔 though I tried to get a clean MIRI run against the parquet2 crate

(arrow_dev) alamb@MacBook-Pro:~/Software/parquet2$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test -- --skip io

And I got some errors; I am probably running it incorrectly 😢

test read::levels::tests::test_get_bit_width ... ok
test read::metadata::tests::test_basics ... error: unsupported operation: can't call foreign function: strerror_r
   --> /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
    |
122 |         if strerror_r(errno as c_int, p, buf.len()) < 0 {
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function: strerror_r
    |
    = help: this is likely not a bug in the program; it indicates that the program performed an operation that the interpreter does not support
            
    = note: inside `std::sys::unix::os::error_string` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:122:12
    = note: inside `<std::io::error::Repr as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:699:36
    = note: inside `<std::io::Error as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/error.rs:65:9
    = note: inside `<&dyn std::fmt::Debug as std::fmt::Debug>::fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:2036:62
    = note: inside `std::fmt::write` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1115:17
    = note: inside `<std::string::String as std::fmt::Write>::write_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:187:9
    = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:481:22
    = note: inside `std::option::Option::<std::string::String>::get_or_insert_with::<[closure@std::panicking::begin_panic_handler::PanicPayload::fill::{closure#0}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:1270:26
    = note: inside `std::panicking::begin_panic_handler::PanicPayload::fill` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:479:13
    = note: inside `<std::panicking::begin_panic_handler::PanicPayload as core::panic::BoxMeUp>::get` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:497:13
    = note: inside `std::panicking::rust_panic_with_hook` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:621:34
    = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:519:13
    = note: inside `std::sys_common::backtrace::__rust_end_short_backtrace::<[closure@std::panicking::begin_panic_handler::{closure#0}], !>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:141:18
    = note: inside `core::panicking::panic_fmt::panic_impl` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:515:5
    = note: inside `core::panicking::panic_fmt` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panicking.rs:92:14
    = note: inside `std::result::unwrap_failed` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic.rs:24:9
    = note: inside `std::result::Result::<std::fs::File, std::io::Error>::unwrap` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1281:23
note: inside `read::metadata::tests::test_basics` at src/read/metadata.rs:188:24

@houqp
Copy link
Member

houqp commented Jul 25, 2021

this issue should be opened in arrow-rs repo instead right?

@alamb
Copy link
Contributor Author

alamb commented Jul 25, 2021

Good catch @houqp -- I have filed it in arrow-rs here apache/arrow-rs#614 and am closing this one. Sorry for the noise

@alamb alamb closed this as completed Jul 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants