ARROW-10943: WIP: HACK! Test bool miri bugs#8948
Closed
GregBowyer wants to merge 2 commits intoapache:masterfrom
Closed
ARROW-10943: WIP: HACK! Test bool miri bugs#8948GregBowyer wants to merge 2 commits intoapache:masterfrom
GregBowyer wants to merge 2 commits intoapache:masterfrom
Conversation
Running the boolean encoding test with Miri has the following warning:
```
greg@gregslaptop ~/.../rust/parquet $ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test 'encodings::encoding::tests::test_bool'
Compiling parquet v3.0.0-SNAPSHOT (/home/greg/projects/arrow/rust/parquet)
Finished test [unoptimized + debuginfo] target(s) in 1.94s
Running /home/greg/projects/arrow/rust/target/x86_64-unknown-linux-gnu/debug/deps/parquet-9bca74cf6a2c2a95
running 1 test
test encodings::encoding::tests::test_bool ... error: Undefined Behavior: accessing memory with alignment 2, but alignment 4 is required
--> parquet/src/util/bit_packing.rs:84:13
|
84 | *out = ((*in_buf) >> 1) & 1;
| ^^^^^^^^^ accessing memory with alignment 2, but alignment 4 is required
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `util::bit_packing::unpack1_32` at parquet/src/util/bit_packing.rs:84:13
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:36:14
--> parquet/src/util/bit_packing.rs:36:14
|
36 | 1 => unpack1_32(in_ptr, out_ptr),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<bool>` at parquet/src/util/bit_util.rs:554:30
--> parquet/src/util/bit_util.rs:554:30
|
554 | in_ptr = unpack32(in_ptr, out_ptr, num_bits);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<bool as data_type::private::ParquetValueType>::decode` at parquet/src/data_type.rs:684:31
--> parquet/src/data_type.rs:684:31
|
684 | let values_read = bit_reader.get_batch(&mut buffer[..num_values], 1);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<encodings::decoding::PlainDecoder<data_type::BoolType> as encodings::decoding::Decoder<data_type::BoolType>>::get` at parquet/src/encodings/decoding.rs:197:9
--> parquet/src/encodings/decoding.rs:197:9
|
197 | T::T::decode(buffer, &mut self.inner)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::encoding::tests::put_and_get::<data_type::BoolType>` at parquet/src/encodings/encoding.rs:1271:9
--> parquet/src/encodings/encoding.rs:1271:9
|
1271 | decoder.get(output)
| ^^^^^^^^^^^^^^^^^^^
note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test_internal` at parquet/src/encodings/encoding.rs:1214:28
--> parquet/src/encodings/encoding.rs:1214:28
|
1214 | actual_total = put_and_get(
| ____________________________^
1215 | | &mut encoder,
1216 | | &mut decoder,
1217 | | &values[..],
1218 | | &mut result_data[..],
1219 | | )?;
| |_____________^
note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test` at parquet/src/encodings/encoding.rs:1159:24
--> parquet/src/encodings/encoding.rs:1159:24
|
1159 | enc => Self::test_internal(enc, total, type_length),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::encoding::tests::test_bool` at parquet/src/encodings/encoding.rs:967:9
--> parquet/src/encodings/encoding.rs:967:9
|
967 | BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/encodings/encoding.rs:966:5
--> parquet/src/encodings/encoding.rs:966:5
|
966 | / fn test_bool() {
967 | | BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1);
968 | | BoolType::test(Encoding::PLAIN_DICTIONARY, TEST_SET_SIZE, -1);
969 | | BoolType::test(Encoding::RLE, TEST_SET_SIZE, -1);
970 | | }
| |_____^
= note: inside <[closure@parquet/src/encodings/encoding.rs:966:5: 970:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:516:5
= note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:507:30
= note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1328:9
= note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322:9
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
= note: inside `test::run_test_in_process` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:538:18
= note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:449:39
= note: inside `test::run_test::run_test_inner` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:474:13
= note: inside `test::run_test` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:504:28
= note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:283:13
= note: inside `test::run_tests_console` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:289:5
= note: inside `test::test_main` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:121:15
= note: inside `test::test_main_static` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:140:5
= note: inside `main`
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
= note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
= note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
= note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
= note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
= note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
= note: inside `std::rt::lang_start_internal` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
= note: inside `std::rt::lang_start::<()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5
= note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to previous error
error: test failed, to rerun pass '--lib'`
```
... This might be related to ARROW-10943, stating that the pointer read
is unaligned throughout seems to resolve the miri error
Miri complains that murmur2 is performing unaligned reads (and without telling the compiler about this, its technically UB even on x64). wyhash has emerged as a simpler hash with a similar performance and without needing unaligned reads.
Contributor
|
Thanks @GregBowyer - this is great. I plan to review this / do some more careful testing later this week or the weekend |
Contributor
|
I found that by running the parquet tests in a loop I can reproduce the failure on my machine locally more details in JIRA. Unfortunately, when I tried running the same experiment on this branch the test still fails intermittently |
Contributor
Author
|
This appears not to be the issue, we might want miri at some point but not his. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
@alamb @andygrove
No idea if this is the root cause, but I figured that if its reading an incorrect number as a boolean its a messy transmute or similar.
As such I ran miri over the specific failing test and got two hits, one on the hash function and one on the bit packing.
The first commit states that the bit packing is unaligned which I think is ok to take (although it might be worth revisiting the bit-packing code).
The second replaces the hash function which is stated as being unaligned. I dont know if wyhash is the perfect choice (xxHash, t1ha or others might be equally good choices) but it does not suffer from irritating miri.
Thoughts? Ideas?