Panic in the buddy_allocator #653

Closed
dignifiedquire opened this issue Aug 2, 2023 · 9 comments

Comments

@dignifiedquire

dignifiedquire commented Aug 2, 2023

Hey, I'm currently working on integrating redb into our code and got this error on CI.

The environment is cross, targeting aarch64-linux-android.

thread 'sync::tests::test_replica_sync_fs' panicked at 'assertion failed: !self.get_order_allocated(order).get(page_number)', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/redb-1.0.5/src/tree_store/page_store/buddy_allocator.rs:450:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@cberner
Owner

cberner commented Aug 2, 2023

How can I reproduce the panic?

@dignifiedquire
Author

You can find the full CI run here: https://github.com/n0-computer/iroh/actions/runs/5738430197/job/15552011144?pr=1315

The command being executed is:

cross test --all --target aarch64-linux-android -- --test-threads=12

on this branch: https://github.com/n0-computer/iroh/tree/sync-db.

I am still working on reproducing it locally myself.

@dignifiedquire
Author

On that branch, it reproduces locally with the above command. Narrowing the run down to the relevant test:

> RUST_BACKTRACE=full cross test -p iroh-sync test_replica_sync_fs --target aarch64-linux-android
    Finished test [unoptimized + debuginfo] target(s) in 0.25s
     Running unittests src/lib.rs (/target/aarch64-linux-android/debug/deps/iroh_sync-f89f428cffa29e08)

running 1 test
test sync::tests::test_replica_sync_fs ... FAILED

failures:

---- sync::tests::test_replica_sync_fs stdout ----
thread 'sync::tests::test_replica_sync_fs' panicked at 'assertion failed: !self.get_order_allocated(order).get(page_number)', /cargo/registry/src/github.com-1ecc6299db9ec823/redb-1.0.5/src/tree_store/page_store/buddy_allocator.rs:450:13
stack backtrace:
   0:       0x55003edaf8 - <unknown>
   1:       0x550040b868 - <unknown>
   2:       0x55003ea4ac - <unknown>
   3:       0x55003ed8fc - <unknown>
   4:       0x55003ef420 - <unknown>
   5:       0x55003ef094 - <unknown>
   6:       0x55001809d8 - <unknown>
   7:       0x55003efafc - <unknown>
   8:       0x55003ef848 - <unknown>
   9:       0x55003ee00c - <unknown>
  10:       0x55003ef5d8 - <unknown>
  11:       0x5500037d2c - <unknown>
  ... 

I got this, but unfortunately, as you can see, the backtrace is unusable: every frame is <unknown>.

@cberner
Owner

cberner commented Aug 2, 2023

@dignifiedquire surprisingly, it looks like this is a compiler bug, or maybe an issue with the cross tool you're using (I have no idea how it works). Try this branch https://github.com/cberner/redb/tree/iroh and you'll see a different panic, which definitely should not be possible: it comes from a single-line change that asserts a zero-initialized Vec is in fact all zeros: 42444df
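For reference, the check described above amounts to something like the following standalone sketch. The function name `new_zeroed_words` is hypothetical; only the assertion itself mirrors the one reported in the later panic message (`data.iter().all(|x| *x == 0)` in bitmap.rs).

```rust
/// Hypothetical stand-in for the bitmap constructor: allocate `words`
/// zero-initialized u64 words and verify that they really are zero.
fn new_zeroed_words(words: usize) -> Vec<u64> {
    let data = vec![0u64; words];
    // With a correct compiler and allocator this assertion can never fire.
    assert!(data.iter().all(|x| *x == 0), "{}: non-zero word in fresh Vec", words);
    data
}

fn main() {
    let data = new_zeroed_words(16384);
    println!("{} words, all zero", data.len());
}
```

If this assertion fails, the bug is below redb: the Vec was handed back with non-zero contents before redb ever touched it.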

@dignifiedquire
Author

Looks like this does change the panic:

RUST_BACKTRACE=full cross test -p iroh-sync test_replica_sync_fs --target aarch64-linux-android
   Compiling redb v1.0.5 (https://github.com/cberner/redb?branch=iroh#42444df6)
   Compiling iroh-sync v0.1.0 (/project/iroh-sync)
    Finished test [unoptimized + debuginfo] target(s) in 5.87s
     Running unittests src/lib.rs (/target/aarch64-linux-android/debug/deps/iroh_sync-2600d34f0c7ac5a1)

running 1 test
test sync::tests::test_replica_sync_fs ... FAILED

failures:

---- sync::tests::test_replica_sync_fs stdout ----
thread 'sync::tests::test_replica_sync_fs' panicked at 'assertion failed: data.iter().all(|x| *x == 0)', /cargo/git/checkouts/redb-24e44532b0b35edd/42444df/src/tree_store/page_store/bitmap.rs:297:9
stack backtrace:
   0:       0x55003eca50 - <unknown>
   1:       0x550040a7c0 - <unknown>
   2:       0x55003e9404 - <unknown>
   3:       0x55003ec854 - <unknown>
   4:       0x55003ee378 - <unknown>
   5:       0x55003edfec - <unknown>
   6:       0x55001808c8 - <unknown>
   7:       0x55003eea54 - <unknown>
   8:       0x55003ee7a0 - <unknown>
   9:       0x55003ecf64 - <unknown>
  10:       0x55003ee530 - <unknown>
  11:       0x5500037c1c - <unknown>
  12:       0x5500037cb0 - <unknown>
  13:       0x55002a0740 - <unknown>
  14:       0x550027ab34 - <unknown>
  15:       0x550026e2c8 - <unknown>
  16:       0x550025f924 - <unknown>
  17:       0x550026024c - <unknown>
  18:       0x55002a20c4 - <unknown>
  19:       0x55002797d4 - <unknown>
  20:       0x550003948c - <unknown>
  21:       0x5500039584 - <unknown>
  22:       0x5500087a84 - <unknown>
  23:       0x55000df81c - <unknown>
  24:       0x55000f82d8 - <unknown>
  25:       0x550012e290 - <unknown>
  26:       0x5500184568 - <unknown>
  27:       0x550018367c - <unknown>
  28:       0x550015a448 - <unknown>
  29:       0x550015f638 - <unknown>
  30:       0x55003f22a0 - <unknown>
  31:       0x550208193c - <unknown>
  32:       0x550202347c - <unknown>
  33:                0x0 - <unknown>


failures:
    sync::tests::test_replica_sync_fs

@dignifiedquire
Author

dignifiedquire commented Aug 3, 2023

Some additional debugging, printing the size of each created Vec and, when the check fails, its contents. It looks like it works most of the time, until it doesn't:

new_empty: 2
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 16384
new_empty: 8192
new_empty: 4096
new_empty: 2048
new_empty: 1024
new_empty: 512
new_empty: 256
new_empty: 128
new_empty: 64
new_empty: 32
new_empty: 16
new_empty: 8
new_empty: 4
new_empty: 2
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 16384
new_empty: 8192
new_empty: 4096
new_empty: 2048
new_empty: 1024
new_empty: 512
new_empty: 256
new_empty: 128
new_empty: 64
new_empty: 32
new_empty: 16
new_empty: 8
new_empty: 4
new_empty: 2
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 1
new_empty: 16384
thread 'sync::tests::test_replica_sync_fs' panicked at '[18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615,
<all the same value>
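The repeated value 18446744073709551615 is exactly `u64::MAX`, i.e. a word with all 64 bits set. If a freshly created allocation bitmap came back filled with such words, every page would read as already allocated, which would be consistent with the original `!self.get_order_allocated(order).get(page_number)` assertion firing. The minimal `Bitmap` type below is a hypothetical illustration of that reading, not redb's actual bitmap implementation.

```rust
/// Hypothetical minimal bitmap: bit `i` lives in word `i / 64`.
struct Bitmap {
    data: Vec<u64>,
}

impl Bitmap {
    /// Returns true if bit `i` is set, i.e. the page is marked allocated.
    fn get(&self, i: u32) -> bool {
        let word = self.data[(i / 64) as usize];
        word & (1u64 << (i % 64)) != 0
    }
}

fn main() {
    // The value printed in the panic message is u64::MAX: all 64 bits set.
    assert_eq!(18446744073709551615u64, u64::MAX);
    assert_eq!(u64::MAX.count_ones(), 64);

    // A bitmap that should be empty but holds all-ones garbage words...
    let garbage = Bitmap { data: vec![u64::MAX; 4] };
    // ...reports every page as allocated.
    assert!((0..256).all(|i| garbage.get(i)));

    // A genuinely zeroed bitmap reports every page as free.
    let empty = Bitmap { data: vec![0; 4] };
    assert!((0..256).all(|i| !empty.get(i)));
}
```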

@dignifiedquire
Author

It seems more related to cross and rustc than to the architecture; I now get the same error on armv7 as well: https://github.com/n0-computer/iroh/actions/runs/5738430197/job/15591730496

@dignifiedquire
Author

Reproduced the issue with redb's own tests:

> cross test --target aarch64-linux-android
failures:

---- db::test::small_pages stdout ----
thread 'db::test::small_pages' panicked at '16384: [0]', src/tree_store/page_store/bitmap.rs:298:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- db::test::small_pages2 stdout ----
thread 'db::test::small_pages2' panicked at '16384: [0]', src/tree_store/page_store/bitmap.rs:298:9

---- db::test::small_pages3 stdout ----
thread 'db::test::small_pages3' panicked at '16384: [18446744073709551615]', src/tree_store/page_store/bitmap.rs:298:9

---- db::test::small_pages4 stdout ----
thread 'db::test::small_pages4' panicked at '16384: [27584547717644288]', src/tree_store/page_store/bitmap.rs:298:9

---- transactions::test::transaction_id_persistence stdout ----
thread 'transactions::test::transaction_id_persistence' panicked at '16384: [18446744073709551615]', src/tree_store/page_store/bitmap.rs:298:9

---- tree_store::page_store::buddy_allocator::test::serialized_size stdout ----
thread 'tree_store::page_store::buddy_allocator::test::serialized_size' panicked at '16384: [19855633815385]', src/tree_store/page_store/bitmap.rs:298:9

---- tree_store::page_store::header::test::repair_allocator_checksums stdout ----
thread 'tree_store::page_store::header::test::repair_allocator_checksums' panicked at '16384: [29836347531329536]', src/tree_store/page_store/bitmap.rs:298:9

---- tree_store::page_store::header::test::repair_empty stdout ----
thread 'tree_store::page_store::header::test::repair_empty' panicked at '16384: [18446744069414592765]', src/tree_store/page_store/bitmap.rs:298:9

---- tree_store::page_store::header::test::repair_insert_reserve_regression stdout ----
thread 'tree_store::page_store::header::test::repair_insert_reserve_regression' panicked at '16384: [18446744073709551615]', src/tree_store/page_store/bitmap.rs:298:9


failures:
    db::test::small_pages
    db::test::small_pages2
    db::test::small_pages3
    db::test::small_pages4
    transactions::test::transaction_id_persistence
    tree_store::page_store::buddy_allocator::test::serialized_size
    tree_store::page_store::header::test::repair_allocator_checksums
    tree_store::page_store::header::test::repair_empty
    tree_store::page_store::header::test::repair_insert_reserve_regression
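Note that some failures print `[0]`: the assertion message only shows `&data[..1]`, so the first word can be zero while a later word is not. A helper that reports the index and value of the first offending word (formatted in hex, which makes all-ones patterns easy to spot) would give more useful output. `first_nonzero` is a hypothetical name; this is a sketch, not code from the redb branch.

```rust
/// Hypothetical diagnostic: index and value of the first non-zero word, if any.
fn first_nonzero(data: &[u64]) -> Option<(usize, u64)> {
    data.iter().position(|x| *x != 0).map(|i| (i, data[i]))
}

fn main() {
    let mut data = vec![0u64; 16384];
    // Simulate one garbage word (this value appears in the small_pages4 failure above).
    data[100] = 27584547717644288;
    if let Some((i, v)) = first_nonzero(&data) {
        // Hex formatting makes garbage patterns such as all-ones words obvious.
        println!("{} words, first non-zero at index {}: {:#018x}", data.len(), i, v);
    }
}
```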

But it gets more interesting: if I replace the init code in bitmap.rs with this:

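pub fn new_empty(len: u32, capacity: u32) -> Self {
    let cap = Self::required_words(capacity);
    // Original initialization, disabled for this experiment:
    // let data = vec![0; cap];
    // Build the Vec by writing each element explicitly instead.
    let data: Vec<u64> = (0..cap).map(|_| 0).collect();
    // On failure, print the word count and the first word.
    assert!(data.iter().all(|x| *x == 0), "{}: {:?}", cap, &data[..1]);
    Self { len, data }
}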

I only get the following failures:

> cross test --target aarch64-linux-android

running 30 tests
test db::test::crash_regression3 ... ok
test db::test::crash_regression4 ... ok
test db::test::dynamic_shrink ... ok
test db::test::small_pages ... ok
test db::test::small_pages2 ... ok
test db::test::small_pages3 ... ok
test db::test::small_pages4 ... ok
test transactions::test::transaction_id_persistence ... ok
test tree_store::page_store::base::test::last_page ... ok
test tree_store::page_store::bitmap::test::all_space_used ... ok
test tree_store::page_store::bitmap::test::alloc ... ok
test tree_store::page_store::bitmap::test::find_free ... ok
test tree_store::page_store::bitmap::test::free ... ok
test tree_store::page_store::bitmap::test::iter ... ok
test tree_store::page_store::bitmap::test::random_pattern ... ok
test tree_store::page_store::bitmap::test::record_alloc ... ok
test tree_store::page_store::bitmap::test::reuse_lowest ... ok
test tree_store::page_store::buddy_allocator::test::alloc_large ... ok
test tree_store::page_store::buddy_allocator::test::buddy_merge ... ok
test tree_store::page_store::buddy_allocator::test::record_alloc_buddy ... ok
test tree_store::page_store::buddy_allocator::test::serialized_size ... ok
test tree_store::page_store::header::test::magic_number ... ok
test tree_store::page_store::header::test::repair_allocator_checksums ... ok
test tree_store::page_store::header::test::repair_empty ... ok
test tree_store::page_store::header::test::repair_insert_reserve_regression ... ok
test tree_store::page_store::layout::test::full_layout ... ok
test tree_store::page_store::page_manager::test::out_of_regions ... ok
test tree_store::page_store::xxh3::test::test_empty ... ok
test tree_store::table_tree::test::round_trip ... ok
test tuple_types::test::width ... ok

test result: ok. 30 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 18.37s

     Running tests/backward_compatibility.rs (/target/aarch64-linux-android/debug/deps/backward_compatibility-ec8da7951aa53207)

running 3 tests
test container_types ... ok
test mixed_width ... FAILED
test primitive_types ... FAILED

failures:

---- mixed_width stdout ----
thread 'mixed_width' panicked at 'assertion failed: !self.get_order_allocated(order).get(page_number)', /cargo/registry/src/github.com-1ecc6299db9ec823/redb-1.0.5/src/tree_store/page_store/buddy_allocator.rs:450:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- primitive_types stdout ----
thread 'primitive_types' panicked at 'assertion failed: !self.get_order_allocated(order).get(page_number)', /cargo/registry/src/github.com-1ecc6299db9ec823/redb-1.0.5/src/tree_store/page_store/buddy_allocator.rs:450:13


failures:
    mixed_width
    primitive_types
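One difference between the two constructors may be relevant: in current std, `vec![0; n]` for an integer zero is specialized to request already-zeroed memory from the allocator (via `alloc_zeroed`), while the `map(|_| 0).collect()` version writes every element itself. That the explicit-write version passes the unit tests hints, but does not prove, that the zeroed-allocation path is what misbehaves on this target. The two remaining panics point at the crates.io copy of redb 1.0.5 (see the registry path), which does not contain the change. The standalone check below exercises `std::alloc::alloc_zeroed` directly with the 16384-word size seen in the logs; `zeroed_alloc_is_clean` is a hypothetical diagnostic, not part of redb.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
use std::ptr;
use std::slice;

/// Hypothetical diagnostic: repeatedly dirty a block, free it, then request
/// zeroed memory of the same size and check that every word is zero.
fn zeroed_alloc_is_clean(words: usize, rounds: usize) -> bool {
    // Zero-sized layouts must not be passed to the raw allocation functions.
    assert!(words > 0, "words must be non-zero");
    let layout = Layout::array::<u64>(words).expect("layout overflow");
    for _ in 0..rounds {
        // SAFETY: the layout has non-zero size, every pointer is checked for
        // null, and each block is freed with the same layout it was allocated with.
        unsafe {
            // Fill a block with all-ones bytes and free it, so that a recycled
            // block that is not re-zeroed would show up as u64::MAX words.
            let dirty = alloc(layout);
            assert!(!dirty.is_null(), "allocation failed");
            ptr::write_bytes(dirty, 0xFF, layout.size());
            dealloc(dirty, layout);

            // Request zeroed memory of the same size and inspect it.
            let zeroed = alloc_zeroed(layout) as *mut u64;
            assert!(!zeroed.is_null(), "allocation failed");
            let clean = slice::from_raw_parts(zeroed, words).iter().all(|x| *x == 0);
            dealloc(zeroed as *mut u8, layout);
            if !clean {
                return false;
            }
        }
    }
    true
}

fn main() {
    // 16384 words matches the largest size printed in the logs above.
    println!("zeroed allocations clean: {}", zeroed_alloc_is_clean(16384, 100));
}
```

Running this under the same `cross test --target aarch64-linux-android` setup would help separate an allocator or runtime problem from a miscompilation in redb's own code.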

@cberner
Owner

cberner commented Aug 6, 2023

Going to close this since it doesn't seem to be a redb bug. I'd report it to either rustc or cross, depending on which one you think is broken.
