[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

Phoenix500526 · 2023-05-06T11:49:34Z

Description about the bug

As the title described, the test cases get_from_should_success and recovery_from_log will fail sometime.

These two cases are pasted below:

    #[test]
    fn get_from_should_success() {
        let log_entry = LogEntry::new(0, 0, Arc::new(TestCommand::default()));
        let log_entry_size = serialized_size(&log_entry).unwrap();

        let (tx, _rx) = mpsc::unbounded_channel();
        let mut log =
            Log::<TestCommand>::new(tx, default_batch_max_size(), default_log_entries_cap());

        let _res = repeat(TestCommand::default())
            .take(10)
            .map(|cmd| log.push_cmd(1, Arc::new(cmd)).unwrap())
            .collect::<Vec<u64>>();

        set_batch_limit(&mut log, 3 * log_entry_size - 1);
        let bound_1 = log.get_range_by_batch(3);
        assert_eq!(
            bound_1,
            3..5,
            "batch_index = {:?}, batch = {}",
            log.batch_index,
            log.batch_limit
        );
        assert!(log.has_next_batch(8));
        assert!(!log.has_next_batch(9));
        ... // omit some test cases
    }

    #[test]
    fn recover_log_should_success() {
        let entry_size =
            serialized_size(&LogEntry::new(0, 0, Arc::new(TestCommand::default()))).unwrap();
        let entries = repeat(Arc::new(TestCommand::default()))
            .enumerate()
            .take(10)
            .map(|(idx, cmd)| LogEntry::new((idx + 1).numeric_cast(), 0, cmd))
            .collect::<Vec<LogEntry<TestCommand>>>();
        let (tx, _rx) = mpsc::unbounded_channel();
        let mut log =
            Log::<TestCommand>::new(tx, default_batch_max_size(), default_log_entries_cap());

        log.restore_entries(entries);
        assert_eq!(log.entries.len(), 10);
        assert_eq!(log.batch_index.len(), 11);
        assert_eq!(log.batch_index[0], 0);

        log.batch_index
            .iter()
            .enumerate()
            .for_each(|(idx, &size)| assert_eq!(size, entry_size * idx.numeric_cast::<u64>()));
    }

The root cause is that we calculate the serialized size for a given LogEntry, and this value may be different from the serialized size calculated in push_cmd, because the index and term of the log are variable. The bincode::serialized_size may be optimized for some specific values to obtain compact binary bytes, so the serialized size calculated with the given LogEntry (index and term are both 0) differs from the serialized size calculated in push_cmd by up to 1 byte.

Version

0.1.0 (Default)

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct

The text was updated successfully, but these errors were encountered:

Closes xline-kv#259

Phoenix500526 · 2023-05-17T09:26:14Z

cargo test
   Compiling xline v0.1.0 (/home/jiawei/Xline/xline)
   Compiling benchmark v0.1.0 (/home/jiawei/Xline/benchmark)
    Finished test [unoptimized + debuginfo] target(s) in 24.61s
     Running unittests src/lib.rs (target/debug/deps/benchmark-449d4b19243199b7)

running 1 test
test runner::test::test_fill_usize_to_buf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/benchmark-702d98bb1b710748)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/curp-3becb5ba7fd5755b)

running 43 tests
test client::tests::superquorum_should_work ... ok
test server::raw_curp::log::tests::recover_log_should_success ... ok
test server::raw_curp::log::tests::try_append_entries_will_remove_inconsistencies ... ok
test server::raw_curp::log::tests::compact_test ... ok
test server::raw_curp::log::tests::test_log_up_to_date ... ok
test server::raw_curp::log::tests::try_append_entries_will_not_append ... ok
test client::tests::will_get_notify_on_leader_changes ... ok
test server::raw_curp::log::tests::get_from_should_success ... FAILED
test rpc::connect::tests::test_install_snapshot_stream ... ok
test server::raw_curp::tests::heartbeat_will_calibrate_term ... ok
test server::raw_curp::tests::quorum ... ok
test server::raw_curp::tests::handle_ae_will_reject_wrong_log ... ok
test server::raw_curp::tests::heartbeat_will_calibrate_next_index ... ok
test server::raw_curp::tests::handle_ae_will_reject_wrong_term ... ok
test server::raw_curp::tests::leader_handle_propose_will_succeed ... ok
test server::raw_curp::tests::follower_handle_propose_will_reject_conflicted ... ok
test server::raw_curp::tests::handle_vote_will_reject_smaller_term ... ok
test server::raw_curp::tests::handle_ae_will_calibrate_term ... ok
test server::raw_curp::tests::candidate_will_become_leader_after_election_succeeds ... ok
test server::raw_curp::tests::handle_ae_will_set_leader_id ... ok
test server::raw_curp::tests::follower_handle_propose_will_succeed ... ok
test server::raw_curp::tests::vote_will_calibrate_candidate_term ... ok
test server::raw_curp::tests::handle_vote_will_reject_outdated_candidate ... ok
test server::raw_curp::tests::handle_vote_will_calibrate_term ... ok
test server::raw_curp::tests::leader_handle_propose_will_reject_duplicated ... ok
test server::raw_curp::tests::recover_from_spec_pools_will_pick_the_correct_cmds ... ok
test server::raw_curp::tests::leader_handle_propose_will_reject_conflicted ... ok
test server::cmd_worker::tests::slow_path_normal ... ok
test server::cmd_worker::tests::fast_path_normal ... ok
test server::cmd_worker::tests::reset_will_wipe_all_states_and_outdated_cmds ... ok
test server::cmd_worker::tests::test_snapshot ... ok
test server::cmd_worker::tests::slow_path_exe_fails ... ok
test server::cmd_worker::tests::conflict_cmd_order ... ok
test server::cmd_worker::tests::fast_path_cond1 ... ok
test server::gc::tests::spec_gc_will_not_panic ... ok
test server::gc::tests::spec_gc_test ... ok
test server::gc::tests::cmd_board_gc_test ... ok
test server::cmd_worker::tests::fast_path_cond2 ... ok
test server::curp_node::tests::sync_task_will_send_hb ... ok
test server::storage::rocksdb::tests::create_and_recover ... ok
test server::raw_curp::tests::follower_or_candidate_will_start_election_if_timeout ... ok
test server::curp_node::tests::tick_task_will_bcast_votes ... ok
test server::raw_curp::tests::follower_will_not_start_election_when_heartbeats_are_received ... ok

failures:

---- server::raw_curp::log::tests::get_from_should_success stdout ----
thread 'server::raw_curp::log::tests::get_from_should_success' panicked at 'assertion failed: `(left == right)`
  left: `3..6`,
 right: `3..7`: batch_index = [0, 67, 135, 203, 271, 339, 407, 475, 543, 611, 679], batch = 268, log_entry_size = 67', curp/src/server/raw_curp/log.rs:452:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    server::raw_curp::log::tests::get_from_should_success

test result: FAILED. 42 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.54s

error: test failed, to rerun pass `-p curp --lib`

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds · 2023-06-29T04:39:48Z

The test may fail sometime because the serialized size if TestCommand is not fixed.

The root cause is the TestCommand rely on a global variable NEXT_ID, in the generation of ProposeId.

Xline/curp-test-utils/src/test_cmd.rs

Lines 29 to 33 in 304e17c

    
           static NEXT_ID: Lazy<AtomicU64> = Lazy::new(|| AtomicU64::new(1)); 
        
           fn next_id() -> u64 { 
        
               NEXT_ID.fetch_add(1, Ordering::SeqCst) 
        
           }

The ProposeId call the to_string method may generate different sized struct, for instance: "9" is one byte and "10" is two bytes. But the tests assume that all sizes of the generated command are equal.

Xline/curp-test-utils/src/test_cmd.rs

Line 58 in 304e17c

id: ProposeId::new(next_id().to_string()),

Xline/curp/src/server/raw_curp/log.rs

Line 436 in 304e17c

let log_entry_size = log.batch_index[1];

And because cargo test will run the tests in any sequence in a single process, different tests may increment the NEXT_ID, which result in the non-deterministic failure.

Closes: #259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

Phoenix500526 added the bug Something isn't working label May 6, 2023

Phoenix500526 self-assigned this May 6, 2023

Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 6, 2023

fix: fix the issue 259

c4493a0

Closes xline-kv#259

Phoenix500526 mentioned this issue May 6, 2023

Refactor/dependencies #246

Closed

Phoenix500526 linked a pull request May 6, 2023 that will close this issue

Refactor/dependencies #246

Closed

Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 6, 2023

fix: fix the issue 259

5448904

Closes xline-kv#259

Phoenix500526 mentioned this issue May 6, 2023

Refactor/dependencies #260

Merged

Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 7, 2023

fix: fix the issue 259

cd6d226

Closes xline-kv#259

Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 11, 2023

fix: fix the issue 259

393d006

Closes xline-kv#259

Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 11, 2023

fix: fix the issue 259

79b9ff7

Closes xline-kv#259

mergify bot closed this as completed in 799c3a0 May 11, 2023

Phoenix500526 reopened this May 17, 2023

bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

999d377

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

3403565

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

15d744d

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

063c170

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

c1b3e32

Closes: xline-kv#259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

bsbds mentioned this issue Jun 29, 2023

Fix/Resolve failing tests related to serialized size #350

Merged

mergify bot closed this as completed in #350 Jun 29, 2023

mergify bot pushed a commit that referenced this issue Jun 29, 2023

fix: resolve failing tests related to serialized size

90cf3c5

Closes: #259 Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

Phoenix500526 commented May 6, 2023

Phoenix500526 commented May 17, 2023

bsbds commented Jun 29, 2023 •

edited

Loading

[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

Comments

Phoenix500526 commented May 6, 2023

Description about the bug

Version

Relevant log output

Code of Conduct

Phoenix500526 commented May 17, 2023

bsbds commented Jun 29, 2023 • edited Loading

bsbds commented Jun 29, 2023 •

edited

Loading