Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Test cases get_from_should_success and recovery_from_log will fail sometime in CI #259

Closed
1 task done
Phoenix500526 opened this issue May 6, 2023 · 2 comments · Fixed by #350
Closed
1 task done
Assignees
Labels
bug Something isn't working

Comments

@Phoenix500526
Copy link
Collaborator

Description about the bug

As the title described, the test cases get_from_should_success and recovery_from_log will fail sometime.

These two cases are pasted below:

    #[test]
    fn get_from_should_success() {
        let log_entry = LogEntry::new(0, 0, Arc::new(TestCommand::default()));
        let log_entry_size = serialized_size(&log_entry).unwrap();

        let (tx, _rx) = mpsc::unbounded_channel();
        let mut log =
            Log::<TestCommand>::new(tx, default_batch_max_size(), default_log_entries_cap());

        let _res = repeat(TestCommand::default())
            .take(10)
            .map(|cmd| log.push_cmd(1, Arc::new(cmd)).unwrap())
            .collect::<Vec<u64>>();

        set_batch_limit(&mut log, 3 * log_entry_size - 1);
        let bound_1 = log.get_range_by_batch(3);
        assert_eq!(
            bound_1,
            3..5,
            "batch_index = {:?}, batch = {}",
            log.batch_index,
            log.batch_limit
        );
        assert!(log.has_next_batch(8));
        assert!(!log.has_next_batch(9));
        ... // omit some test cases
    }

    #[test]
    fn recover_log_should_success() {
        let entry_size =
            serialized_size(&LogEntry::new(0, 0, Arc::new(TestCommand::default()))).unwrap();
        let entries = repeat(Arc::new(TestCommand::default()))
            .enumerate()
            .take(10)
            .map(|(idx, cmd)| LogEntry::new((idx + 1).numeric_cast(), 0, cmd))
            .collect::<Vec<LogEntry<TestCommand>>>();
        let (tx, _rx) = mpsc::unbounded_channel();
        let mut log =
            Log::<TestCommand>::new(tx, default_batch_max_size(), default_log_entries_cap());

        log.restore_entries(entries);
        assert_eq!(log.entries.len(), 10);
        assert_eq!(log.batch_index.len(), 11);
        assert_eq!(log.batch_index[0], 0);

        log.batch_index
            .iter()
            .enumerate()
            .for_each(|(idx, &size)| assert_eq!(size, entry_size * idx.numeric_cast::<u64>()));
    }

The root cause is that we calculate the serialized size for a given LogEntry, and this value may be different from the serialized size calculated in push_cmd, because the index and term of the log are variable. The bincode::serialized_size may be optimized for some specific values to obtain compact binary bytes, so the serialized size calculated with the given LogEntry (index and term are both 0) differs from the serialized size calculated in push_cmd by up to 1 byte.

Version

0.1.0 (Default)

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Phoenix500526 Phoenix500526 added the bug Something isn't working label May 6, 2023
@Phoenix500526 Phoenix500526 self-assigned this May 6, 2023
Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 6, 2023
@Phoenix500526 Phoenix500526 linked a pull request May 6, 2023 that will close this issue
Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 6, 2023
Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 7, 2023
Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 11, 2023
Phoenix500526 added a commit to Phoenix500526/Xline that referenced this issue May 11, 2023
@mergify mergify bot closed this as completed in 799c3a0 May 11, 2023
@Phoenix500526
Copy link
Collaborator Author

cargo test
   Compiling xline v0.1.0 (/home/jiawei/Xline/xline)
   Compiling benchmark v0.1.0 (/home/jiawei/Xline/benchmark)
    Finished test [unoptimized + debuginfo] target(s) in 24.61s
     Running unittests src/lib.rs (target/debug/deps/benchmark-449d4b19243199b7)

running 1 test
test runner::test::test_fill_usize_to_buf ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/benchmark-702d98bb1b710748)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (target/debug/deps/curp-3becb5ba7fd5755b)

running 43 tests
test client::tests::superquorum_should_work ... ok
test server::raw_curp::log::tests::recover_log_should_success ... ok
test server::raw_curp::log::tests::try_append_entries_will_remove_inconsistencies ... ok
test server::raw_curp::log::tests::compact_test ... ok
test server::raw_curp::log::tests::test_log_up_to_date ... ok
test server::raw_curp::log::tests::try_append_entries_will_not_append ... ok
test client::tests::will_get_notify_on_leader_changes ... ok
test server::raw_curp::log::tests::get_from_should_success ... FAILED
test rpc::connect::tests::test_install_snapshot_stream ... ok
test server::raw_curp::tests::heartbeat_will_calibrate_term ... ok
test server::raw_curp::tests::quorum ... ok
test server::raw_curp::tests::handle_ae_will_reject_wrong_log ... ok
test server::raw_curp::tests::heartbeat_will_calibrate_next_index ... ok
test server::raw_curp::tests::handle_ae_will_reject_wrong_term ... ok
test server::raw_curp::tests::leader_handle_propose_will_succeed ... ok
test server::raw_curp::tests::follower_handle_propose_will_reject_conflicted ... ok
test server::raw_curp::tests::handle_vote_will_reject_smaller_term ... ok
test server::raw_curp::tests::handle_ae_will_calibrate_term ... ok
test server::raw_curp::tests::candidate_will_become_leader_after_election_succeeds ... ok
test server::raw_curp::tests::handle_ae_will_set_leader_id ... ok
test server::raw_curp::tests::follower_handle_propose_will_succeed ... ok
test server::raw_curp::tests::vote_will_calibrate_candidate_term ... ok
test server::raw_curp::tests::handle_vote_will_reject_outdated_candidate ... ok
test server::raw_curp::tests::handle_vote_will_calibrate_term ... ok
test server::raw_curp::tests::leader_handle_propose_will_reject_duplicated ... ok
test server::raw_curp::tests::recover_from_spec_pools_will_pick_the_correct_cmds ... ok
test server::raw_curp::tests::leader_handle_propose_will_reject_conflicted ... ok
test server::cmd_worker::tests::slow_path_normal ... ok
test server::cmd_worker::tests::fast_path_normal ... ok
test server::cmd_worker::tests::reset_will_wipe_all_states_and_outdated_cmds ... ok
test server::cmd_worker::tests::test_snapshot ... ok
test server::cmd_worker::tests::slow_path_exe_fails ... ok
test server::cmd_worker::tests::conflict_cmd_order ... ok
test server::cmd_worker::tests::fast_path_cond1 ... ok
test server::gc::tests::spec_gc_will_not_panic ... ok
test server::gc::tests::spec_gc_test ... ok
test server::gc::tests::cmd_board_gc_test ... ok
test server::cmd_worker::tests::fast_path_cond2 ... ok
test server::curp_node::tests::sync_task_will_send_hb ... ok
test server::storage::rocksdb::tests::create_and_recover ... ok
test server::raw_curp::tests::follower_or_candidate_will_start_election_if_timeout ... ok
test server::curp_node::tests::tick_task_will_bcast_votes ... ok
test server::raw_curp::tests::follower_will_not_start_election_when_heartbeats_are_received ... ok

failures:

---- server::raw_curp::log::tests::get_from_should_success stdout ----
thread 'server::raw_curp::log::tests::get_from_should_success' panicked at 'assertion failed: `(left == right)`
  left: `3..6`,
 right: `3..7`: batch_index = [0, 67, 135, 203, 271, 339, 407, 475, 543, 611, 679], batch = 268, log_entry_size = 67', curp/src/server/raw_curp/log.rs:452:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    server::raw_curp::log::tests::get_from_should_success

test result: FAILED. 42 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.54s

error: test failed, to rerun pass `-p curp --lib`

@Phoenix500526 Phoenix500526 reopened this May 17, 2023
bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023
Closes: xline-kv#259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023
Closes: xline-kv#259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023
Closes: xline-kv#259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023
Closes: xline-kv#259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
bsbds added a commit to bsbds/Xline that referenced this issue Jun 29, 2023
Closes: xline-kv#259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
@bsbds
Copy link
Collaborator

bsbds commented Jun 29, 2023

The test may fail sometime because the serialized size if TestCommand is not fixed.

The root cause is the TestCommand rely on a global variable NEXT_ID, in the generation of ProposeId.

static NEXT_ID: Lazy<AtomicU64> = Lazy::new(|| AtomicU64::new(1));
fn next_id() -> u64 {
NEXT_ID.fetch_add(1, Ordering::SeqCst)
}

The ProposeId call the to_string method may generate different sized struct, for instance: "9" is one byte and "10" is two bytes. But the tests assume that all sizes of the generated command are equal.

id: ProposeId::new(next_id().to_string()),

let log_entry_size = log.batch_index[1];

And because cargo test will run the tests in any sequence in a single process, different tests may increment the NEXT_ID, which result in the non-deterministic failure.

@mergify mergify bot closed this as completed in #350 Jun 29, 2023
mergify bot pushed a commit that referenced this issue Jun 29, 2023
Closes: #259

Signed-off-by: bsbds <69835502+bsbds@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants