[Feature]: Membership change #306

Closed
4 tasks done
themanforfree opened this issue Jun 5, 2023 · 3 comments
Labels
enhancement (New feature or request)

@themanforfree
Collaborator

themanforfree commented Jun 5, 2023

Description of the feature

Membership change has two different situations:

  1. Change only one node at a time. In this case, any majority of the old configuration and any majority of the new configuration must share at least one node, so it is impossible to elect two leaders at the same time. We can handle this situation simply: after the new configuration is committed, it can be used to make decisions.
    [figure omitted]
  2. Change multiple nodes at a time. In this case, nodes using different configurations may form two disjoint majorities, which means that two leaders may be elected. The solution is to introduce an intermediate state (a quorum sketch follows this list). After the leader receives the configuration change request, let the new configuration node set be C_new and the old configuration node set be C_old. After the configuration change entry is committed, the cluster enters the joint consensus state: when an election occurs, C_old and C_new decide jointly. The leader then creates an empty configuration change log entry; after that entry is committed, the cluster can use C_new alone to make decisions, and the configuration change is complete.
    [figure omitted]
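
For the joint consensus state, here is a minimal sketch of the quorum rule (the Config type and the node-id sets are illustrative assumptions, not the actual curp structs): while both configurations are active, an election or commit only succeeds if it is acknowledged by a majority of C_old and a majority of C_new.

use std::collections::HashSet;

/// Hypothetical cluster configuration: either a single node set or a joint one.
enum Config {
    Single(HashSet<u64>),
    /// Joint consensus state: (C_old, C_new).
    Joint(HashSet<u64>, HashSet<u64>),
}

impl Config {
    /// Returns true if `acks` (ids of nodes that voted or replicated an entry)
    /// forms a quorum under this configuration.
    fn has_quorum(&self, acks: &HashSet<u64>) -> bool {
        fn majority(set: &HashSet<u64>, acks: &HashSet<u64>) -> bool {
            set.intersection(acks).count() * 2 > set.len()
        }
        match self {
            Config::Single(c) => majority(c, acks),
            // In the joint state, C_old and C_new must both agree.
            Config::Joint(c_old, c_new) => majority(c_old, acks) && majority(c_new, acks),
        }
    }
}

fn main() {
    let c_old: HashSet<u64> = [1, 2, 3].into_iter().collect();
    let c_new: HashSet<u64> = [3, 4, 5].into_iter().collect();
    let joint = Config::Joint(c_old, c_new);

    // {1, 2, 3} is a majority of C_old but not of C_new: no quorum in the joint state.
    let acks: HashSet<u64> = [1, 2, 3].into_iter().collect();
    assert!(!joint.has_quorum(&acks));

    // {2, 3, 4} is a majority of both, so it counts.
    let acks: HashSet<u64> = [2, 3, 4].into_iter().collect();
    assert!(joint.has_quorum(&acks));
}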

Learner
When adding a new node to the cluster, the new node does not yet have the cluster's current data, so the leader needs to sync the data to it. This process may take a while, which increases the risk of cluster unavailability. Therefore, the Learner node is introduced: a Learner only syncs logs and does not participate in voting. Once the Learner catches up with the leader's progress, it is promoted to a normal (voting) node.
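
A rough sketch of that promotion rule, with hypothetical names (Role, Member, try_promote, and lag_threshold are illustrative, not the actual curp types or parameters): a Learner replicates log entries but is excluded from voting, and is promoted only once its replicated index is close enough to the leader's last index.

/// Hypothetical node roles; a Learner receives log entries but never votes.
#[derive(Debug, PartialEq)]
enum Role {
    Voter,
    Learner,
}

struct Member {
    role: Role,
    /// Highest log index known to be replicated on this member.
    match_index: u64,
}

/// Promote a Learner once it has (nearly) caught up with the leader.
/// `lag_threshold` is an assumed tuning knob, not an existing curp setting.
fn try_promote(member: &mut Member, leader_last_index: u64, lag_threshold: u64) -> bool {
    if member.role == Role::Learner
        && leader_last_index.saturating_sub(member.match_index) <= lag_threshold
    {
        member.role = Role::Voter;
        return true;
    }
    false
}

fn main() {
    let mut m = Member { role: Role::Learner, match_index: 95 };
    // Leader is at index 100; a lag of 5 is within the (assumed) threshold of 10.
    assert!(try_promote(&mut m, 100, 10));
    assert_eq!(m.role, Role::Voter);
}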

I plan to implement membership change in three stages:

  1. Implement single-node change, and prepare the structs needed for multi-node change.
  2. Implement the Learner node.
  3. Implement multi-node change.

Tests

  • The member ID and cluster ID calculated by all nodes should be consistent when initializing the cluster.
  • A new node should pull member IDs directly from the existing cluster and use them, rather than calculating IDs by itself.
  • The client should use the new configuration after a cluster membership change.

Related papers

  1. https://raft.github.io/raft.pdf
  2. https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf

Code of Conduct

  • I agree to follow this project's Code of Conduct
@themanforfree themanforfree added the enhancement New feature or request label Jun 5, 2023
@Phoenix500526 Phoenix500526 added this to the Release v0.5.0 milestone Jun 5, 2023
@themanforfree
Collaborator Author

themanforfree commented Jun 9, 2023

More details of stage 1

@themanforfree
Collaborator Author

Problem

At first, I planned to reuse the processing logic for commands directly, as follows. However, the ER and ASR are defined externally, while ConfChange is an internal behavior of curp, so we cannot make ConfChange produce externally defined ERs and ASRs.

...
TaskType::SpecExe(entry, pre_err) => {
    let er = if let Some(err_msg) = pre_err {
        Err(err_msg)
    } else {
        match entry.entry_data {
            EntryData::Command(ref cmd) => ce
                .execute(cmd, entry.index)
                .await
                .map_err(|e| e.to_string()),
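            // Problem: there is no way to produce the externally defined ER for a ConfChange here.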
            EntryData::ConfChange(_) => todo!(),
        }
    };
    let er_ok = er.is_ok();
    cb.write().insert_er(entry.id(), er);
    if !er_ok {
        sp.lock().remove(entry.id());
        let _ig = ucp.lock().remove(entry.id());
    }
    debug!(
        "{id} cmd({}) is speculatively executed, exe status: {er_ok}",
        entry.id()
    );
    er_ok
}
...

Solution

...
TaskType::SpecExe(entry, pre_err) => match entry.entry_data {
    EntryData::Command(ref cmd) => {
        let er = if let Some(err_msg) = pre_err {
            Err(err_msg)
        } else {
            ce.execute(cmd, entry.index)
                .await
                .map_err(|e| e.to_string())
        };
        let er_ok = er.is_ok();
        cb.write().insert_er(entry.id(), er);
        if !er_ok {
            sp.lock().remove(entry.id());
            let _ig = ucp.lock().remove(entry.id());
        }
        debug!(
            "{id} cmd({}) is speculatively executed, exe status: {er_ok}",
            entry.id()
        );
        er_ok
    }
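    // ConfChange is internal to curp: no ER is produced, the worker only reports success.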
    EntryData::ConfChange(_) => true,
}
...

In fact, cmd_worker only needs to return a bool indicating whether the task ran successfully; it does not care about the internal execution details. So there is no need for ConfChange to produce an ER, it can simply return a bool.
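
To make that intent concrete, here is a self-contained distillation of the design (simplified, hypothetical types rather than the real curp ones): the speculative-execution step reduces every entry to a success flag, and only Command entries go through the externally defined executor.

enum EntryData {
    Command(String),
    ConfChange(Vec<u64>),
}

/// Stand-in for the externally defined executor (`ce.execute` in the excerpt above).
fn execute_cmd(cmd: &str) -> Result<String, String> {
    Ok(format!("executed: {cmd}"))
}

/// Speculatively execute an entry and report only whether it succeeded.
/// Commands produce an externally defined ER; conf changes are internal
/// to the protocol and therefore only yield a bool.
fn spec_exe(entry: &EntryData) -> bool {
    match entry {
        EntryData::Command(cmd) => execute_cmd(cmd).is_ok(),
        EntryData::ConfChange(_) => true,
    }
}

fn main() {
    assert!(spec_exe(&EntryData::Command("put k v".to_owned())));
    assert!(spec_exe(&EntryData::ConfChange(vec![1, 2, 3])));
}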

@Phoenix500526
Collaborator

LGTM
