refactor: implement failover based distributed etcd lock #142

ZuLiangWang · 2023-03-09T09:45:14Z

Which issue does this PR close?

Closes #

Rationale for this change

To ensure that user data is not lost in distributed mode, we need a mechanism that guarantees CeresDB never has more than one leader for a shard under any circumstances. An etcd lock was added in apache/horaedb#706; this PR refactors the scheduler to implement failover based on that distributed etcd lock.

What changes are included in this PR?

Add a shard watcher that is notified when a shard lock expires.
Refactor the scheduler module and add a shard lock delete callback to implement automatic failover.
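
The interaction between the two changes above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: `ShardWatcher`, `LockDeleteCallback`, `Scheduler`, and the node names are all hypothetical, and the lock table is a plain map standing in for etcd lock keys and their leases. It also simplifies by letting the scheduler take the lock on behalf of the new node, whereas in a real deployment the node itself would be expected to acquire it.

```go
package main

import (
	"fmt"
	"sync"
)

// LockDeleteCallback is a hypothetical hook invoked when a shard lock disappears.
type LockDeleteCallback func(shardID uint32, oldLeader string)

// ShardWatcher is an in-memory stand-in for a watch on etcd shard lock keys.
type ShardWatcher struct {
	mu       sync.Mutex
	holders  map[uint32]string // shardID -> node currently holding the lock
	onDelete LockDeleteCallback
}

func NewShardWatcher(cb LockDeleteCallback) *ShardWatcher {
	return &ShardWatcher{holders: make(map[uint32]string), onDelete: cb}
}

// Acquire grants the lock only if no node holds it, so a shard has at most one leader.
func (w *ShardWatcher) Acquire(shardID uint32, node string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if _, held := w.holders[shardID]; held {
		return false
	}
	w.holders[shardID] = node
	return true
}

// Holder reports which node currently holds the lock for a shard, if any.
func (w *ShardWatcher) Holder(shardID uint32) (string, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	node, ok := w.holders[shardID]
	return node, ok
}

// Expire simulates a lease expiry: the lock is deleted and the callback fires.
func (w *ShardWatcher) Expire(shardID uint32) {
	w.mu.Lock()
	old, held := w.holders[shardID]
	if held {
		delete(w.holders, shardID)
	}
	cb := w.onDelete
	w.mu.Unlock()
	// Run the callback outside the mutex so it can call Acquire safely.
	if held && cb != nil {
		cb(shardID, old)
	}
}

// Scheduler reassigns a shard to another node when its lock is deleted.
type Scheduler struct {
	watcher *ShardWatcher
	nodes   []string
}

func NewScheduler(nodes []string) *Scheduler {
	s := &Scheduler{nodes: nodes}
	s.watcher = NewShardWatcher(s.failover)
	return s
}

// failover is the lock delete callback: pick the first node other than the old leader.
func (s *Scheduler) failover(shardID uint32, oldLeader string) {
	for _, n := range s.nodes {
		if n != oldLeader && s.watcher.Acquire(shardID, n) {
			return
		}
	}
}

// Leader returns the current leader of a shard, if one holds the lock.
func (s *Scheduler) Leader(shardID uint32) (string, bool) {
	return s.watcher.Holder(shardID)
}

func main() {
	s := NewScheduler([]string{"node-a", "node-b"})
	s.watcher.Acquire(1, "node-a")
	s.watcher.Expire(1) // node-a's lock expires, triggering failover
	leader, _ := s.Leader(1)
	fmt.Println("leader of shard 1:", leader)
}
```

The point of the sketch is the ordering: a new leader is chosen only after the old lock has been deleted, which is what rules out two leaders for the same shard.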

Are there any user-facing changes?

None.

How is this change tested?

All unit tests and integration tests pass.

ZuLiangWang force-pushed the refactor_cluster_procedure branch 4 times, most recently from 7da1c94 to 3ec790c on March 14, 2023 12:17

refactor: implement failover based distributed etcd lock

8339883

ZuLiangWang force-pushed the refactor_cluster_procedure branch from 3ec790c to 8339883 on March 14, 2023 12:32

refactor: refactor scheduler module

c40f161

ZuLiangWang force-pushed the refactor_cluster_procedure branch from d1496db to 2a5a993 on March 20, 2023 13:13

refactor: refactor scheduler module

6937d3b

ZuLiangWang force-pushed the refactor_cluster_procedure branch from 2a5a993 to 6937d3b on March 21, 2023 08:27

ZuLiangWang closed this Mar 22, 2023
