TiCDC cluster suffers a round robin owner election during rolling update #3529
Labels
area/ticdc
Issues or PRs related to TiCDC.
severity/moderate
subject/new-feature
Denotes an issue or pull request adding a new feature.
type/bug
The issue is confirmed as a bug.
What did you do?
What did you expect to see?
Replication continues normally while TiCDC undergoes a rolling update.
What did you see instead?
Suppose the owner is restarted first. Ownership then passes to each of the following TiCDC nodes in turn (this is caused by the way election works in etcd: it simply selects the election key with the smallest revision as the campaign winner), and each newly elected owner is itself restarted soon afterwards by the rolling update.
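To make the election order concrete, here is a minimal simulation of the behavior described above. It is a sketch, not TiCDC or etcd code: the node names, the `rolling_update_owners` helper, and the integer counter standing in for etcd revisions are all assumptions for illustration. It also assumes that a restarted node loses its election key and campaigns again with a newer revision.

```python
from itertools import count


def rolling_update_owners(nodes, restart_order):
    """Return the owner before the update and after each restart.

    Hypothetical model: the owner is the node whose election key has the
    smallest revision, as described in the issue.
    """
    revision = count(1)  # stand-in for etcd's increasing revision counter
    # Every node campaigns once at startup; earlier campaign = smaller revision.
    campaign_rev = {node: next(revision) for node in nodes}

    def current_owner():
        return min(campaign_rev, key=campaign_rev.get)

    owners = [current_owner()]
    for node in restart_order:
        # Assumption: stopping the node removes its election key ...
        del campaign_rev[node]
        # ... and after the restart it campaigns again with a newer revision.
        campaign_rev[node] = next(revision)
        owners.append(current_owner())
    return owners


nodes = ["cdc-0", "cdc-1", "cdc-2"]
# Worst case from the issue: the owner is restarted first, then the others.
print(rolling_update_owners(nodes, restart_order=nodes))
# ['cdc-0', 'cdc-1', 'cdc-2', 'cdc-0']
```

In this model every restart hits the current owner, so a three-node cluster goes through three owner changes during one rolling update.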
The initialization phase of a TiCDC owner can take a long time. It involves many steps, including initializing every existing changefeed. Initializing a changefeed creates a downstream sink; imagine creating a Kafka sink and running some verification jobs, which is heavy work.
As a result, a lot of time is wasted on owner initialization in each TiCDC node. What's more, it is possible that no owner finishes initialization before it is restarted, so the replication checkpoint can stall during the rolling update, and the longer the rolling update takes, the larger the replication lag may become.
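A rough back-of-the-envelope model of the stall described above may help. It is only a sketch under stated assumptions: the `stalled_seconds` helper and all numbers are hypothetical, every restart is assumed to hit the current owner (the worst case), and election latency and node downtime are ignored.

```python
def stalled_seconds(num_nodes, init_seconds, restart_interval_seconds):
    """Estimate how long the checkpoint cannot advance in the worst case.

    Hypothetical model: the checkpoint only advances once an owner has
    finished initialization, and the current owner is restarted every
    restart_interval_seconds.
    """
    # Owners elected after restarts 1..n-1 are each cut short after one interval.
    cut_short_tenures = num_nodes - 1
    # During a cut-short tenure the owner spends at most the whole interval
    # initializing; if initialization is slower, the entire tenure is wasted.
    stalled_per_tenure = min(init_seconds, restart_interval_seconds)
    # The owner elected after the last restart is not interrupted again,
    # so it pays the full initialization cost once.
    return cut_short_tenures * stalled_per_tenure + init_seconds


# Illustrative numbers only: 3 nodes, one restart every 45 seconds.
print(stalled_seconds(3, init_seconds=60, restart_interval_seconds=45))  # 150
print(stalled_seconds(3, init_seconds=10, restart_interval_seconds=45))  # 30
```

When initialization takes longer than the restart interval, as in the first call, no owner finishes initializing until the rolling update is over, and the stall grows with the number of nodes.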
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client): v5.3.0
TiCDC version (execute cdc version): master@pingcap/ticdc@fe92b89
Brainstorming
DM-master does in DM. We can update owner nodes first, then processor nodes. (This changes the existing architecture of TiCDC.)