
Can orchestrator + semi-sync guarantee zero data loss? #1312

Closed
Fanduzi opened this issue Feb 18, 2021 · 9 comments · Fixed by Fanduzi/orchestrator#1


Fanduzi commented Feb 18, 2021

Let's say I have one master and two slaves, semi-sync is on, rpl_semi_sync_master_wait_for_slave_count = 1
M is the master
S1 and S2 are the slaves
At some point:

  • S1's Executed_Gtid_Set < S2's Executed_Gtid_Set
  • There is a problem with S2's network. Some binlog events have not yet been written to S2's relay log, but its IO_THREAD status is still running.
    Then the master crashes. Which slave will become the new master?
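
For reference, a minimal sketch of the semi-sync settings I am assuming here (classic pre-8.0.26 plugin and variable names; AFTER_SYNC is the lossless wait point and the default since 5.7):

-- On M (the master):
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = ON;
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;  -- an ACK from any one slave is enough
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC'; -- lossless semi-sync

-- On S1 and S2 (the slaves):
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = ON;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;  -- restart the IO thread so semi-sync takes effect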

Here I provide a test method for this scenario:

  1. Start a script that INSERTs into table t1.
  2. On S1, run lock table t1 read. After that the SQL thread will block and S1's Executed_Gtid_Set will stop advancing, while S2's SQL thread is still applying relay logs, so S2's Executed_Gtid_Set becomes a superset of S1's Executed_Gtid_Set.
  3. On S2, run these commands (I don't know exactly how they work, but the purpose is to simulate a network failure):
SET GLOBAL slave_net_timeout = 3600; -- just to make testing easier

tc qdisc del dev ens33 root  
tc qdisc add dev ens33 root handle 1: prio
tc filter add dev ens33 protocol ip parent 1: prio 1 u32 match ip dst 172.16.120.10 flowid 1:1
tc filter add dev ens33 protocol all parent 1: prio 2 u32 match ip dst 0.0.0.0/0 flowid 1:2
tc filter add dev ens33 protocol all parent 1: prio 2 u32 match ip protocol 1 0xff flowid 1:2
tc qdisc add dev ens33 parent 1:1 handle 10: netem delay 180000ms
tc qdisc add dev ens33 parent 1:2 handle 20: sfq

After running these commands, S2 no longer receives the master's binlog events, but Slave_IO_Running on S2 still shows the IO thread as running.
4. Shut down the master, run tc qdisc del dev ens33 root on S2, and release the lock on S1.
5. See who becomes the new master. (In our tests, orchestrator chose S2 as the new master, but I think S1 should have been chosen; a sketch for checking what each candidate received vs. applied follows after this list.)
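
Here is a sketch of what I check on each slave after the master is gone (values pasted by hand from SHOW SLAVE STATUS; GTID_SUBTRACT exists in MySQL 5.6+, and Retrieved_Gtid_Set is only meaningful while the IO thread has not been restarted):

-- On each slave:
SHOW SLAVE STATUS\G
-- Retrieved_Gtid_Set : transactions received into the relay logs
-- Executed_Gtid_Set  : transactions the SQL thread has already applied

-- Transactions this slave has received but not yet applied:
SELECT GTID_SUBTRACT('<Retrieved_Gtid_Set>', '<Executed_Gtid_Set>') AS received_not_applied;

In this test, S2 has executed more but S1 has received more, which is exactly the asymmetry in question.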


Fanduzi commented Mar 5, 2021

@shlomi-noach I hope you can find the time to answer my question. I don't know Go, so I don't know much about orchestrator's failover logic; please forgive me if I'm wrong. I look forward to your reply.

shlomi-noach (Collaborator) commented

Whoops, sorry, missed this in the backlog.

Right, I think I saw another similar question recently. What your tests show is:

  • orchestrator will promote the replica which has executed more events
  • rather than the replica which has more data in the relay logs

The systems I've worked with are such that replication lag is very low (by actively pushing back on apps). Therefore, at time of failover, it only takes a fraction of a second for any replica to consume whatever relay log events are in the queue.

Back to your question, could the following configuration help? "DelayMasterPromotionIfSQLThreadNotUpToDate": true. Off the top of my head, not sure -- this check is made after we've picked the promoted replica.

So, we need a mechanism that chooses a replica based on potential data, not on current data. This is only applicable for GTID based failovers, because you can only compare replicas in GTID topologies.
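
For illustration, a rough SQL sketch of what "potential data" could mean, assuming Retrieved_Gtid_Set values pasted from SHOW SLAVE STATUS on each candidate (and IO threads that have not been restarted, since Retrieved_Gtid_Set resets on restart):

-- Compare candidates by what they have received (relay logs), not by what they have executed:
SET @s1_potential = '<S1 Retrieved_Gtid_Set>';
SET @s2_potential = '<S2 Retrieved_Gtid_Set>';
SELECT
  GTID_SUBSET(@s2_potential, @s1_potential) AS s1_covers_s2,  -- 1 means promoting S1 loses nothing that S2 has
  GTID_SUBTRACT(@s1_potential, @s2_potential) AS only_on_s1;  -- what would be lost if S2 were promoted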

Let me look into this.

shlomi-noach self-assigned this Apr 1, 2021

Fanduzi commented Apr 2, 2021

Maybe use Master_Log_File and Read_Master_Log_Pos?
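
For example (a sketch only; these are the coordinate fields reported by SHOW SLAVE STATUS):

SHOW SLAVE STATUS\G
-- Master_Log_File / Read_Master_Log_Pos       : how far the IO thread has read from the master
-- Relay_Master_Log_File / Exec_Master_Log_Pos : how far the SQL thread has executed
-- Ranking candidates by the Read_* coordinates ranks them by received data;
-- ranking by the Exec_* coordinates ranks them by applied data.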


Fanduzi commented Jun 17, 2021

Hi, have you made any progress?

binwiederhier (Contributor) commented

You may be interested in this: https://datto.engineering/post/lossless-mysql-semi-sync-replication-and-automated-failover (disclaimer: I wrote it :-))


Fanduzi commented May 5, 2022

Thank you @binwiederhier, the article was very helpful.
I'm currently using MHA + ProxySQL + semi-sync (AFTER_SYNC) + GTID. Since MHA selects the most up-to-date slave based on ReadBinlogCoordinates, our current architecture seems theoretically safe from data loss. Of course, I've also made some modifications to prevent split-brain issues, so that there is only one "master" in ProxySQL when a failover occurs (because ProxySQL Cluster is not a real cluster).

I was planning to replace MHA with Orchestrator this year, but I've found that Orchestrator's philosophy is different from MHA's. Orchestrator tends to prioritise availability and retain the maximum number of replicas in the cluster. Orchestrator uses ExecBinlogCoordinates to select the candidate, which does leave the potential for data loss in the extreme scenario I described. So I learned a bit of Go and made some "modifications" over the May Day holiday, which are still being tested.

However, while studying the source code I found that something is wrong with DelayMasterPromotionIfSQLThreadNotUpToDate: it doesn't actually "work". According to the call path I traced:

RegroupReplicasGTID -> GetCandidateReplica -> sortedReplicasDataCenterHint -> StopReplicas -> StopReplicationNicely

StopReplicationNicely ultimately executes stop slave, and I can't find anywhere in the code where start slave sql_thread is executed afterwards. So DelayMasterPromotionIfSQLThreadNotUpToDate ends up waiting for a slave whose SQL thread is stopped...
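
As far as I understand, what the chosen candidate would need before promotion is something like the following sketch (the GTID set is pasted from its own SHOW SLAVE STATUS, and WAIT_FOR_EXECUTED_GTID_SET requires MySQL 5.7.5+):

-- The IO thread is already stopped; restart only the SQL thread
START SLAVE SQL_THREAD;
-- Wait (up to 60 seconds) until everything already sitting in the relay logs has been applied
SELECT WAIT_FOR_EXECUTED_GTID_SET('<Retrieved_Gtid_Set>', 60) AS timed_out;  -- 0 = fully applied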

I'll have to look into it further. Your orchestrator.json was very informative for me. Anyway, thanks!


ht2324 commented Jan 4, 2024

@Fanduzi Can lossless semi-sync replication also lose data in the scenario above?


Fanduzi commented Jan 4, 2024

@Fanduzi Can lossless semi-sync replication also lose data in the scenario above?

In my tests, yes, it can. You can test it yourself too.


ht2324 commented Jan 4, 2024

@Fanduzi Can lossless semi-sync replication also lose data in the scenario above?

In my tests, yes, it can. You can test it yourself too.

Yeah, I think so. The slave didn't receive the complete relay log, so the old master ends up with extra transactions.
