This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Orchestrator promotes a replica with lag on MariaDB #1363

@mohankara

We have a 3-node MariaDB 10.5.10 setup on CentOS: one primary and two replicas, with semi-sync enabled.
Our current orchestrator version is 3.2.4.

We had a scenario where the replicas were lagging by a few hours and the master became unreachable, so one of the replicas was promoted to primary despite the large lag. This resulted in data loss. Ideally, orchestrator should wait until the replica has applied its relay logs before promoting it to master. Based on my testing, this is the behavior on MySQL but not on MariaDB.
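To make the expected behavior concrete, here is a minimal Python sketch of the check: a replica's SQL thread is up to date when the master binlog position it has executed equals the position its I/O thread has read. The status keys are the column names SHOW SLAVE STATUS reports on both MySQL 5.7 and MariaDB 10.5; `fetch_status` is a hypothetical callable (for example, a wrapper around a DB driver), and none of this is orchestrator's actual implementation. Note that the sketch only polls: it does not start a stopped SQL thread, so in the test below (where the SQL thread is stopped) it would simply time out.

```python
import time
from typing import Callable, Mapping, Tuple

Status = Mapping[str, object]  # one row of SHOW SLAVE STATUS, keyed by column name


def read_position(status: Status) -> Tuple[str, int]:
    # Last master binlog event the I/O thread copied into the relay log.
    return (str(status["Master_Log_File"]), int(status["Read_Master_Log_Pos"]))


def exec_position(status: Status) -> Tuple[str, int]:
    # Last master binlog event the SQL thread actually executed.
    return (str(status["Relay_Master_Log_File"]), int(status["Exec_Master_Log_Pos"]))


def sql_thread_caught_up(status: Status) -> bool:
    # Nothing left to apply once executed and read coordinates are identical.
    return exec_position(status) == read_position(status)


def wait_for_sql_thread(fetch_status: Callable[[], Status],
                        timeout: float, interval: float = 0.5) -> bool:
    """Poll until the relay log is fully applied; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if sql_thread_caught_up(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A promotion routine following this idea would call `wait_for_sql_thread` on the chosen candidate and only promote it once it returns True.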

Test case:
Tests against MySQL and MariaDB were run with these orchestrator parameters in /etc/orchestrator.conf.json:

"DelayMasterPromotionIfSQLThreadNotUpToDate": true,
"debug": true
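Because both settings need to be present on every orchestrator node before the restart, one way to confirm them is to parse each node's config file. The helper below is hypothetical, not part of orchestrator. It matches keys case-insensitively on the assumption that orchestrator reads its config with Go's encoding/json, which accepts case-insensitive key matches (so `"debug"` would set the `Debug` field).

```python
import json
from typing import Any, Dict, List, Optional

# Settings this reproduction relies on; both should be true on every node.
REQUIRED_TRUE = ("DelayMasterPromotionIfSQLThreadNotUpToDate", "Debug")


def lookup(conf: Dict[str, Any], key: str) -> Optional[Any]:
    """Case-insensitive key lookup (assumed to mirror Go's encoding/json)."""
    for k, v in conf.items():
        if k.lower() == key.lower():
            return v
    return None


def missing_settings(config_text: str) -> List[str]:
    """Return required settings that are absent or not set to true."""
    conf = json.loads(config_text)
    return [key for key in REQUIRED_TRUE if lookup(conf, key) is not True]


def check_file(path: str = "/etc/orchestrator.conf.json") -> List[str]:
    with open(path, encoding="utf-8") as f:
        return missing_settings(f.read())
```

Running `check_file()` on each node should return an empty list before orchestrator is restarted.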

Restart orchestrator on all 3 nodes.
I) Test on MariaDB:
Start a 3-node MariaDB cluster (semi-sync enabled).
1. Create a test table and add data to it:
    create table test (colA int, colB int, colC datetime, colD int);

    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);

2. Stop the SQL thread on the replicas (Nodes 2 and 3):
    STOP SLAVE SQL_THREAD;

3. Wait a few seconds, then add more data on Node 1 (master):
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);

4. Stop mysqld on the master (Node 1).

5. Orchestrator promotes a replica that is missing the data added in step 3.

II) Test on MySQL 5.7.32:
Repeat the same test on a 3-node MySQL setup.
Orchestrator promotes one of the replicas without any data loss, i.e., the promoted replica has all 14 rows!
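The difference between the two runs can be reproduced with a toy model, assuming that with semi-sync the second batch of seven inserts reached each replica's relay log but was never applied because the SQL thread had been stopped. `Replica`, `promote`, and `reproduce` are hypothetical names; the model only counts events and makes no claim about orchestrator's internals. Promoting without draining the relay log leaves 7 rows (the MariaDB outcome above); letting the SQL thread catch up first gives the 14 rows seen on MySQL.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Toy replica that only counts replication events."""
    received: int = 0               # events written to the relay log by the I/O thread
    applied: int = 0                # events executed by the SQL thread (= rows in `test`)
    sql_thread_running: bool = True

    def receive(self, events: int) -> None:
        # The I/O thread keeps receiving even while the SQL thread is stopped.
        self.received += events
        if self.sql_thread_running:
            self.drain_relay_log()

    def drain_relay_log(self) -> None:
        # Apply everything still pending in the relay log.
        self.applied = self.received


def promote(replica: Replica, wait_for_relay_log: bool) -> int:
    """Promote the replica and return how many rows the new primary has."""
    if wait_for_relay_log:
        # Expected behavior (assumption): restart the SQL thread and let it catch up.
        replica.sql_thread_running = True
        replica.drain_relay_log()
    return replica.applied


def reproduce(wait_for_relay_log: bool) -> int:
    replica = Replica()
    replica.receive(7)                  # first seven inserts: replicated and applied
    replica.sql_thread_running = False  # STOP SLAVE SQL_THREAD
    replica.receive(7)                  # seven more inserts: relay log only
    # The master is stopped and this replica gets promoted.
    return promote(replica, wait_for_relay_log)


print(reproduce(wait_for_relay_log=False))  # 7
print(reproduce(wait_for_relay_log=True))   # 14
```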

Thank you,
Mohan
