
How to turn DeadMaster into slave when MySQL comes back online after automatic failover? #891

Closed
NielsH opened this issue May 15, 2019 · 12 comments

Comments

@NielsH

NielsH commented May 15, 2019

Hi,

I have a setup with 1 MariaDB master and 2 slaves:

sqlproxytest01-lan:3307   [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,rw,ROW]
+ sqlproxytest02-lan:3307 [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,ro,ROW,GTID]
+ sqlproxytest03-lan:3307 [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,ro,ROW,GTID]

With this config:

{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "<redacted>",
  "MySQLTopologyPassword": "<redacted>",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/usr/local/orchestrator/orchestrator.sqlite3",
  "RaftEnabled": true,
  "RaftDataDir": "/usr/local/orchestrator",
  "RaftBind": "172.29.176.80",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["172.29.176.80", "172.29.176.81", "172.29.176.82"],
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3307,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "DiscoveryIgnoreReplicaHostnameFilters": [
    "a_host_i_want_to_ignore[.]example[.]com",
    ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com"
  ],
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@report_host",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "SlaveLagQuery": "",
  "DetectClusterAliasQuery": "SELECT name FROM orchestrator_meta.cluster",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ReplicationCredentialsQuery": "SELECT username,password FROM orchestrator_meta.replication_credentials",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    ".*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    ".*"
  ],
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PreventCrossDataCenterMasterFailover": false,
  "MasterFailoverDetachSlaveMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true
}

I have noticed that when I restart or stop the MySQL master (or it becomes unavailable in any other way), the master becomes "lost". The failover happens properly; a slave is promoted to master and the other slave now replicates from the new master.

But once the old master comes back online, it shows up in its own cluster; it is not recognized as being part of the old cluster. I'd like it to automatically rejoin and become a slave instead. I thought this might be possible by automatically rolling back anything that was done on the old master but not transferred to the slaves at the time of the failover, and then starting replication from the last known "shared" change that the old master and the newly promoted master have in common.
Those changes shouldn't be there in the first place, but I guess they can occur when slave lag was high during the failover. They can be discarded anyway.

What currently happens if I restart the MySQL master is that this becomes the new topology:

root@sqlproxytest01:/var/lib/mysql# orchestrator-client -c topology -a MyCluster
sqlproxytest02-lan:3307   [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,rw,ROW,>>]
+ sqlproxytest03-lan:3307 [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,ro,ROW,>>,GTID]
root@sqlproxytest01:/var/lib/mysql# orchestrator-client -c topology -a sqlproxytest01-lan:3307
sqlproxytest01-lan:3307 [0s,ok,10.2.23-MariaDB-10.2.23+maria~stretch-log,ro,ROW,>>]
root@sqlproxytest01:/var/lib/mysql# orchestrator-client -c clusters
sqlproxytest01-lan:3307
sqlproxytest02-lan:3307
root@sqlproxytest01:/var/lib/mysql# orchestrator-client -c all-clusters-masters
sqlproxytest02-lan:3307
root@sqlproxytest01:~# orchestrator-client -c api --path clusters-info | jq -r .
[
  {
    "ClusterName": "sqlproxytest01-lan:3307",
    "ClusterAlias": "sqlproxytest01-lan:3307",
    "ClusterDomain": "",
    "CountInstances": 1,
    "HeuristicLag": 0,
    "HasAutomatedMasterRecovery": true,
    "HasAutomatedIntermediateMasterRecovery": true
  },
  {
    "ClusterName": "sqlproxytest02-lan:3307",
    "ClusterAlias": "MyCluster",
    "ClusterDomain": "",
    "CountInstances": 2,
    "HeuristicLag": 0,
    "HasAutomatedMasterRecovery": true,
    "HasAutomatedIntermediateMasterRecovery": true
  }
]

Is there anything I can do to make it work as I want? Or is it expected behaviour that if I restart the master server, or a failover happens in any way, I should manually rebuild it as a slave and rejoin it to the cluster afterwards?

Thank you in advance!

@hellracer

If you are using GTID-based replication, you can easily repoint the demoted previous master as a slave with CHANGE MASTER TO MASTER_USER='user', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;

You can use the PostGracefulTakeoverProcesses hook to accomplish what you want, e.g.:

"PostGracefulTakeoverProcesses": [
    "/opt/orchestrator/scripts/postfailover"
  ],

After a successful takeover, that postfailover shell script will execute, and you can do whatever you want in it.
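
For instance, a minimal sketch of such a hook (the script path, user names, and passwords are illustrative, not anything orchestrator ships; {failedHost} and {successorHost} are the same placeholders used by the other hooks in the config above):

"PostGracefulTakeoverProcesses": [
  "/opt/orchestrator/scripts/postfailover {failedHost} {successorHost}"
],

#!/usr/bin/env bash
# /opt/orchestrator/scripts/postfailover (illustrative)
# $1 = demoted master, $2 = newly promoted master; orchestrator fills these
# in from the {failedHost} / {successorHost} placeholders above.
demoted="$1"
successor="$2"

# Repoint the demoted master at its successor and restart replication.
# This is the MySQL GTID form; a MariaDB equivalent appears later in this thread.
mysql -h "${demoted}" -u admin -p"${ADMIN_PASSWORD}" -e "
  CHANGE MASTER TO
    MASTER_HOST='${successor}',
    MASTER_USER='repl',
    MASTER_PASSWORD='${REPL_PASSWORD}',
    MASTER_AUTO_POSITION=1;
  START SLAVE;"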

For me, I think it's a conservative decision by shlomi to make this the default behaviour, to give us users time to investigate why the previous master was demoted.

Imagine the scenario where orchestrator does this for you automagically and that server was never meant to automatically rejoin the cluster. It's just my $0.02.

@hellracer

I must be very honest with you: I had also done what you are trying to accomplish, until one day I was bitten by it and learned my lesson the hard way. You will realise this as soon as you throw ProxySQL/MaxScale into the equation :)

@shlomi-noach
Collaborator

Or is it expected behaviour that if I restart the master server, or a failover happens in any way, I should manually rebuild it as a slave and rejoin it to the cluster afterwards?

That's the expected behavior and you should rebuild the server.

I thought this might be possible by automatically rolling back anything that was done on the old master but not transferred to the slaves at the time of the failover, and then starting replication from the last known "shared" change that the old master and the newly promoted master have in common.
Those changes shouldn't be there in the first place, but I guess they can occur when slave lag was high during the failover. They can be discarded anyway.

Very non-trivial. See http://code.openark.org/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind

@hellracer

@shlomi-noach a very good read; I'll save it for later this evening :)

@NielsH
Author

NielsH commented May 15, 2019

Hi all,

Thanks for the clarification.

If you are using GTID-based replication, you can easily repoint the demoted previous master as a slave with CHANGE MASTER TO MASTER_USER='user', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;

In our case we use MariaDB, which doesn't support MASTER_AUTO_POSITION, but I did it like this on the detached old master and it seems to work:

CHANGE MASTER TO
  MASTER_HOST='<new_master_ip>',
  MASTER_USER='<user>',
  MASTER_PASSWORD='<pass>',
  MASTER_PORT=3307,
  MASTER_CONNECT_RETRY=1,
  MASTER_USE_GTID=slave_pos;
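
As a quick sanity check after the CHANGE MASTER (a sketch; use whatever client options normally reach the old master):

# Start replication and verify the MariaDB GTID fields.
mysql -e 'START SLAVE; SHOW SLAVE STATUS\G' \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Using_Gtid|Gtid_IO_Pos|Seconds_Behind_Master'

Both *_Running fields should say Yes, and Using_Gtid should report Slave_Pos.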

This is a much quicker recovery and less work than rebuilding the slave. For now we'll consider doing this manually instead of through a hook.

Very non-trivial. See http://code.openark.org/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind

Cheers, thanks for the read. Will have a look! In our case I think it's less of an issue because we plan to use SQL Proxy to only send writes to the master and reads to the slaves. So queries shouldn't be going to the old master. But I understand it is always a risk.

I'll test to see if this works and doesn't give weird unexpected issues. Thanks so-far!

@hellracer

hellracer commented May 15, 2019

@NielsH Our mileage may vary; on my end, just to make sure the newly promoted slave is indeed in sync with the master, I always run pt-table-checksum. I don't know if pt-table-checksum is compatible with MariaDB, because we are a Percona shop :)
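
For reference, a typical pt-table-checksum run looks something like this (a sketch; the host, port, database, and checksum user are placeholders, and MariaDB compatibility should be verified first):

pt-table-checksum --replicate=percona.checksums --databases=mydb \
  h=sqlproxytest02-lan,P=3307,u=checksum_user,p=secret

It computes checksums chunk by chunk on the master; the checksum statements replicate, and the tool reports any chunks where a replica differs.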

Anyway, glad that it's working for you as well. Another tip: use semi-synchronous replication if MariaDB supports it, and try to perform a graceful failover using orchestrator-client rather than the GUI whenever you are doing a test, especially when you are going to integrate ProxySQL. You will see verbose messages about what's going on in the background and get a much better understanding of what happens behind the scenes, which helps avoid surprises.

You don't want the app to write data to the reader hostgroup because of a failed failover, so extra care is necessary ^_^ This is off-topic, by the way.

@NielsH NielsH closed this as completed Jul 22, 2019
@mostafahussein

mostafahussein commented Sep 11, 2019

@hellracer, it seems that PostGracefulTakeoverProcesses has nothing to do with the master instance when it comes back online, or maybe it is only triggered under specific circumstances. All I do is stop the master and wait for the failover to finish.

I am looking for a hook that is executed when the old master comes back to life, or should I manage this case outside of orchestrator? I couldn't find any hook for it.

I have another question that I think was not mentioned in this issue: why does the failover create a different alias in the dashboard? How can I reuse the same old name, or is this the default behaviour?

Note: I am using Pseudo-GTID with MariaDB.

@mostafahussein

@NielsH have you solved this, instead of doing it manually?

@NielsH
Author

NielsH commented Sep 11, 2019

@mostafahussein we ended up doing it manually. However, we did create a script that hooks into the mariadb systemd unit file and runs before mysql itself is stopped. The script triggers a failover before shutting down, so the node can rejoin the cluster properly without manual recovery.

This is because a primary concern for us was that we would manually restart mariadb (through systemctl restart mariadb) and it would not be able to rejoin the cluster afterwards. That is a scenario we wanted to prevent.

The script may not work for your configuration/use case and hasn't seen a lot of testing, so you will probably have to change it to fit your needs, but if it helps:

#!/usr/bin/env bash

# {{ ansible_managed }}

# This script checks if the current server is a MySQL master within Orchestrator
# and performs a graceful failover if it is.
# This script does nothing if the current server is not a MySQL master within Orchestrator.

set -e

_cluster=$(orchestrator-client -c clusters)
_clustercount=$(echo -n "${_cluster}" | grep -c '^')

# first check if we can detect a cluster
# there should only be one; managing multiple clusters is not supported in this script
if [ ! "${_clustercount}" -eq "1" ]; then
        echo "Unable to detect cluster"
        exit 1
fi

# now get cluster master
_cluster_master=$(orchestrator-client -c which-cluster-master -i "${_cluster}" | cut -d: -f1)
_cluster_available_replica=$(orchestrator-client -c which-replicas -i "${_cluster}" | head -1)
_node_name=$(mysql --defaults-file=/etc/mysql/mysql-root.cnf -N -e 'select @@report_host')

if [ "${_cluster_master}" == "${_node_name}" ]; then
        echo "WARNING: This MySQL server is a master node. Attempting to do a graceful-takeover before shutting down."
        orchestrator-client -c graceful-master-takeover -i "${_cluster}" -d "${_cluster_available_replica}"
        sleep 5
fi

And the systemd config:

# {{ ansible_managed }}
[Service]
ExecStop=/usr/local/sbin/orchestrator-failover-master.sh

@mostafahussein

@NielsH Thanks a lot!
I have another question: how did you make the cluster fail over under the same cluster alias/name? In my case it creates another cluster (with a separate URL). I am trying to figure it out, but still no luck.

@mostafahussein

Hello @NielsH, this is just a reminder, whenever you have time, to follow up on my previous question. Thanks for the assist!

@NielsH
Author

NielsH commented Sep 29, 2019

Hi,

Sorry, I lost track of the question.
It is not something I am able to easily test at the moment. Config-wise, what I have is:

  • configured report_host / report_port in the mariadb config
  • configured a query for DetectClusterAliasQuery (in orchestrator config file)

With this, and the above script, whenever I restarted a node in the cluster it would show back up in the same cluster, as long as the failover had happened in advance so that it was a slave at the time of restarting. I did have to manually click "start replication" again, but it would not be a separate cluster.
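
To expand on the DetectClusterAliasQuery part: the query in the config earlier in this thread is SELECT name FROM orchestrator_meta.cluster, so the alias comes from a small meta table on the servers themselves. A sketch of what could back that query (the schema is illustrative; only the SELECT appears in the config):

mysql <<'EOF'
CREATE DATABASE IF NOT EXISTS orchestrator_meta;
CREATE TABLE IF NOT EXISTS orchestrator_meta.cluster (name VARCHAR(128) NOT NULL);
INSERT INTO orchestrator_meta.cluster (name) VALUES ('MyCluster');
EOF

Because the table replicates to the slaves, a promoted master answers the query with the same name, which is what keeps the cluster under one alias after a failover.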

If the mysql server was a master when it was restarted (so no prior failover), it would show up as a separate cluster and I had to manually convert it to a slave (through #891 (comment) )

Hope this helps!
