Update of mysql cluster lead to problem with replication. #565

DeamonMV · 2020-07-15T06:37:00Z

What is the problem

Update of the mysql cluster lead to problem with replication.

How I got this problem

My mysql cluster had worked about 20 days without problem. Yesterday I deployed an update for mysql and after this I got problem. I've updated mysql configuration.

Operator started updates from slave mysql node. After some time, about 3 minutes, slave node become master, and old master went to termination. But old master didn't become a slave.

Log of new slave node:

# kubectl -n ch-wordpress logs -f cluster-com-mysql-0 mysql
2020-07-14T16:09:46.240702Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-07-14T16:09:46.241535Z 0 [Note] mysqld (mysqld 5.7.26-29-log) starting as process 1 ...
2020-07-14T16:09:46.400346Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-07-14T16:09:46.400373Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-07-14T16:09:46.400376Z 0 [Note] InnoDB: Uses event mutexes
2020-07-14T16:09:46.400379Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-07-14T16:09:46.400381Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-07-14T16:09:46.400383Z 0 [Note] InnoDB: Using Linux native AIO
2020-07-14T16:09:46.400563Z 0 [Note] InnoDB: Number of pools: 1
2020-07-14T16:09:46.400642Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-07-14T16:09:46.545165Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-07-14T16:09:46.548452Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-07-14T16:09:46.550047Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-07-14T16:09:46.808392Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2020-07-14T16:09:46.886568Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-07-14T16:09:47.697355Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2020-07-14T16:09:55.466791Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-07-14T16:09:55.466877Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2020-07-14T16:09:56.921477Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-07-14T16:09:56.922189Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2020-07-14T16:09:56.922197Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2020-07-14T16:09:56.939436Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 10390ms. The settings might not  be optimal. (flushed=0, during the time.)
2020-07-14T16:09:56.939702Z 0 [Note] InnoDB: Waiting for purge to start
2020-07-14T16:09:57.040971Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.26-29 started; log sequence number 53846107422
2020-07-14T16:09:57.041143Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2020-07-14T16:09:57.041469Z 0 [Note] Plugin 'FEDERATED' is disabled.
2020-07-14T16:09:58.603350Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2020-07-14T16:09:58.603368Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2020-07-14T16:09:58.681490Z 0 [Warning] CA certificate ca.pem is self signed.
2020-07-14T16:09:58.681535Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2020-07-14T16:09:58.732461Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2020-07-14T16:09:58.732489Z 0 [Note] IPv6 is available.
2020-07-14T16:09:58.732496Z 0 [Note]   - '::' resolves to '::';
2020-07-14T16:09:58.732509Z 0 [Note] Server socket created on IP: '::'.
2020-07-14T16:10:00.526879Z 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=cluster-com-mysql-0-relay-bin' to avoid this problem.
2020-07-14T16:10:01.349463Z 0 [ERROR] Error during --relay-log-recovery: Could not locate rotate event from the master.
2020-07-14T16:10:01.427199Z 0 [ERROR] Slave: Failed to initialize the master info structure for channel ''; its record may still be present in 'mysql.slave_master_info' table, consider deleting it.
2020-07-14T16:10:01.427221Z 0 [ERROR] Failed to create or recover replication info repositories.
2020-07-14T16:10:01.427226Z 0 [Note] Some of the channels are not created/initialized properly. Check for additional messages above. You will not be able to start replication on those channels until the issue is resolved and the server restarted.
2020-07-14T16:10:03.869457Z 0 [Note] Event Scheduler: Loaded 0 events
2020-07-14T16:10:03.869601Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' started.
2020-07-14T16:10:04.049402Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' ended.
2020-07-14T16:10:04.049518Z 0 [Note] mysqld: ready for connections.
Version: '5.7.26-29-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona Server (GPL), Release 29, Revision 11ad961
2020-07-14T16:10:08.630735Z 26 [Note] Got packets out of order
2020-07-14T16:10:10.054908Z 0 [Note] InnoDB: Buffer pool(s) load completed at 200714 16:10:10
2020-07-14T16:10:18.630782Z 36 [Note] Got packets out of order
2020-07-14T16:10:28.630729Z 90 [Note] Got packets out of order
2020-07-14T16:10:38.631478Z 121 [Note] Got packets out of order
2020-07-14T16:10:48.630487Z 149 [Note] Got packets out of order
2020-07-14T16:10:58.631484Z 176 [Note] Got packets out of order
2020-07-14T16:11:08.631532Z 203 [Note] Got packets out of order

This message xxxxx [Note] Got packets out of order writing to the log without ending. I've waited about 10 minutes but cluster didn't become ready.

Slave status

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: //cluster-com-mysql-1.mysql.ch-wordpress
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: cluster-com-mysql-0-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 0
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
                  Master_UUID:
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 6b45a7cc-b6a5-11ea-b8ea-32ad1a075609:1-1361877,
d816fc29-b6a5-11ea-a57d-7e8cfea4061a:1-377385
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

mysql> show master status;\G;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                             |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------------------+
| mysql-bin.000156 |   780098 |              |                  | 6b45a7cc-b6a5-11ea-b8ea-32ad1a075609:1-1361912,
d816fc29-b6a5-11ea-a57d-7e8cfea4061a:1-377385 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Deleting slave mysql node didn't helped.

What I did to make cluster work

on slave mysql node

stop slave;
slave reset;
CHANGE MASTER TO MASTER_HOST='cluster-com-mysql-1.mysql.ch-wordpress', MASTER_USER='sys_replication', MASTER_PASSWORD='GYTB7hegSVZ234sdfz6oM';

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: cluster-com-mysql-1.mysql.ch-wordpress
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000079
          Read_Master_Log_Pos: 132015898
               Relay_Log_File: cluster-com-mysql-0-relay-bin.000002
                Relay_Log_Pos: 833
        Relay_Master_Log_File: mysql-bin.000079
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table userdb.2Igai1gaW1_options; Can't find record in '2Igai1gaW1_options', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000079, end_log_pos 66005263
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 52804871
              Relay_Log_Space: 79212081
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table userdb.2Igai1gaW1_options; Can't find record in '2Igai1gaW1_options', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000079, end_log_pos 66005263
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
                  Master_UUID: d816fc29-b6a5-11ea-a57d-7e8cfea4061a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200714 17:52:09
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: d816fc29-b6a5-11ea-a57d-7e8cfea4061a:377386-377408
            Executed_Gtid_Set: 6b45a7cc-b6a5-11ea-b8ea-32ad1a075609:1-1373717,
d816fc29-b6a5-11ea-a57d-7e8cfea4061a:1-377386
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

Because of this Slave_SQL_Running: No Relay_Log_Pos: 833 Relay_Master_Log_File: mysql-bin.000079 I have tried one more thing

slave stop;
slave reset;
CHANGE MASTER TO master_auto_position=0;
CHANGE MASTER TO MASTER_HOST='cluster-com-mysql-1.mysql.ch-wordpress', MASTER_USER='sys_replication', MASTER_PASSWORD='GYTB7hegSVZ234sdfz6oM', MASTER_LOG_FILE='mysql-bin.000081', MASTER_LOG_POS= 234;

               Slave_IO_State: Waiting for master to send event
                  Master_Host: cluster-com-mysql-1.mysql.ch-wordpress
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000081
          Read_Master_Log_Pos: 234
               Relay_Log_File: cluster-com-mysql-0-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000081
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 234
              Relay_Log_Space: 541
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
                  Master_UUID: d816fc29-b6a5-11ea-a57d-7e8cfea4061a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 6b45a7cc-b6a5-11ea-b8ea-32ad1a075609:1-1373717,
d816fc29-b6a5-11ea-a57d-7e8cfea4061a:1-377386
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

After this attempt replication status look better, but mysql cluster stop working at all. K8s endpoints cluster-com-mysql-master and cluster-com-mysql become empty. So I had to delete cluster and recreate it again.

What is my question

How to deal with such problems. It's seems that operator not able to restore cluster. Can you please explain the algorithm, I do not want to recreate mysql cluster every time.

What is ENV

kubernetes 1.14.6
rook ceph 0.9.3/13.2.6
mysql operator 0.3.9
mysql version 5.7
mysql_conf:

mysql_cluster_mysqlconf_innodb_buffer_pool_size: "128M"
mysql_cluster_mysqlconf_innodb_buffer_pool_chunk_size: "128M"
mysql_cluster_mysqlconf_innodb_log_file_size: "16M"
mysql_cluster_mysqlconf_max_connections: "40"
mysql_cluster_mysqlconf_thread_pool_size: "20"
mysql_cluster_mysqlconf_expire_logs_days: "5"

The text was updated successfully, but these errors were encountered:

AMecea · 2020-07-15T15:51:11Z

Hi @DeamonMV, thank you for filling this issue.

Here are some tips that may help in the future:

Deleting slave mysql node didn't helped.

When a pod is deleted the PVC is not removed. So in order to reinitialize a node, you can scale it down to 1 replica and then back. By doing this it will delete the PVC for replica and it will reinitialize from node 0.

Also, can you tell me which was the master before the update? Because it seems that node 0 was slave and node 1 was master and after update node 0 become new master and node 1 new slave, is that correct? If yes then you set node 0 slave to node 1 so you made them co-masters.

DeamonMV · 2020-07-15T17:28:29Z

Hello @AMecea

Sorry, but I can't tell for sure, but I remember that cluster-com-mysql-0 was a master.

When a pod is deleted the PVC is not removed. So in order to reinitialize a node, you can scale it down to 1 replica and then back. By doing this it will delete the PVC for replica and it will reinitialize from node 0.

Yes, I just deleted a pod, and do not touched the amount of replicas of mysql cluster.

Is here exist a way to do such reinitialization of mysql node without deleting a PVC? just to be sure that here no any other way.

elderapo · 2021-04-25T18:47:15Z

I just experienced a similar issue (well it happened 3 weeks ago but now I just realized that my slave is out of sync 🤭).

2021-04-25T18:39:04.408564Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-04-25T18:39:04.410166Z 0 [Note] mysqld (mysqld 5.7.26-29-log) starting as process 1 ...
2021-04-25T18:39:04.428066Z 0 [Note] InnoDB: PUNCH HOLE support available
2021-04-25T18:39:04.428117Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-04-25T18:39:04.428123Z 0 [Note] InnoDB: Uses event mutexes
2021-04-25T18:39:04.428129Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2021-04-25T18:39:04.428133Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2021-04-25T18:39:04.428138Z 0 [Note] InnoDB: Using Linux native AIO
2021-04-25T18:39:04.428741Z 0 [Note] InnoDB: Number of pools: 1
2021-04-25T18:39:04.429387Z 0 [Note] InnoDB: Using CPU crc32 instructions
2021-04-25T18:39:04.434439Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1.25G, instances = 1, chunk size = 128M
2021-04-25T18:39:04.483808Z 0 [Note] InnoDB: Completed initialization of buffer pool
2021-04-25T18:39:04.498492Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-04-25T18:39:04.522930Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2021-04-25T18:39:04.525201Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2021-04-25T18:39:05.806990Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2021-04-25T18:39:06.135764Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-04-25T18:39:06.136194Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-04-25T18:39:06.355848Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-04-25T18:39:06.356944Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2021-04-25T18:39:06.356977Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2021-04-25T18:39:06.358679Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.26-29 started; log sequence number 295607608711
2021-04-25T18:39:06.359449Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2021-04-25T18:39:06.359758Z 0 [Note] Plugin 'FEDERATED' is disabled.
2021-04-25T18:39:15.885339Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2021-04-25T18:39:15.885378Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2021-04-25T18:39:15.888148Z 0 [Warning] CA certificate ca.pem is self signed.
2021-04-25T18:39:15.888204Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2021-04-25T18:39:15.889990Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2021-04-25T18:39:15.890088Z 0 [Note] IPv6 is available.
2021-04-25T18:39:15.890096Z 0 [Note] - '::' resolves to '::';
2021-04-25T18:39:15.890120Z 0 [Note] Server socket created on IP: '::'.

AMecea mentioned this issue Nov 3, 2020

Let the orchestrator handle the read-only/writable of nodes #627

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update of mysql cluster lead to problem with replication. #565

Update of mysql cluster lead to problem with replication. #565

DeamonMV commented Jul 15, 2020 •

edited

Loading

AMecea commented Jul 15, 2020

DeamonMV commented Jul 15, 2020

elderapo commented Apr 25, 2021

Update of mysql cluster lead to problem with replication. #565

Update of mysql cluster lead to problem with replication. #565

Comments

DeamonMV commented Jul 15, 2020 • edited Loading

AMecea commented Jul 15, 2020

DeamonMV commented Jul 15, 2020

elderapo commented Apr 25, 2021

DeamonMV commented Jul 15, 2020 •

edited

Loading