Healthy MySQL goes OFFLINE_HARD by no reason #1039

andreygolev · 2017-06-02T06:57:36Z

Hi!
Something strange happens sporadically, and I have not yet found the reason.
There are 3 MySQL servers in a pool: a master and 2 slaves. Write queries go to the master, read queries to the slaves.
All MySQL servers sit behind 3 ProxySQL instances, each running on a different server.
Applications connect to ProxySQL via 127.0.0.1.

The whole setup can work without any issues for, say, a week.
After some unknown time, one of the three ProxySQL instances decides that the master is dead, and the server is marked OFFLINE_HARD until I restart ProxySQL.
After the restart, everything is fine for another unknown period of time.
This happens from time to time on any of the ProxySQL instances.

According to the log file, a timeout occurs while checking the read_only state, and that is all that is visible.

I also ran these queries and got an empty set for both, maybe because log retention is short or because ProxySQL empties the log. I don't know yet.
select * from mysql_server_ping_log where hostname="172.31.55.104" and ping_error not null;
Empty set (0.00 sec)
select * from mysql_server_connect_log where hostname="172.31.55.104" and connect_error not null;
Empty set (0.00 sec)

2017-06-01 17:16:55 MySQL_Monitor.cpp:586:monitor_read_only_thread(): [ERROR] Timeout on read_only check for 172.31.55.104:3306 after 506ms. If the server is overload, increase mysql-monitor_read_only_timeout. Assuming read_only=1
2017-06-01 17:16:55 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 1 , address: 172.31.68.0 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.92.141 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 0 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
2017-06-01 17:16:55 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 1 , address: 172.31.68.0 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.92.141 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 0 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
2017-06-01 17:16:55 MySQL_HostGroups_Manager.cpp:530:commit(): [WARNING] Removed server at address 140088452248064, hostgroup 0, address 172.31.55.104 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them
2017-06-01 17:16:55 [INFO] New mysql_replication_hostgroups table
writer_hostgroup: 0 , reader_hostgroup: 1,
2017-06-01 17:16:55 [INFO] New mysql_group_replication_hostgroups table
2017-06-01 17:16:55 MySQL_Session.cpp:2512:handler(): [ERROR] Detected an offline server during query: 172.31.55.104, 3306
2017-06-01 17:16:56 MySQL_Session.cpp:2512:handler(): [ERROR] Detected an offline server during query: 172.31.55.104, 3306
...

Could anyone please help me understand what exactly makes ProxySQL think that the server is dead? Maybe additional logging is needed to find out which check fails.
It is also unclear how to bring the server back online without restarting ProxySQL.
Thank you in advance.

renecannao · 2017-06-02T09:16:08Z

@obion : proxysql doesn't believe that the server is dead, but it is not able to verify that it is a master (read_only=0), as this message reports:

2017-06-01 17:16:55 MySQL_Monitor.cpp:586:monitor_read_only_thread(): [ERROR] Timeout on read_only check for 172.31.55.104:3306 after 506ms. If the server is overload, increase mysql-monitor_read_only_timeout. Assuming read_only=1

You should:

increase mysql-monitor_read_only_timeout
check the read only check time in mysql_server_read_only_log
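
One way to act on the second suggestion is to look for read_only checks that either failed or came close to the timeout. The sketch below uses Python's built-in sqlite3 module with an in-memory table as a stand-in for mysql_server_read_only_log. The column names follow the table output posted later in this thread; the sample rows, the error text, the 400 ms threshold, and the helper name slow_or_failed_checks are all invented for illustration. On a real instance the same SELECT would be run through the admin interface instead.

```python
import sqlite3


def slow_or_failed_checks(conn, threshold_us):
    """Return read_only checks that errored or took longer than threshold_us."""
    query = """
        SELECT hostname, port, success_time_us, error
        FROM mysql_server_read_only_log
        WHERE error IS NOT NULL OR success_time_us > ?
        ORDER BY time_start_us DESC
    """
    return conn.execute(query, (threshold_us,)).fetchall()


# In-memory stand-in for the monitor table; every row is invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mysql_server_read_only_log (
           hostname TEXT, port INT, time_start_us INT,
           success_time_us INT, read_only INT, error TEXT)"""
)
conn.executemany(
    "INSERT INTO mysql_server_read_only_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("172.31.55.104", 3306, 1000, 900, 0, None),      # fast, healthy check
        ("172.31.55.104", 3306, 2000, 450000, 0, None),   # slow but successful
        ("172.31.55.104", 3306, 3000, 0, None, "timeout (hypothetical text)"),
    ],
)

# Flag anything slower than 400 ms (an arbitrary threshold) or with an error.
for row in slow_or_failed_checks(conn, 400_000):
    print(row)
```

If checks regularly land near the configured timeout, raising mysql-monitor_read_only_timeout is the first thing to try.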

andreygolev · 2017-06-05T19:24:13Z

@renecannao Thanks for the answer. Got it.
Is it right that proxysql marks a server OFFLINE_HARD on a single failed check and only manual intervention can fix it?

paulcarlucci · 2017-06-05T19:31:02Z

When this occurs, does your proxysql process go to 100% CPU utilization?

andreygolev · 2017-06-06T14:51:39Z

nope, nothing is visible on CPU metrics during this incident.

renecannao · 2017-06-06T15:16:08Z

@obion : can you please share the error log? I want to dig into it a bit further.
Server 172.31.55.104 was set OFFLINE_HARD in hostgroup 0 but was still online in hostgroup 1, so proxysql should have automatically brought it back to hostgroup 0 as soon as the next read_only check returned 0.
Also, what is the value of mysql-monitor_read_only_interval ?

andreygolev · 2017-06-06T18:38:21Z

@renecannao
I hope this is the log you need.
proxysql.txt
mysql-monitor_read_only_interval is 1500

renecannao · 2017-06-07T00:18:03Z

@obion : the error log is full of dumps of the mysql_servers table.
I will investigate what could cause it.

andreygolev · 2017-06-07T08:07:12Z

I thought that's just a status update :)
Thank you!

ryanschwartz · 2017-06-19T20:26:36Z

+1 on this issue, we're seeing a server go into OFFLINE_HARD and not come back to ONLINE status even after ping checks are healthy. Interestingly, only the hostgroup 0 connection stays in OFFLINE_HARD - the hostgroup 1 entry shows ONLINE.

Here is some supporting data:
https://gist.github.com/ryanschwartz/4ad7d27f047d492a7a602de9a757e98b#file-gistfile1-txt-L272
https://gist.github.com/ryanschwartz/4ad7d27f047d492a7a602de9a757e98b#file-gistfile1-txt-L284

And these are the configuration values in our environment:

mysql_variables=
{
    threads=4
    max_connections=2048 # default
    default_query_delay=0 # default
    default_query_timeout=86400000 # default (ms)
    have_compress=true # default
    poll_timeout=2000 # default
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    default_schema="information_schema"
    stacksize=1048576 # default = 0
    server_version="5.5.30" # default
    connect_timeout_server=3000 # default = 1000 (ms)
    monitor_username="monitor"
    monitor_password="LOL-PASSWORD-HERE"
    monitor_history=600000 # default
    monitor_connect_interval=60000 # default = 120000 (ms)
    monitor_ping_interval=5000 # default = 60000 (ms)
    monitor_read_only_interval=1500 # default = 1000 (ms)
    monitor_read_only_timeout=500 # default = 800 (ms)
    ping_interval_server_msec=120000 # default = 60000 (ms)
    ping_timeout_server=500 # default = 200
    commands_stats=true # default
    sessions_sort=true # default
    connect_retries_on_failure=5 # default = 10
}

We are using ProxySQL version 1.3.2-1-gd71a745, codename Truls.

renecannao · 2017-06-19T23:45:11Z

I haven't reproduced it yet, but looking at the code I think I can see the bug (the references are to 1.3.7, but any version is affected).
When the server comes back as read_only=0, it enters this part of the code.
Because the status is OFFLINE_HARD, it takes these actions:

dump mysql servers from runtime to database: this writes a lot to the error log
run an UPDATE that doesn't change the status: this is the bug
load mysql servers from database to runtime: this also writes a lot to the error log

A workaround is to DELETE the row in mysql_servers and run LOAD MYSQL SERVERS TO RUNTIME fast enough before Monitor module kicks in: this will avoid the need to restart.

This is a serious bug, and will be fixed ASAP.
Thanks.

ryanschwartz · 2017-06-20T20:33:24Z

A workaround is to DELETE the row in mysql_servers and run LOAD MYSQL SERVERS TO RUNTIME fast enough before Monitor module kicks in: this will avoid the need to restart.

MySQL [(none)]> select * from mysql_servers;delete from mysql_servers where hostgroup_id=0;LOAD MYSQL SERVERS TO RUNTIME;select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT * FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC LIMIT 10;
+------------------+------+------------------+-----------------+-----------+-------+
| hostname         | port | time_start_us    | success_time_us | read_only | error |
+------------------+------+------------------+-----------------+-----------+-------+
| mul-db-d01-use1b | 3306 | 1497990623646568 | 929             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990622146604 | 831             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990620646517 | 805             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990619146401 | 802             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990617646330 | 830             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990616174053 | 1131            | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990614646253 | 1188            | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990613146128 | 987             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990611645907 | 785             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990610145960 | 874             | 0         | NULL  |
+------------------+------+------------------+-----------------+-----------+-------+
10 rows in set (0.00 sec)

The server returns to OFFLINE_HARD after ~1-2 seconds.

Restarting proxysql resolves the situation, and in the meantime we'll be increasing monitor_read_only_timeout to 1250ms.
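
For reference, the read_only log above can be checked with a little arithmetic. The sketch below copies the time_start_us and success_time_us columns from that output and converts them to milliseconds. The 500 ms and 1500 ms figures are the monitor_read_only_timeout and monitor_read_only_interval values from the configuration posted earlier in the thread; the variable names are illustrative.

```python
# (time_start_us, success_time_us) pairs copied from the read_only log above,
# newest row first.
checks = [
    (1497990623646568, 929),
    (1497990622146604, 831),
    (1497990620646517, 805),
    (1497990619146401, 802),
    (1497990617646330, 830),
    (1497990616174053, 1131),
    (1497990614646253, 1188),
    (1497990613146128, 987),
    (1497990611645907, 785),
    (1497990610145960, 874),
]

TIMEOUT_MS = 500     # monitor_read_only_timeout from the posted config
INTERVAL_MS = 1500   # monitor_read_only_interval from the posted config

# Duration of the slowest check, converted from microseconds to milliseconds.
slowest_ms = max(duration for _, duration in checks) / 1000

# Gap between consecutive check start times, also in milliseconds.
starts = [start for start, _ in checks]
gaps_ms = [(newer - older) / 1000 for newer, older in zip(starts, starts[1:])]

print(f"slowest check: {slowest_ms:.3f} ms (timeout {TIMEOUT_MS} ms)")
print(f"gap between checks: {min(gaps_ms):.1f} to {max(gaps_ms):.1f} ms")
```

Every check in this sample finished in roughly a millisecond, far below the timeout, and the checks ran about every 1.5 s as configured. The monitor was therefore seeing read_only=0 while the hostgroup 0 entry stayed OFFLINE_HARD.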

renecannao · 2017-06-24T22:18:27Z

Issue solved.
Thank you for the report

ryanschwartz · 2017-06-26T18:25:30Z

@renecannao, what release do you anticipate this bugfix will be included in?

renecannao · 2017-06-26T18:32:46Z

Both 1.3.8 and 1.4.1 .

@ryanschwartz , since you compile it yourself on Alpine, maybe you don't need to wait for an official release? There are just 3 commits from 1.3.7 to 1.3.8 : v1.3.7...v1.3.8-dev

ryanschwartz · 2017-06-26T18:36:48Z

@renecannao thank you for the update. We've moved all our containers to Ubuntu, alpine is so old school. ;-)

renecannao · 2017-06-26T18:40:35Z

That's great news!! 😄

ryanschwartz · 2017-06-28T21:41:35Z

@renecannao - Issue persists on 1.3.8. ☹️

MySQL [(none)]> SELECT strftime('%s','now')\G select @@version;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+------------------+
| @@version        |
+------------------+
| 1.3.8-1-g1a10b36 |
+------------------+
1 row in set (0.00 sec)

MySQL [(none)]> SELECT * FROM stats.stats_mysql_connection_pool;
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host         | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1         | mul-db-d01-use1b | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 265        |
| 1         | mul-db-d01-use1c | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |
| 0         | mul-db-d01-use1b | 3306     | OFFLINE_HARD | 1        | 0        | 3      | 0       | 1064    | 135782          | 142849          | 265        |
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;delete from mysql_servers where hostgroup_id=0;LOAD MYSQL SERVERS TO RUNTIME; SELECT strftime('%s','now')\G select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685840
1 row in set (0.00 sec)

+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Happy to provide any log data that would be helpful. Ping and read_only log data immediately after the above: https://gist.github.com/ryanschwartz/d92082bce5b724908df4c5f153bf44a0

renecannao · 2017-06-28T22:01:06Z

That's disappointing.
Can you please share proxysql.log ?

ryanschwartz · 2017-06-28T22:15:53Z

Container log: container.log.gz

ryanschwartz · 2017-07-06T16:50:50Z

Any update on this issue @renecannao?

relgames · 2023-11-06T21:57:27Z

Hi, we have a similar issue, where a server goes to OFFLINE_HARD due to an unknown issue even though it is online:

2023-11-06 19:19:25 MySQL_Monitor.cpp:1913:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd05b40d300 , MYSQL 0x7fd059600000 , FD 42 : Lost connection to MySQL server during query
2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:4981:update_group_replication_set_offline(): [WARNING] Group Replication: setting host 10.112.8.95:6032 offline because: Lost connection to MySQL server during query
2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:1709:commit(): [WARNING] Removed server at address 140532875575520, hostgroup 30, address 10.112.8.95 port 6032. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them

We run proxysql + percona with group replication in Kubernetes, and saw this happening during a stress test when we were killing nodes to see how it switches between primary/replica. But always 1 node at a time, to keep at least 2 percona pods running.

Could not find it in the documentation - is a node supposed to come back ONLINE from OFFLINE HARD when it's reachable again?

proxysql.log

renecannao · 2023-11-06T23:12:22Z

This is an issue that is over 6 years old and now completely out of context.
"We have a similar issue" is perhaps not really "similar" since you are referring to group replication that was not even supported when this issue was created.

where a server goes to OFFLINE_HARD due to unknown issue

I see several errors:

Lost connection to MySQL server at 'reading authorization packet', system error: 0
Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 111.

For reference, error 111 is: OS error code 111: Connection refused .
In other words, nothing was listening on that port.
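
The mapping from the numeric code to its meaning can be checked with Python's standard errno and os modules. This is a small sketch: describe_errno is a helper name made up here, and the numeric value of ECONNREFUSED is platform-specific (111 is the Linux value).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an OS error code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# ECONNREFUSED is 111 on Linux; other platforms may number it differently.
print(describe_errno(errno.ECONNREFUSED))
```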

is a node supposed to come back ONLINE from OFFLINE HARD when it's reachable again?

Yes, so the node probably wasn't healthy.
Any evidence that it was healthy?

I am closing this issue because I do not think this last comment is relevant to this issue, and this issue itself is obsolete.
@relgames : if you have a reproducible test case, please feel free to create a new issue.
Thanks

relgames · 2023-11-07T10:35:19Z

Hi @renecannao, thanks for taking a look.

We actually run it in Kubernetes, and the test was to bring down the primary percona pod and then wait for the other pods to take over. We use a custom script that gets the pods from k8s, translates them to IP addresses, and saves them to the mysql_servers table. Not sure why it was done that way; I found a few issues mentioning DNS caching, so it was probably a problem in 2021.

We will try to stop using the custom script and just use hostnames.

renecannao added a commit that referenced this issue Jun 24, 2017

Servers did not recover from RO=1 to RO=0 #1039

27e28a5

Status of the server was not changed if: * was OFFLINE_HARD * was incorrectly present in myhgm.mysql_servers Added also a mutex to serialize the read_only actions

renecannao added a commit that referenced this issue Jun 24, 2017

Servers did not recover from RO=1 to RO=0 #1039

08bc9f6

Status of the server was not changed if: * was OFFLINE_HARD * was incorrectly present in myhgm.mysql_servers Added also a mutex to serialize the read_only actions

renecannao closed this as completed Jun 25, 2017

renecannao reopened this Jul 7, 2017

renecannao mentioned this issue Jul 7, 2017

Immediatelly kill all client connections using an OFFLINE node #1085

Closed

renecannao added a commit that referenced this issue Mar 7, 2018

Server disappearing when RO=1 becomes RO=0

cfc89a6

This issue is similar to #1039

pondix pushed a commit to pondix/proxysql that referenced this issue Mar 7, 2018

Server disappearing when RO=1 becomes RO=0

ac81c25

This issue is similar to sysown#1039

renecannao closed this as completed Nov 6, 2023

renecannao mentioned this issue Nov 6, 2023

Proxysql marking node as OFFLINE_HARD, adds it to offline HG and never recovers even though node is shown as ONLINE #3865

Closed
