Healthy MySQL goes OFFLINE_HARD by no reason #1039

Closed
andreygolev opened this issue Jun 2, 2017 · 23 comments

@andreygolev

andreygolev commented Jun 2, 2017

Hi!
Something strange happens sporadically, for a reason I haven't identified yet.
There are 3 MySQL servers in a pool: a master and 2 slaves. Write queries go to the master; read queries go to the slaves.
All MySQL servers sit behind 3 ProxySQL instances, each running on a different server.
Applications connect to ProxySQL via 127.0.0.1.

The whole setup can run without any issues for, say, a week.
Then, after some unpredictable amount of time, one of the three ProxySQL instances decides that the master is dead and marks it OFFLINE_HARD until I restart ProxySQL.
After a restart, everything is fine for another unpredictable period.
This happens from time to time on any of the ProxySQL instances.

According to the log file, a timeout occurs while checking the read_only state, and that is all that is visible.

I also ran the following queries and got an empty set for both. Maybe the log retention is short, or maybe ProxySQL purges the log; I don't know yet.
select * from mysql_server_ping_log where hostname="172.31.55.104" and ping_error not null;
Empty set (0.00 sec)
select * from mysql_server_connect_log where hostname="172.31.55.104" and connect_error not null;
Empty set (0.00 sec)

2017-06-01 17:16:55 MySQL_Monitor.cpp:586:monitor_read_only_thread(): [ERROR] Timeout on read_only check for 172.31.55.104:3306 after 506ms. If the server is overload, increase mysql-monitor_read_only_timeout. Assuming read_only=1
2017-06-01 17:16:55 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 1 , address: 172.31.68.0 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.92.141 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 0 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
2017-06-01 17:16:55 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 1 , address: 172.31.68.0 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 1 , address: 172.31.92.141 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
HID: 0 , address: 172.31.55.104 , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 60 , use_ssl: 0 , max_latency_ms: 0 , comment:
2017-06-01 17:16:55 MySQL_HostGroups_Manager.cpp:530:commit(): [WARNING] Removed server at address 140088452248064, hostgroup 0, address 172.31.55.104 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them
2017-06-01 17:16:55 [INFO] New mysql_replication_hostgroups table
writer_hostgroup: 0 , reader_hostgroup: 1,
2017-06-01 17:16:55 [INFO] New mysql_group_replication_hostgroups table
2017-06-01 17:16:55 MySQL_Session.cpp:2512:handler(): [ERROR] Detected an offline server during query: 172.31.55.104, 3306
2017-06-01 17:16:56 MySQL_Session.cpp:2512:handler(): [ERROR] Detected an offline server during query: 172.31.55.104, 3306
...

Could anyone please help me understand what exactly makes ProxySQL think that the server is dead? Maybe additional logging is needed to find out which check fails.
Also, it's unclear how to bring the server back online without restarting ProxySQL.
Thank you in advance.

@renecannao
Contributor

@obion: ProxySQL doesn't believe that the server is dead; it is unable to verify that the server is a master (read_only=0), as this message reports:

2017-06-01 17:16:55 MySQL_Monitor.cpp:586:monitor_read_only_thread(): [ERROR] Timeout on read_only check for 172.31.55.104:3306 after 506ms. If the server is overload, increase mysql-monitor_read_only_timeout. Assuming read_only=1

You should:

  • increase mysql-monitor_read_only_timeout
  • check the read_only check times in monitor.mysql_server_read_only_log
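The decision described above can be sketched as a simplified model. This is not ProxySQL's monitor code: the helper names `parse_read_only_timeout` and `effective_read_only` are hypothetical, the regex assumes the exact 1.3.x wording of the log line, and both the 500 ms timeout and the strict `>` comparison are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Log line quoted above (ProxySQL 1.3.x wording; other versions may differ).
LINE = ("2017-06-01 17:16:55 MySQL_Monitor.cpp:586:monitor_read_only_thread(): "
        "[ERROR] Timeout on read_only check for 172.31.55.104:3306 after 506ms. "
        "If the server is overload, increase mysql-monitor_read_only_timeout. "
        "Assuming read_only=1")

# Hypothetical pattern: captures host, port and elapsed milliseconds.
TIMEOUT_RE = re.compile(
    r"Timeout on read_only check for (?P<host>[\d.]+):(?P<port>\d+) after (?P<ms>\d+)ms"
)


def parse_read_only_timeout(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (host, port, elapsed_ms) if the line reports a read_only timeout."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return m["host"], int(m["port"]), int(m["ms"])


def effective_read_only(check_ms: int, timeout_ms: int, reported: int) -> int:
    """Simplified model: a check slower than the timeout yields no trusted
    value, so read_only=1 is assumed, as the log message says."""
    return 1 if check_ms > timeout_ms else reported


host, port, ms = parse_read_only_timeout(LINE)
# With an assumed 500 ms timeout, the 506 ms check is treated as read_only=1,
# so the server leaves the writer hostgroup although it really is read_only=0.
print(host, port, ms, effective_read_only(ms, 500, reported=0))
```

In this model, raising the timeout above the observed check time (for example to 800 ms) makes `effective_read_only(506, 800, reported=0)` return 0, which is the intent behind the first suggestion.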

@andreygolev
Author

@renecannao Thanks for the answer. Got it.
Is it right that ProxySQL marks a server OFFLINE_HARD after a single failed check, and only manual intervention can fix it?

@paulcarlucci

paulcarlucci commented Jun 5, 2017 via email

@andreygolev
Author

Nope, nothing is visible in the CPU metrics during this incident.

@renecannao
Contributor

@obion: can you please share the error log? I want to dig into it a bit further.
Server 172.31.55.104 was set OFFLINE_HARD in hostgroup 0 but was still ONLINE in hostgroup 1, so ProxySQL should have automatically brought it back into hostgroup 0 as soon as the next read_only check returned 0.
Also, what is the value of mysql-monitor_read_only_interval?

@andreygolev
Author

@renecannao
I hope this is the log you need.
proxysql.txt
mysql-monitor_read_only_interval is 1500

@renecannao
Contributor

@obion: the error log is full of dumps of the mysql_servers table.
I will investigate what could cause them.

@andreygolev
Author

I thought those were just status updates :)
Thank you!

@ryanschwartz

ryanschwartz commented Jun 19, 2017

+1 on this issue: we're seeing a server go OFFLINE_HARD and not return to ONLINE even after the ping checks are healthy. Interestingly, only the hostgroup 0 entry stays OFFLINE_HARD; the hostgroup 1 entry shows ONLINE.

Here is some supporting data:
https://gist.github.com/ryanschwartz/4ad7d27f047d492a7a602de9a757e98b#file-gistfile1-txt-L272
https://gist.github.com/ryanschwartz/4ad7d27f047d492a7a602de9a757e98b#file-gistfile1-txt-L284

And these are the configuration values in our environment:

mysql_variables=
{
    threads=4
    max_connections=2048 # default
    default_query_delay=0 # default
    default_query_timeout=86400000 # default (ms)
    have_compress=true # default
    poll_timeout=2000 # default
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    default_schema="information_schema"
    stacksize=1048576 # default = 0
    server_version="5.5.30" # default
    connect_timeout_server=3000 # default = 1000 (ms)
    monitor_username="monitor"
    monitor_password="LOL-PASSWORD-HERE"
    monitor_history=600000 # default
    monitor_connect_interval=60000 # default = 120000 (ms)
    monitor_ping_interval=5000 # default = 60000 (ms)
    monitor_read_only_interval=1500 # default = 1000 (ms)
    monitor_read_only_timeout=500 # default = 800 (ms)
    ping_interval_server_msec=120000 # default = 60000 (ms)
    ping_timeout_server=500 # default = 200
    commands_stats=true # default
    sessions_sort=true # default
    connect_retries_on_failure=5 # default = 10
}

We are using ProxySQL version 1.3.2-1-gd71a745, codename Truls.
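Because most lines in the block above carry a `# default = X` comment, a few lines of Python can list which variables deviate from the stated defaults. This is a hypothetical helper (`overrides` is not part of ProxySQL) that simply trusts those comments and works on an excerpt of the block; on that excerpt it shows, for instance, that `monitor_read_only_timeout` was lowered from 800 to 500 ms.

```python
import re
from typing import Dict, Tuple

# Excerpt of the mysql_variables block above.
CONFIG = """
    max_connections=2048 # default
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    monitor_read_only_interval=1500 # default = 1000 (ms)
    monitor_read_only_timeout=500 # default = 800 (ms)
    ping_timeout_server=500 # default = 200
"""

# key=value (value may be double-quoted), optionally followed by
# "# default" or "# default = X".
LINE_RE = re.compile(
    r'^\s*(?P<key>\w+)=(?P<value>"[^"]*"|\S+)'
    r'(?:\s*#\s*default(?:\s*=\s*(?P<default>\S+))?)?'
)


def overrides(config: str) -> Dict[str, Tuple[str, str]]:
    """Map each variable whose comment states a different default to (value, default)."""
    result: Dict[str, Tuple[str, str]] = {}
    for line in config.splitlines():
        m = LINE_RE.match(line)
        if m is None or m["default"] is None:
            continue  # blank line, or no "default = X" to compare against
        if m["value"] != m["default"]:
            result[m["key"]] = (m["value"], m["default"])
    return result


for name, (value, default) in overrides(CONFIG).items():
    print(f"{name}: {value} (comment says default {default})")
```

The helper only reads the comments, so it is as accurate as they are; the authoritative defaults are whatever the running ProxySQL version reports.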

@renecannao
Contributor

I haven't reproduced it yet, but looking at the code I think I can see the bug (the references are to 1.3.7, but every version is affected).
When the server comes back with read_only=0, it enters this part of the code.
Because its status is OFFLINE_HARD, it will take action:

A workaround is to DELETE the row from mysql_servers and run LOAD MYSQL SERVERS TO RUNTIME quickly, before the Monitor module kicks in: this avoids the need to restart.

This is a serious bug, and will be fixed ASAP.
Thanks.

@ryanschwartz

A workaround is to DELETE the row in mysql_servers and run LOAD MYSQL SERVERS TO RUNTIME fast enough before Monitor module kicks in: this will avoid the need to restart.

MySQL [(none)]> select * from mysql_servers;delete from mysql_servers where hostgroup_id=0;LOAD MYSQL SERVERS TO RUNTIME;select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT * FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC LIMIT 10;
+------------------+------+------------------+-----------------+-----------+-------+
| hostname         | port | time_start_us    | success_time_us | read_only | error |
+------------------+------+------------------+-----------------+-----------+-------+
| mul-db-d01-use1b | 3306 | 1497990623646568 | 929             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990622146604 | 831             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990620646517 | 805             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990619146401 | 802             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990617646330 | 830             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990616174053 | 1131            | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990614646253 | 1188            | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990613146128 | 987             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990611645907 | 785             | 0         | NULL  |
| mul-db-d01-use1b | 3306 | 1497990610145960 | 874             | 0         | NULL  |
+------------------+------+------------------+-----------------+-----------+-------+
10 rows in set (0.00 sec)

The server goes back to OFFLINE_HARD after ~1-2 seconds.

Restarting ProxySQL resolves the situation; in the meantime we'll increase monitor_read_only_timeout to 1250 ms.
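The read_only log rows above can be summarized to check two things: whether the checks come anywhere near the timeout, and whether they run at the configured interval. The snippet copies the `(time_start_us, success_time_us)` pairs from that output; the `summarize` helper is hypothetical. On these rows every check finishes in about 1 ms, far below a 500 ms timeout, and consecutive checks start roughly 1500 ms apart, matching `monitor_read_only_interval=1500`. That suggests the hostgroup 0 row stayed OFFLINE_HARD while the monitor was seeing healthy `read_only=0` results.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (time_start_us, success_time_us) from monitor.mysql_server_read_only_log above.
ROWS: List[Tuple[int, int]] = [
    (1497990623646568, 929),
    (1497990622146604, 831),
    (1497990620646517, 805),
    (1497990619146401, 802),
    (1497990617646330, 830),
    (1497990616174053, 1131),
    (1497990614646253, 1188),
    (1497990613146128, 987),
    (1497990611645907, 785),
    (1497990610145960, 874),
]


def summarize(rows: List[Tuple[int, int]], timeout_ms: float) -> Dict[str, float]:
    """Check durations and start-to-start intervals, both in milliseconds."""
    starts = sorted(start for start, _ in rows)
    intervals_ms = [(b - a) / 1000 for a, b in zip(starts, starts[1:])]
    durations_ms = [took / 1000 for _, took in rows]
    return {
        "max_check_ms": max(durations_ms),
        "mean_check_ms": mean(durations_ms),
        "mean_interval_ms": mean(intervals_ms),
        "checks_over_timeout": sum(d > timeout_ms for d in durations_ms),
    }


print(summarize(ROWS, timeout_ms=500))
```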

renecannao added two commits that referenced this issue on Jun 24, 2017, each with the message:
Status of the server was not changed if:
* was OFFLINE_HARD
* was incorrectly present in myhgm.mysql_servers

Added also a mutex to serialize the read_only actions
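The commit message describes the failure mode: when a `read_only=0` result arrived for a server still present in the writer hostgroup with status OFFLINE_HARD, its status was not changed. The toy model below reproduces that symptom and the intended recovery. It is a hypothetical simplification for illustration only (the class, the `fixed` flag and the host name `db1` are invented), not ProxySQL's `MySQL_HostGroups_Manager`, and it ignores the mutex mentioned in the commit.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ONLINE, OFFLINE_HARD = "ONLINE", "OFFLINE_HARD"


@dataclass
class ToyHostgroups:
    """Hypothetical, heavily simplified stand-in for ProxySQL's hostgroup
    manager; it only mirrors the symptom described in the commit message."""
    writer_hg: int = 0
    servers: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def on_read_only(self, host: str, read_only: int, fixed: bool) -> None:
        key = (self.writer_hg, host)
        if read_only == 1:
            # Not a writer (or assumed not to be): mark it OFFLINE_HARD in the writer hostgroup.
            if key in self.servers:
                self.servers[key] = OFFLINE_HARD
            return
        # read_only == 0: the host should be an ONLINE writer.
        if key not in self.servers:
            self.servers[key] = ONLINE
        elif fixed and self.servers[key] == OFFLINE_HARD:
            self.servers[key] = ONLINE
        # With fixed=False an existing OFFLINE_HARD row is left as-is: the bug.


def replay(fixed: bool) -> str:
    hg = ToyHostgroups(servers={(0, "db1"): ONLINE, (1, "db1"): ONLINE})
    hg.on_read_only("db1", read_only=1, fixed=fixed)  # one timed-out check
    hg.on_read_only("db1", read_only=0, fixed=fixed)  # next check is healthy
    return hg.servers[(0, "db1")]


print(replay(fixed=False))  # OFFLINE_HARD
print(replay(fixed=True))   # ONLINE
```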
@renecannao
Contributor

Issue solved.
Thank you for the report

@ryanschwartz

ryanschwartz commented Jun 26, 2017

@renecannao, what release do you anticipate this bugfix will be included in?

@renecannao
Contributor

Both 1.3.8 and 1.4.1.

@ryanschwartz, since you compile it yourself on Alpine, maybe you don't need to wait for an official release? There are just 3 commits from 1.3.7 to 1.3.8: v1.3.7...v1.3.8-dev

@ryanschwartz

@renecannao thank you for the update. We've moved all our containers to Ubuntu, alpine is so old school. ;-)

@renecannao
Contributor

That's great news!! 😄

@ryanschwartz

@renecannao - Issue persists on 1.3.8. ☹️

MySQL [(none)]> SELECT strftime('%s','now')\G select @@version;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+------------------+
| @@version        |
+------------------+
| 1.3.8-1-g1a10b36 |
+------------------+
1 row in set (0.00 sec)

MySQL [(none)]> SELECT * FROM stats.stats_mysql_connection_pool;
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host         | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1         | mul-db-d01-use1b | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 265        |
| 1         | mul-db-d01-use1c | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |
| 0         | mul-db-d01-use1b | 3306     | OFFLINE_HARD | 1        | 0        | 3      | 0       | 1064    | 135782          | 142849          | 265        |
+-----------+------------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
3 rows in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

MySQL [(none)]> select * from mysql_servers;delete from mysql_servers where hostgroup_id=0;LOAD MYSQL SERVERS TO RUNTIME; SELECT strftime('%s','now')\G select * from mysql_servers;
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685838
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
2 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685839
1 row in set (0.00 sec)

+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

MySQL [(none)]> SELECT strftime('%s','now')\G select * from mysql_servers;
*************************** 1. row ***************************
strftime('%s','now'): 1498685840
1 row in set (0.00 sec)

+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname         | port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | mul-db-d01-use1b | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | mul-db-d01-use1c | 3306 | ONLINE       | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 0            | mul-db-d01-use1b | 3306 | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------------+------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
3 rows in set (0.00 sec)

Happy to provide any log data that would be helpful. Ping and read_only log data immediately after the above: https://gist.github.com/ryanschwartz/d92082bce5b724908df4c5f153bf44a0

@renecannao
Contributor

That's disappointing.
Can you please share proxysql.log?

@ryanschwartz

Container log: container.log.gz

@ryanschwartz

Any update on this issue @renecannao?

@renecannao renecannao reopened this Jul 7, 2017
renecannao added a commit that referenced this issue Mar 7, 2018
pondix pushed a commit to pondix/proxysql that referenced this issue Mar 7, 2018
@relgames

relgames commented Nov 6, 2023

Hi, we have a similar issue, where a server goes OFFLINE_HARD for an unknown reason even though it is online:

2023-11-06 19:19:25 MySQL_Monitor.cpp:1913:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd05b40d300 , MYSQL 0x7fd059600000 , FD 42 : Lost connection to MySQL server during query
2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:4981:update_group_replication_set_offline(): [WARNING] Group Replication: setting host 10.112.8.95:6032 offline because: Lost connection to MySQL server during query
2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:1709:commit(): [WARNING] Removed server at address 140532875575520, hostgroup 30, address 10.112.8.95 port 6032. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them

We run ProxySQL + Percona with group replication in Kubernetes, and saw this happen during a stress test in which we were killing nodes to see how it switches between primary and replica. We always killed only 1 node at a time, to keep at least 2 Percona pods running.

I couldn't find this in the documentation: is a node supposed to come back ONLINE from OFFLINE_HARD when it's reachable again?

proxysql.log

@renecannao
Contributor

This is an issue that is over 6 years old and is now completely out of context.
"We have a similar issue" is perhaps not really "similar", since you are referring to group replication, which was not even supported when this issue was created.

where a server goes to OFFLINE_HARD due to unknown issue

I see several errors:

  • Lost connection to MySQL server at 'reading authorization packet', system error: 0
  • Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 111.

For reference, error 111 is OS error code 111: Connection refused.
In other words, nothing was listening on that port.
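The numeric code in these MySQL client messages is an OS `errno` value, so it can be decoded with Python's standard `errno` module. A minimal sketch with a hypothetical helper name: errno numbering is platform-specific, and 111 means `ECONNREFUSED` on Linux but not, for example, on macOS. A code of 0, as in the "reading authorization packet" message, is not an errno at all, so the helper returns `None` for it.

```python
import errno
import os
import re
from typing import Optional

SYSTEM_ERROR_RE = re.compile(r"system error: (\d+)")


def decode_system_error(message: str) -> Optional[str]:
    """Return the errno name (e.g. 'ECONNREFUSED' on Linux) embedded in a MySQL
    client error message, or None if there is no code or it is 0."""
    m = SYSTEM_ERROR_RE.search(message)
    if m is None:
        return None
    # errno.errorcode maps numbers to names; 0 is not an errno, so .get() -> None.
    return errno.errorcode.get(int(m.group(1)))


refused = ("Lost connection to MySQL server at 'handshake: reading initial "
           "communication packet', system error: 111")
auth = ("Lost connection to MySQL server at 'reading authorization packet', "
        "system error: 0")
print(decode_system_error(refused), "/", os.strerror(111))
print(decode_system_error(auth))  # None
```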

is a node supposed to come back ONLINE from OFFLINE HARD when it's reachable again?

Yes, so the node probably wasn't healthy.
Do you have any evidence that it was healthy?

I am closing this issue because I do not think this last comment is relevant to this issue, and this issue itself is obsolete.
@relgames : if you have a reproducible test case, please feel free to create a new issue.
Thanks

@relgames

relgames commented Nov 7, 2023

Hi @renecannao, thanks for taking a look.

We actually run it in Kubernetes, and the test was to bring down the primary Percona pod and then wait for the other pods to take over. We use a custom script that gets the pods from k8s, translates them to IP addresses and saves them to the mysql_servers table. I'm not sure why it was done that way; I found a few issues mentioning DNS caching, so it was probably a problem in 2021.

We will try to stop using the custom script and just use hostnames.
