Proxysql marking node as OFFLINE_HARD, adds it to offline HG and never recovers even though node is shown as ONLINE #3865

brakissgh · 2022-05-02T14:15:03Z

Hello,

We have a 3-node Percona group replication cluster which we access through proxysql.
One node is both writer and reader; the other two nodes are readers only.

The moment we run FLUSH TABLES against the underlying DB, proxysql marks the node it is executed on as OFFLINE_HARD, moves it into the offline hostgroup and, what makes things worse, never recovers that host.
The host is showing as ONLINE and viable candidate is true, but it remains in the offline HG. The entry marked as OFFLINE_HARD in the reader HG eventually gets removed, so only the one in the offline hostgroup remains.
Removing the node from mysql_servers and running LOAD TO RUNTIME does not fix the issue: the node gets re-added to runtime with its hostgroup still set to the offline HG. Restarting proxysql or the DB node seems to be the only thing that helps.

runtime_mysql_servers content reported from proxysql:
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 30 | proxysql.log | 3307 | 0 | OFFLINE_HARD | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 40 | 1.2.3.1 | 3307 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 10 | 1.2.3.3 | 3307 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 30 | 1.2.3.3 | 3307 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 30 | 1.2.3.2 | 3307 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
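A minimal sketch of how output like the above could be checked for hosts that are ONLINE only in the offline hostgroup. It assumes hostgroup 40 is the offline hostgroup (inferred from the description, not stated explicitly), uses a reduced set of columns, and the helper names `parse_rows` and `stuck_in_offline` are hypothetical.

```python
from collections import defaultdict

# Assumption: hostgroup 40 is the offline hostgroup in this setup.
OFFLINE_HOSTGROUP = 40

# A subset of the columns from the runtime_mysql_servers output in the report.
SAMPLE = """
+--------------+--------------+------+--------------+
| hostgroup_id | hostname     | port | status       |
+--------------+--------------+------+--------------+
| 30           | proxysql.log | 3307 | OFFLINE_HARD |
| 40           | 1.2.3.1      | 3307 | ONLINE       |
| 10           | 1.2.3.3      | 3307 | ONLINE       |
| 30           | 1.2.3.3      | 3307 | ONLINE       |
| 30           | 1.2.3.2      | 3307 | ONLINE       |
+--------------+--------------+------+--------------+
"""

def parse_rows(table_text):
    """Parse mysql-client style table output into a list of dicts."""
    rows, header = [], None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def stuck_in_offline(rows, offline_hg=OFFLINE_HOSTGROUP):
    """Return hosts whose only ONLINE entries are in the offline hostgroup."""
    online_hgs = defaultdict(set)
    for r in rows:
        if r["status"] == "ONLINE":
            online_hgs[r["hostname"]].add(int(r["hostgroup_id"]))
    return sorted(h for h, hgs in online_hgs.items() if hgs == {offline_hg})

print(stuck_in_offline(parse_rows(SAMPLE)))
```

With the rows above, the only host flagged is 1.2.3.1, which is the ONLINE entry sitting in hostgroup 40.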

The proxysql version is proxysql-2.3.0-1.x86_64 (rpm obtained here: https://github.com/sysown/proxysql/releases/tag/v2.3.0).
The same issue was observed with proxysql 2.2.0.
The OS is AlmaLinux release 8.5 (Arctic Sphynx).
The Percona version is 8.0.25-15.

We can reliably reproduce this issue by running FLUSH TABLES on any of the 3 DB nodes.

proxysql.cnf.gz
proxysql.log.gz

brakissgh · 2022-06-02T08:33:55Z

Hello,

It's been a month now since I opened this issue. Do you need more information? Is this the wrong place for this type of thing?

shobhitrathore · 2022-09-06T16:34:52Z

I am also facing the same issue with ProxySQL version 2.4.1-1-g1ea371d with Group Replication!
When I killed the proxysql monitor threads on the backend servers, it reconnected and marked the offline_hostgroup as ONLINE again!

kasabov · 2023-11-06T22:19:55Z

We see the same during disaster recovery drills. A node fails the proxysql monitor check and is put into OFFLINE_HARD, from which it never recovers. I see no reason for proxy software to never retry a node which we manually inserted into "mysql_servers" (and loaded to runtime, of course). Only relevant entries in the proxysql logs are:

2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:1709:commit(): [WARNING] Removed server at address 140532875575520, hostgroup 30, address x.x.8.95 port 6032. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them
2023-11-06 19:24:32 MySQL_Session.cpp:2566:handler_again___status_SETTING_GENERIC_VARIABLE(): [ERROR] Detected a broken connection while while setting character_set_results on (30,x.x.x.x,6032,2804) , user xxxxxx , last_used 1114225825ms ago : 9004, Detected offline server prior to statement execution

renecannao · 2023-11-06T23:19:33Z

> Only relevant entries in the proxysql logs are ...

I assume this is the log here:
#1039 (comment)

Well, there are a lot more relevant entries...

kasabov · 2023-11-07T12:39:03Z

@renecannao Anything that will explain why an offline server is never ever retried?

renecannao · 2023-11-07T14:40:56Z

@kasabov : what are the steps to reproduce this?
Are you sure the server was never retried? What was the content of the tables in the monitor schema during the alleged issue?

kasabov · 2023-11-13T15:13:22Z

We're still working out the exact steps to reproduce this. Happens with both ProxySQL version 2.4.5 and 2.5.5.

This is from another reproduced case from today; the only relevant entries from 'monitor' database are lots of these in 'mysql_server_group_replication_log':

| percona-0.percona-instances | 6032 | 1699888568299348 | 4154            | YES              | NO        | 0                   | NULL  |
| percona-1.percona-instances | 6032 | 1699888733302503 | 3850            | YES              | YES       | 0                   | NULL  |

I'm not saying that the server was never retried. I'm asking why stop retrying. At some point the server is online, but I never see it anymore in the "runtime_mysql_servers" table. Am I wrongly assuming that it should have an entry there at all times (with any status)? The solution to this is manually reloading the mysql_servers to runtime.
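As a rough illustration of the question above, the following sketch maps rows like the ones pasted from 'mysql_server_group_replication_log' to the role one would expect each node to have. It assumes the columns are hostname, port, time_start_us, success_time_us, viable_candidate, read_only, transactions_behind and error, in that order, and it applies a simplified rule (viable and writable means writer, viable and read-only means reader, anything else means offline) that ignores replication lag thresholds. The function names are hypothetical and this is not ProxySQL's actual implementation.

```python
# Assumed column order of monitor.mysql_server_group_replication_log.
COLUMNS = (
    "hostname", "port", "time_start_us", "success_time_us",
    "viable_candidate", "read_only", "transactions_behind", "error",
)

# The two rows quoted in the comment above.
SAMPLE_LOG = """
| percona-0.percona-instances | 6032 | 1699888568299348 | 4154 | YES | NO  | 0 | NULL |
| percona-1.percona-instances | 6032 | 1699888733302503 | 3850 | YES | YES | 0 | NULL |
"""

def parse_log_rows(text):
    """Turn pipe-delimited rows into dicts keyed by the assumed column names."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and table borders
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, cells)))
    return rows

def expected_role(row):
    """Simplified rule: a failed or non-viable check means offline."""
    if row["error"] != "NULL" or row["viable_candidate"] != "YES":
        return "offline"
    return "reader" if row["read_only"] == "YES" else "writer"

for row in parse_log_rows(SAMPLE_LOG):
    print(row["hostname"], expected_role(row))
```

Under these assumptions both nodes pass the check (one as writer, one as reader), so neither would be expected to stay out of the writer or reader hostgroups.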

dnl555 · 2024-04-21T10:06:59Z

I am having exactly the same issue: all my nodes were put in the offline HG and never came back. After restarting my proxysql pods, everything worked again.

renecannao · 2024-04-21T10:08:38Z

Are you all still running 2.3.0?

renecannao · 2024-04-21T10:11:45Z

Closing this issue: it is absolutely outdated.

If you are facing a similar issue, please follow the "New issue" template and provide all the required detailed information.
Before opening a ticket, please carefully read the error log: it is very verbose about what proxysql does in response to monitoring events.

lalitpercona mentioned this issue Dec 1, 2022

MGR monitor view ERR #2464

Open

renecannao closed this as completed Apr 21, 2024
