Proxysql marking node as OFFLINE_HARD, adds it to offline HG and never recovers even though node is shown as ONLINE #3865

Closed
brakissgh opened this issue May 2, 2022 · 10 comments


brakissgh commented May 2, 2022

Hello,

We have a 3-node Percona group replication cluster which we access through ProxySQL.
One node is both writer and reader; the other two nodes are readers only.

The moment we run FLUSH TABLES against the underlying DB, ProxySQL marks the node it is executed on as OFFLINE_HARD, moves it into the offline hostgroup and, what makes things worse, never recovers that host.
The host shows as ONLINE and viable candidate is true, but it remains in the offline HG. The entry marked as OFFLINE_HARD in the reader HG eventually gets removed, so only the one in the offline hostgroup remains.
Removing the node from mysql_servers and running LOAD TO RUNTIME does not fix the issue: the node gets re-added to runtime with its hostgroup still set to the offline HG. Restarting ProxySQL or the DB node seems to be the only thing that helps.

runtime_mysql_servers content reported from proxysql:
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | gtid_port | status       | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 30           | proxysql.log  | 3307 | 0         | OFFLINE_HARD | 1      | 0           | 1000            | 0                   | 1       | 0              |         |
| 40           | 1.2.3.1       | 3307 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 1       | 0              |         |
| 10           | 1.2.3.3       | 3307 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 1       | 0              |         |
| 30           | 1.2.3.3       | 3307 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 1       | 0              |         |
| 30           | 1.2.3.2       | 3307 | 0         | ONLINE       | 1      | 0           | 1000            | 0                   | 1       | 0              |         |
+--------------+---------------+------+-----------+--------------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
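The stuck state in this table can be checked mechanically. The following is a minimal Python sketch, not part of ProxySQL: the function name is hypothetical, the hostgroup ids (30 as reader, 40 as offline) are inferred from the table rather than read from the attached proxysql.cnf, and the set of viable hosts is assumed from the statement above that the node reports as a viable candidate. The rows are copied from the table as shown, including the first hostname as it appears.

```python
from collections import defaultdict

# Assumption: hostgroup ids inferred from the table above, not from proxysql.cnf.
READER_HG = 30
OFFLINE_HG = 40


def stuck_in_offline_hostgroup(runtime_rows, viable_hosts):
    """Return hosts the monitor considers viable but that are ONLINE
    only in the offline hostgroup.

    runtime_rows: iterable of (hostgroup_id, hostname, port, status) tuples,
                  as read from runtime_mysql_servers.
    viable_hosts: set of (hostname, port) pairs the monitor reports as viable.
    """
    online_hgs = defaultdict(set)
    for hostgroup_id, hostname, port, status in runtime_rows:
        if status == "ONLINE":
            online_hgs[(hostname, port)].add(hostgroup_id)

    return sorted(
        host
        for host, hgs in online_hgs.items()
        if hgs == {OFFLINE_HG} and host in viable_hosts
    )


# Rows taken from the runtime_mysql_servers output above.
rows = [
    (30, "proxysql.log", 3307, "OFFLINE_HARD"),
    (40, "1.2.3.1", 3307, "ONLINE"),
    (10, "1.2.3.3", 3307, "ONLINE"),
    (30, "1.2.3.3", 3307, "ONLINE"),
    (30, "1.2.3.2", 3307, "ONLINE"),
]
# Assumption: all three nodes are reported as viable by the monitor.
viable = {("1.2.3.1", 3307), ("1.2.3.2", 3307), ("1.2.3.3", 3307)}

print(stuck_in_offline_hostgroup(rows, viable))  # [('1.2.3.1', 3307)]
```

A host flagged by this check matches the symptom described here: healthy according to the monitor, yet present in runtime only under the offline hostgroup.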

ProxySQL version is proxysql-2.3.0-1.x86_64 (RPM obtained here: https://github.com/sysown/proxysql/releases/tag/v2.3.0).
The same issue was observed with ProxySQL 2.2.0.
OS is AlmaLinux release 8.5 (Arctic Sphynx).
Percona version is 8.0.25-15.

We can reliably reproduce this issue by running FLUSH TABLES on any of the 3 DB nodes.

proxysql.cnf.gz
proxysql.log.gz

@brakissgh
Author

Hello,

It's been a month now since I opened this issue. Do you need more information? Is this the wrong place for this type of thing?

@shobhitrathore

I am also facing the same issue with ProxySQL version 2.4.1-1-g1ea371d with Group Replication.
When I killed the ProxySQL monitor threads on the backend servers, ProxySQL reconnected and marked the server in the offline hostgroup as ONLINE again.

@kasabov

kasabov commented Nov 6, 2023

We see the same during disaster recovery drills. A node fails the ProxySQL monitor check and is put into OFFLINE_HARD, from which it never recovers. I see no reason for proxy software to never retry a node which we manually inserted into "mysql_servers" (and loaded to runtime, of course). The only relevant entries in the ProxySQL logs are:

2023-11-06 19:19:30 MySQL_HostGroups_Manager.cpp:1709:commit(): [WARNING] Removed server at address 140532875575520, hostgroup 30, address x.x.8.95 port 6032. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them
2023-11-06 19:24:32 MySQL_Session.cpp:2566:handler_again___status_SETTING_GENERIC_VARIABLE(): [ERROR] Detected a broken connection while while setting character_set_results on (30,x.x.x.x,6032,2804) , user xxxxxx , last_used 1114225825ms ago : 9004, Detected offline server prior to statement execution

@renecannao
Contributor

Only relevant entries in the proxysql logs are ...

I assume this is the log here:
#1039 (comment)

Well, there are a lot more relevant entries...

@kasabov

kasabov commented Nov 7, 2023

@renecannao Anything that will explain why an offline server is never ever retried?

@renecannao
Contributor

@kasabov : what are the steps to reproduce this?
Are you sure the server was never retried? What was the content of the tables in the monitor schema during the alleged issue?

@kasabov

kasabov commented Nov 13, 2023

We're still working out the exact steps to reproduce this. It happens with both ProxySQL 2.4.5 and 2.5.5.

This is from another case reproduced today; the only relevant entries in the 'monitor' database are lots of these in 'mysql_server_group_replication_log':

| percona-0.percona-instances | 6032 | 1699888568299348 | 4154            | YES              | NO        | 0                   | NULL  |
| percona-1.percona-instances | 6032 | 1699888733302503 | 3850            | YES              | YES       | 0                   | NULL  |

I'm not saying that the server was never retried. I'm asking why it stops being retried. At some point the server is online, but I no longer see it in the "runtime_mysql_servers" table. Am I wrong to assume that it should have an entry there at all times (with any status)? The fix is to manually reload mysql_servers to runtime.
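The monitor rows quoted above carry no header, so they are easier to reason about once each cell has a name. The following is a minimal Python sketch that parses one such row. The field names follow the columns of `mysql_server_group_replication_log` as I understand them from the ProxySQL monitor schema; treat them as an assumption and confirm with `SHOW CREATE TABLE` on your version. The class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class GroupReplicationCheck(NamedTuple):
    # Assumed column order of monitor.mysql_server_group_replication_log.
    hostname: str
    port: int
    time_start_us: int
    success_time_us: int
    viable_candidate: bool
    read_only: bool
    transactions_behind: int
    error: Optional[str]


def parse_row(line: str) -> GroupReplicationCheck:
    """Parse one pipe-delimited row as printed by the mysql client."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 8:
        raise ValueError(f"expected 8 columns, got {len(cells)}")
    hostname, port, start, success, viable, read_only, behind, error = cells
    return GroupReplicationCheck(
        hostname=hostname,
        port=int(port),
        time_start_us=int(start),
        success_time_us=int(success),
        viable_candidate=(viable == "YES"),
        read_only=(read_only == "YES"),
        transactions_behind=int(behind),
        error=None if error == "NULL" else error,
    )


# Rows copied from the comment above.
lines = [
    "| percona-0.percona-instances | 6032 | 1699888568299348 | 4154            | YES              | NO        | 0                   | NULL  |",
    "| percona-1.percona-instances | 6032 | 1699888733302503 | 3850            | YES              | YES       | 0                   | NULL  |",
]
for check in map(parse_row, lines):
    print(check.hostname, check.viable_candidate, check.read_only, check.error)
```

Read this way, both quoted checks succeeded with no error and both nodes were viable candidates; percona-0 was writable and percona-1 was read-only.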

@dnl555

dnl555 commented Apr 21, 2024

I am having exactly the same issue: all my nodes were put in the offline HG and never came back. After restarting my ProxySQL pods, everything worked again.

@renecannao
Contributor

Are you all still running 2.3.0?

@renecannao
Contributor

Closing this issue, it is absolutely outdated.

If you are facing a similar issue, please follow the "New issue" template and provide all the required details.
Before opening a ticket, please carefully read the error log: it is very verbose about what ProxySQL does in response to monitoring events.
