Healthy MySQL goes OFFLINE_HARD for no reason #1039
@obion : proxysql doesn't believe that the server is dead, but it is not able to verify that it is a master. You should:
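The list of suggested steps did not survive extraction. A plausible reconstruction, based on the rest of the thread rather than on the original advice, is to verify that the master actually answers the monitor's read_only check:

```sql
-- Run on the MySQL master itself: this is the value ProxySQL's monitor
-- checks to decide whether a host is a writer. A master should return 0.
SELECT @@global.read_only;
```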
@renecannao Thanks for the answer. Got it.
When this occurs, does your proxysql process go to 100% CPU utilization?

> On Jun 5, 2017 3:24 PM, "obion" wrote:
> @renecannao Thanks for the answer. Got it. Is it right that proxysql marks a server offline_hard on a *single* failed check, and that only manual intervention can fix it?
Nope, nothing is visible in CPU metrics during this incident.
@obion : can you please share the error log? I want to dig into it a bit further.
@renecannao …
@obion : the error log is full of dumps of the …
I thought that's just a status update :)
+1 on this issue, we're seeing a server go into OFFLINE_HARD as well. Here is some supporting data, along with the configuration values in our environment: …
We are using …
I didn't reproduce it yet, but looking at the code I think I can see the bug (references are to 1.3.7, but any version is affected).
A workaround is to … This is a serious bug, and it will be fixed ASAP.
The server returns in … Restarting proxysql resolves the situation, and in the meantime we'll be increasing …
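The variable this commenter planned to increase was lost in extraction. A likely candidate, as an assumption based on the read_only timeouts discussed above rather than anything preserved in this thread, is mysql-monitor_read_only_timeout, which can be raised from the admin interface:

```sql
-- Hedged sketch (ProxySQL admin interface, port 6032): raise the timeout
-- of the monitor's read_only check. The 3000 ms value is illustrative.
UPDATE global_variables SET variable_value = '3000'
WHERE variable_name = 'mysql-monitor_read_only_timeout';
LOAD MYSQL VARIABLES TO RUNTIME;  -- apply to the running configuration
SAVE MYSQL VARIABLES TO DISK;     -- persist across restarts
```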
Status of the server was not changed if:
* it was OFFLINE_HARD
* it was incorrectly present in myhgm.mysql_servers
Also added a mutex to serialize the read_only actions.
Issue solved.
@renecannao, what release do you anticipate this bugfix will be included in?
Both 1.3.8 and 1.4.1. @ryanschwartz, since you compile it yourself on Alpine, maybe you don't need to wait for an official release? There are just 3 commits from 1.3.7 to 1.3.8: v1.3.7...v1.3.8-dev
@renecannao thank you for the update. We've moved all our containers to Ubuntu; Alpine is so old school. ;-)
That's great news!! 😄
@renecannao - Issue persists on 1.3.8.
Happy to provide any log data that would be helpful. Ping and read_only log data immediately after the above: https://gist.github.com/ryanschwartz/d92082bce5b724908df4c5f153bf44a0
That's disappointing. |
Container log: container.log.gz
Any update on this issue @renecannao? |
Hi, we have a similar issue, where a server goes to OFFLINE_HARD due to an unknown issue, even though it is online: …
We run proxysql + percona with group replication in Kubernetes, and saw this happening during a stress test in which we were killing nodes to see how it switches between primary and replica, always 1 node at a time, to keep at least 2 percona pods running. I could not find it in the documentation: is a node supposed to come back ONLINE from OFFLINE_HARD when it's reachable again?
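Documentation question aside, a common manual recovery path (a sketch of standard admin-interface usage, not a recommendation preserved in this thread) is to reset the server's status and reload the servers to runtime; the hostname below is the one from the original issue, used as an example:

```sql
-- ProxySQL admin interface: bring an OFFLINE_HARD server back by hand.
UPDATE mysql_servers SET status = 'ONLINE'
WHERE hostname = '172.31.55.104';
LOAD MYSQL SERVERS TO RUNTIME;  -- apply the change
SAVE MYSQL SERVERS TO DISK;     -- persist it
```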
This is an issue that is over 6 years old and now completely out of context.
I see several errors: …
For reference, error 111 is ECONNREFUSED ("Connection refused").
Yes, so probably the node wasn't healthy. I am closing this issue because I do not think this last comment is relevant to it, and the issue itself is obsolete.
Hi @renecannao, thanks for taking a look. We actually run it in Kubernetes, and the test was to bring down the primary percona pod and then wait for other pods to take over. We use a custom script which gets pods from k8s, translates them to IP addresses, and saves them to the mysql_servers table. I'm not sure why it was done that way; I found a few issues mentioning DNS caching, so it was probably a problem in 2021. We will try to stop using the custom script and just use hostnames.
Hi!
Something strange happens sporadically, for an as-yet-unknown reason.
There are 3 MySQL servers in a pool: a master + 2 slaves. Write queries go to the master, read queries to the slaves.
All MySQL instances run behind 3 proxysql servers, each running on a different host.
Applications connect to proxysql via 127.0.0.1.
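For context, a read/write split like the one described is typically expressed through ProxySQL's replication hostgroups. This is a hedged sketch with made-up hostgroup numbers and reader addresses, not the reporter's actual configuration:

```sql
-- Writers go to hostgroup 10, readers to hostgroup 20; the monitor moves
-- servers between them based on each host's @@global.read_only value.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'example cluster');
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (10, '172.31.55.104', 3306),  -- master (writer)
  (20, '172.31.55.105', 3306),  -- slave (reader)
  (20, '172.31.55.106', 3306);  -- slave (reader)
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

This read_only-driven hostgroup movement appears to be the code path at issue in this thread: when the check times out, the master can be mishandled and end up OFFLINE_HARD.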
The whole setup can work without any issues for, say, a week.
After some unknown amount of time, one of the three proxysql instances decides that the master is dead, and the server is marked OFFLINE_HARD until I restart proxysql.
After a restart, everything is good for another unknown period of time.
This happens from time to time on any of the proxysql instances.
According to the log file, a timeout occurs while checking the read_only state, and that's all that is visible.
I also ran these queries and got an empty set for both. Maybe the log retention is too short, or maybe proxysql empties the log. I don't know yet.
```sql
SELECT * FROM mysql_server_ping_log
WHERE hostname = '172.31.55.104' AND ping_error IS NOT NULL;
-- Empty set (0.00 sec)

SELECT * FROM mysql_server_connect_log
WHERE hostname = '172.31.55.104' AND connect_error IS NOT NULL;
-- Empty set (0.00 sec)
```
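The monitor also keeps a log of the read_only checks themselves, and since the failing check here is the read_only one, a query along these lines (a sketch against ProxySQL's standard monitor schema) might have surfaced the error:

```sql
-- Most recent read_only check failures for this host.
SELECT * FROM monitor.mysql_server_read_only_log
WHERE hostname = '172.31.55.104' AND error IS NOT NULL
ORDER BY time_start_us DESC
LIMIT 10;
```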
Could anyone please help figure out what exactly makes proxysql think the server is dead? Maybe additional logging is required here in order to find out which check fails.
It's also unknown how to bring the server back online without restarting proxysql.
Thank you in advance.