Replica vttablet with heartbeat enabled will never recover from a MySQL outage #4673

Closed
jschlather opened this issue Feb 26, 2019 · 0 comments · Fixed by #4689

Comments

@jschlather
Contributor

Overview of the Issue

While chaos testing at HubSpot, we discovered that if the MySQL process for a vttablet replica is killed and restarted, the vttablet never reconnects. After some investigation, we found that this is due to the way healthcheck.go verifies health. The health check is powered by the heartbeat reporter, which fetches the latest reading from the reader, and the reader caches either the last value or an error. When MySQL becomes unreachable, the QueryService shuts down and stops the reader. If the last query the reader ran returned an error, the reader keeps returning that error until it gets a new reading. However, the health check uses this value to decide whether the QueryService can be restarted, and the heartbeat won’t get a new value until the reader is restarted, which won’t happen until the QueryService is restarted. So the service never recovers.

Reproduction Steps

Deploy a vttablet replica with heartbeat enabled, then kill the MySQL process.
