vttablet /healthz reports 200 / "ok" when not connected to MySQL yet #8237

aquarapid · 2021-06-02T05:43:21Z

We found the following issue in a setup where mysqld is started under vttablet control (e.g. with -restore_from_backup), as we do in the Vitess Operator.

If you start vttablet while running a /healthz check in a tight loop, e.g.:

while /bin/true ; do echo -n `date +"%T.%N"` ; echo -n " " ; curl -m 0.25 -s http://localhost:15100/healthz ; sleep 0.25 ; done

You will observe something like (after the HTTP port becomes live, of course):

14:56:21.803939901 ok
14:56:22.070264141 ok
14:56:22.337504430 ok
14:56:22.603866197 ok
14:56:22.869842826 500 internal server error: vttablet is not serving
14:56:23.133949851 ok

If you correlate this with the vttablet logs, you will see that the first four ok responses were returned before the tablet transitioned to SERVING, while its state was still "NotConnected" (i.e. it was waiting for MySQL to come up).

Normally this would not matter. However, if:

MySQL does not come up at all (e.g. corruption), or
MySQL takes a long time to come up (e.g. extended recovery or upgrade activity),

then the ok response from /healthz can cause the Vitess Operator to declare the vttablet pod as running. This can affect the operator's rollout of tablet changes: the next vttablet pod in a shard may be torn down before the previous one is truly ready.

The problem seems to be in the code here:

https://github.com/vitessio/vitess/blob/main/go/vt/vttablet/tabletserver/tabletserver.go#L1542

where we only report /healthz as unhealthy if wantState == StateServing, but not if wantState is StateNotConnected.

aquarapid added the Type: Bug label Jun 2, 2021

aquarapid self-assigned this Jun 2, 2021

aquarapid mentioned this issue Jun 2, 2021

/healthz should not report ok when vttablet is not connected to mysql #8238

Merged

harshit-gangal closed this as completed in #8238 Jun 2, 2021
