many keepalive watchdog fired log in tiflash error log #4192
Comments
Is this error only seen when TLS is enabled?
Not sure; I need to test on a non-TLS TiDB cluster.
Tested on a nightly (2021-03-13) TiDB cluster with TLS on.
172.16.5.81:7535 is the TiKV server and ipv4:172.16.5.82:9635 is the TiFlash server. With TLS off, these errors do not appear, so this seems to be another problem that only shows up with TLS enabled.
The root cause is in client-c, we set Some possible solutions:
I prefer to use the first solution because
After some discussion with @zanmato1984 and @fuzhe1989, I decided to set the keepalive ping timeout to 8000 ms as a quick fix, and if we update
I will cherry-pick this to 5.X.
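For readers unfamiliar with the setting discussed above, here is a minimal sketch of how a keepalive ping timeout of 8000 ms is typically expressed as a gRPC channel argument. It is written in Python only for illustration and is not the actual client-c change (client-c is C++, where the same key is exposed as `GRPC_ARG_KEEPALIVE_TIMEOUT_MS`). The helper name `keepalive_options` and its parameters are hypothetical.

```python
# Standard gRPC channel-argument keys. In the C++ API the same strings are
# available as GRPC_ARG_KEEPALIVE_TIME_MS and GRPC_ARG_KEEPALIVE_TIMEOUT_MS.
KEEPALIVE_TIME_KEY = "grpc.keepalive_time_ms"
KEEPALIVE_TIMEOUT_KEY = "grpc.keepalive_timeout_ms"


def keepalive_options(timeout_ms=8000, time_ms=None):
    """Build channel options for keepalive (hypothetical helper).

    timeout_ms: how long to wait for a ping ack before the keepalive
                watchdog fires (8000 ms is the quick fix from this thread).
    time_ms:    optional interval between keepalive pings; left unset here
                because the thread does not state which value is used.
    """
    options = [(KEEPALIVE_TIMEOUT_KEY, timeout_ms)]
    if time_ms is not None:
        options.append((KEEPALIVE_TIME_KEY, time_ms))
    return options


if __name__ == "__main__":
    print(keepalive_options())
```

In grpcio, a list like this can be passed as the `options` argument of `grpc.insecure_channel` or `grpc.secure_channel`.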
Unfortunately, after set
Since keep_alive_timeout is 8 seconds, if the ping ack event is not handled the first time it is polled, there is a high probability of a timeout. Since gRPC 1.31.0, there is no limit on the number of events handled in
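The timing argument above can be illustrated with a small model. This is only a sketch under stated assumptions, not gRPC's actual polling logic: it assumes events are handled only at fixed polling ticks, and the poll interval (5000 ms in the example) and the function names are hypothetical values chosen for illustration. Only the 8000 ms timeout comes from the thread.

```python
def ack_handled_at_ms(ack_arrival_ms, poll_interval_ms, missed_polls=0):
    """Time (ms after the ping was sent) at which the ack is handled.

    Assumption: events are only handled at multiples of poll_interval_ms,
    and the ack may be skipped for `missed_polls` polling rounds.
    """
    # First polling tick at or after the ack arrives (integer ceiling).
    first_tick = -(-ack_arrival_ms // poll_interval_ms) * poll_interval_ms
    return first_tick + missed_polls * poll_interval_ms


def watchdog_fires(ack_arrival_ms, poll_interval_ms,
                   timeout_ms=8000, missed_polls=0):
    """True if the ack is handled only after the keepalive timeout."""
    handled = ack_handled_at_ms(ack_arrival_ms, poll_interval_ms, missed_polls)
    return handled > timeout_ms


if __name__ == "__main__":
    # Hypothetical numbers: ack arrives 10 ms after the ping, polls every 5 s.
    print(watchdog_fires(10, 5000))                  # handled on the first poll
    print(watchdog_fires(10, 5000, missed_polls=1))  # handled one poll later
```

In this model the ack itself arrives almost immediately in both cases; the watchdog fires in the second case only because the handling is deferred by one polling round, which pushes it past the 8-second timeout.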
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Set up a cluster with TLS enabled, then run MPP queries.
2. What did you expect to see? (Required)
The error log should be clean.
3. What did you see instead (Required)
Many "keepalive watchdog fired" messages in the TiFlash error log.
The messages seem harmless since queries are not affected, but we still need to find the root cause and avoid this error.
4. What is your TiFlash version? (Required)
master @ a605801