Question: debug failed backends checks #1236

NOmri · 2020-09-06T11:36:27Z

We recently upgraded our orchestrator instance to Ubuntu 18 with version 3.1.4.
Since then we have been seeing a lot of failed checks against the backends, which clear up on one of the following checks.
The backends are up and the check failures seem random.
This creates a lot of confusion, because the UI makes it impossible to tell what the actual status of the clusters is.

How can we debug it further to understand why the checks are failing?
Attached is the orchestrator conf (user and password removed):
orc.conf.txt

Thanks!

shlomi-noach · 2020-09-06T12:12:42Z

First, please run with --debug --stack and see if you spot any interesting error messages.

Next, do you perhaps have a low setting for the open file limit (ulimit -n)? If you have many servers in your topologies, then consider increasing nofile to some higher value, e.g. 8192.

Last, let's try increasing some connection timeouts? Possibly your network times out. The default values are actually pretty permissive, but worth testing.

MySQLConnectTimeoutSeconds: default 2s
MySQLDiscoveryReadTimeoutSeconds: default 10s
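
To see which of these two values a given config actually sets, a minimal sketch along these lines may help. It assumes the orchestrator config is plain JSON; the helper name `effective_timeouts` is made up for illustration, and the defaults are simply the ones quoted above, not re-checked against the orchestrator source.

```python
import json

# Defaults as quoted in this thread (assumption: not verified against the source code).
THREAD_DEFAULTS = {
    "MySQLConnectTimeoutSeconds": 2,
    "MySQLDiscoveryReadTimeoutSeconds": 10,
}


def effective_timeouts(config_text):
    """Return the two timeout settings, falling back to the quoted defaults when a key is absent."""
    config = json.loads(config_text)
    return {key: config.get(key, default) for key, default in THREAD_DEFAULTS.items()}
```

Pass it the contents of your config file; any key that is missing from the file is reported with the default quoted above.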

NOmri · 2020-09-06T12:54:11Z

Thanks! I will check it out.

NOmri · 2020-09-07T05:18:24Z

Thanks Shlomi. Although ulimit -n in the shell is 63536, the service itself had a limit of only 1024-4096.
Increasing it to 8192 seems to solve the issue.
Thanks!
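
For anyone hitting the same mismatch between the shell's ulimit -n and what the running service actually gets, here is a rough sketch of how the per-process limit can be read on Linux. It parses the "Max open files" row of /proc/<pid>/limits; the function names are hypothetical, and the PID of the orchestrator process has to be looked up separately.

```python
from pathlib import Path


def _to_limit(value):
    # /proc reports "unlimited" when no cap applies; map that to None.
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) open-file limits from the text of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' row found")


def open_file_limits(pid):
    """Read the open-file limits of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling `open_file_limits(pid)` with the PID of the orchestrator process shows the limits the service is really running with, which can differ from the limit in an interactive shell. If the service is started by systemd, the unit's LimitNOFILE= setting is the likely place to raise it, though that is an assumption about the setup.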

NOmri closed this as completed Sep 7, 2020

nivedreddy mentioned this issue Jun 19, 2021

Adding systemd open file limit #1372

Merged
