This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Question: debug failed backends checks #1236

Closed
NOmri opened this issue Sep 6, 2020 · 3 comments

Comments

@NOmri

NOmri commented Sep 6, 2020

We recently upgraded our orchestrator instance to Ubuntu 18 with version 3.1.4.
Since then we have been seeing many failed checks against the backends, which get resolved in one of the subsequent checks.
The backends are up, and the check failures seem random.
This creates a lot of confusion, because the UI makes it impossible to tell what the status of the clusters is.

How can we debug it further to understand why the checks are failing?
I've attached our orchestrator config (user and password removed):
orc.conf.txt

Thanks!

@shlomi-noach
Collaborator

First, please run orchestrator with --debug --stack and see whether any interesting error messages show up.

Next, do you perhaps have a low open file limit (ulimit -n)? If your topologies contain many servers, consider increasing nofile to a higher value, e.g. 8192.

Last, try increasing some connection timeouts; your network may be timing out. The default values are actually fairly permissive, but it is worth testing.

  • MySQLConnectTimeoutSeconds: default 2s
  • MySQLDiscoveryReadTimeoutSeconds: default 10s

@NOmri
Author

NOmri commented Sep 6, 2020

Thanks! I will check it out.

@NOmri
Author

NOmri commented Sep 7, 2020

Thanks Shlomi. Although ulimit -n in the shell is 63536, the service itself only had a limit of 1024-4096; increasing it to 8192 seems to have solved the issue.
Thanks!
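The gap described here, where the shell's ulimit -n differs from the limit the running service actually gets, can be confirmed on Linux by reading the process's /proc/<pid>/limits file. The Go program below is a minimal sketch assuming a Linux host; maxOpenFiles is a hypothetical helper that extracts the soft and hard "Max open files" values, and the program defaults to inspecting itself when no PID is given.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// maxOpenFiles (hypothetical helper) extracts the soft and hard
// "Max open files" values from the contents of a Linux /proc/<pid>/limits
// file. Values may be the string "unlimited".
func maxOpenFiles(limits string) (soft, hard string, err error) {
	const key = "Max open files"
	sc := bufio.NewScanner(strings.NewReader(limits))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, key) {
			continue
		}
		// Remaining columns: soft limit, hard limit, units.
		fields := strings.Fields(strings.TrimPrefix(line, key))
		if len(fields) < 2 {
			return "", "", fmt.Errorf("malformed limits line: %q", line)
		}
		return fields[0], fields[1], nil
	}
	if err := sc.Err(); err != nil {
		return "", "", err
	}
	return "", "", fmt.Errorf("no %q line found", key)
}

func main() {
	// Pass the orchestrator PID; defaults to this process ("self").
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/limits")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	soft, hard, err := maxOpenFiles(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("pid %s: max open files soft=%s hard=%s\n", pid, soft, hard)
}
```

Running it against the orchestrator PID, rather than checking ulimit -n in a login shell, shows the limit the service was really started with.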
