Cassandra allow checking for up/down nodes from multiple hosts. #202

Open: dmilcevski wants to merge 2 commits into master

dmilcevski

This pull request allows checking multiple Cassandra hosts in a cluster: if one host is down, the other hosts can still report how many nodes are up or down. If you check only one host and that host happens to be down, you get no notifications. The fix allows checking multiple addresses, and if all hosts in the set are down, a CRITICAL message is returned.

This pull request also requires changes to the Nodetool.pm file, for which I will create a separate pull request. Those changes allow "connection refused" errors from down hosts to be skipped while still returning OK if the number of up hosts is greater than the threshold.
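A minimal Python sketch of the fallback logic described above, for illustration only: it is not the PR's actual plugin code or the Nodetool.pm changes. It assumes a hypothetical `query(host)` callback that returns `(up_nodes, down_nodes)` as seen by that host and raises `ConnectionRefusedError` when the host itself is down; `check_cluster`, `threshold` and `demo_query` are likewise hypothetical names. The first host that answers decides the result, and CRITICAL is returned only when no host in the set answers or too few nodes are up.

```python
from typing import Callable, Iterable, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical callback: asks one host for its view of the ring and returns
# (up_nodes, down_nodes), e.g. by parsing `nodetool status`. It is expected
# to raise ConnectionRefusedError when that host itself is down.
QueryFn = Callable[[str], Tuple[int, int]]


def check_cluster(hosts: Iterable[str], query: QueryFn, threshold: int) -> Tuple[int, str]:
    """Return (exit_code, message) using the first host that answers."""
    refused = []
    for host in hosts:
        try:
            up, down = query(host)
        except ConnectionRefusedError:
            # This host is down; skip it and try the next address.
            refused.append(host)
            continue
        # OK only if more nodes are up than the threshold.
        if up > threshold:
            return OK, f"OK: {up} nodes up, {down} down (reported by {host})"
        return CRITICAL, f"CRITICAL: {up} nodes up, {down} down (reported by {host})"
    # No host in the set answered at all.
    return CRITICAL, "CRITICAL: all hosts unreachable: " + ", ".join(refused)


def demo_query(host: str) -> Tuple[int, int]:
    """Fake query for illustration: cass1 is down, other hosts report 5 up, 1 down."""
    if host == "cass1":
        raise ConnectionRefusedError(host)
    return 5, 1


if __name__ == "__main__":
    print(check_cluster(["cass1", "cass2"], demo_query, threshold=3))
    # (0, 'OK: 5 nodes up, 1 down (reported by cass2)')
    print(check_cluster(["cass1"], demo_query, threshold=3))
    # (2, 'CRITICAL: all hosts unreachable: cass1')
```

Injecting the query function keeps the fallback and threshold logic separate from how each host is actually contacted, which is where the "connection refused" handling would live.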

@dmilcevski changed the title from "Cherry branch" to "Cassandra allow checking for up/down nodes from multiple hosts." on Aug 24, 2018
@HariSekhon (Owner) commented on Dec 31, 2018

Thanks very much for the pull requests.

This can actually be solved in a simpler and more generic way by using either a load balancer (HAProxy is free, and a ready-made config is available below; it is also a sub-repo of this one and is used in many CI tests):

https://github.com/HariSekhon/haproxy-configs/blob/master/cassandra-jmx.cfg

or by dynamically querying find_active_server.py in a subshell and passing its result to any Nagios Plugin, so you don't have to rewrite and complicate existing Nagios Plugins for technologies, such as Hadoop, that later added Active/Passive master HA.

https://github.com/HariSekhon/devops-python-tools/blob/master/find_active_server.py
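To illustrate the general idea behind the second option, here is a minimal sketch that returns the first host in a list accepting a TCP connection, so that a single hostname can be handed to an unmodified plugin. This is not find_active_server.py's implementation or command-line interface; the script name `find_first_reachable.py`, its arguments and the function name are hypothetical. A completed TCP handshake is treated as "up", which is a much weaker check than querying the service itself.

```python
import socket
import sys
from typing import Iterable, Optional


def find_first_reachable(hosts: Iterable[str], port: int, timeout: float = 2.0) -> Optional[str]:
    """Return the first host accepting a TCP connection on `port`, or None."""
    for host in hosts:
        try:
            # Only checks that the port accepts connections, not that the
            # service behind it is healthy.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Connection refused, timeout or DNS failure: try the next host.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical usage: python3 find_first_reachable.py PORT HOST [HOST ...]
    if len(sys.argv) < 3:
        sys.exit("usage: find_first_reachable.py PORT HOST [HOST ...]")
    found = find_first_reachable(sys.argv[2:], int(sys.argv[1]))
    if found is None:
        print("no host reachable", file=sys.stderr)
        sys.exit(2)  # Nagios CRITICAL convention
    print(found)
```

The printed hostname could then be captured in a subshell, e.g. `$(python3 find_first_reachable.py 7199 cass1 cass2 cass3)` (7199 being Cassandra's default JMX port), and passed as the host argument of an existing plugin.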

The load-balanced method is better because it reduces the number of queries sent by tools like the Nagios Plugins when one or more nodes are offline (on larger clusters it is common to have some nodes offline for maintenance, disk replacements, patches, upgrades, etc.).
