gnatsd: -health and -rank options #435

Closed

glycerine commented Feb 11, 2017

This is about half done. There's a bunch of polish left (especially
around the verbose and ugly logging the election produces), and some
tests still need to be written.

Still, one can see the basic architecture, and it's that basic integration
approach that I wanted early feedback on. In particular, if someone would
vet the additions to server.go: that's the integration point and the most
likely place where I've missed something important.

TODO:
[ ] get ServerRank actually configured, and see if ServerLocation() can be eliminated or finished out.
[ ] write up the leader election algorithm used here. It is ultra simple but still needs documenting.
[ ] write tests for the election
[ ] finish publishing the health info that nats-top consumes on a regular NATS topic
[ ] hook up that output to nats-top

Proposed eventual commit message:

Cluster health monitoring and
leader election and priority
assignment are made available
with the -health and -rank
flags to gnatsd.

The InternalClient interface offers
plugin capability for running
internal clients.

Allows nats-top to monitor the health
of the whole cluster.

Fixes #433 and nats-io/nats-top#31
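
To make the plugin idea concrete, here is a minimal sketch of how an internal-client interface could be wired into a server. The method set of `InternalClient` shown here, along with `Registry`, `StartAll`, and `healthAgent`, is hypothetical and chosen only for illustration; the interface in the PR may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// InternalClient is a HYPOTHETICAL shape for the plugin interface.
// The method set in the actual PR may differ.
type InternalClient interface {
	Name() string
	Start(serverAddr string) error
	Stop()
}

// Registry stands in for the server-side bookkeeping of internal clients.
type Registry struct {
	clients []InternalClient
}

// Register adds a client to be started with the server.
func (r *Registry) Register(c InternalClient) {
	r.clients = append(r.clients, c)
}

// StartAll starts clients in registration order. If one fails, the
// clients already started are stopped in reverse order and the error
// is returned.
func (r *Registry) StartAll(addr string) error {
	for i, c := range r.clients {
		if err := c.Start(addr); err != nil {
			for j := i - 1; j >= 0; j-- {
				r.clients[j].Stop()
			}
			return fmt.Errorf("internal client %q: %w", c.Name(), err)
		}
	}
	return nil
}

// healthAgent is a stub that only tracks whether it is running.
type healthAgent struct{ running bool }

func (h *healthAgent) Name() string { return "health" }

func (h *healthAgent) Start(addr string) error {
	if addr == "" {
		return errors.New("empty server address")
	}
	h.running = true
	return nil
}

func (h *healthAgent) Stop() { h.running = false }

func main() {
	reg := &Registry{}
	agent := &healthAgent{}
	reg.Register(agent)
	if err := reg.StartAll("127.0.0.1:4222"); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println(agent.Name(), "running:", agent.running)
}
```

Keeping the interface this small would let the server treat the health agent like any other plugin, which is roughly the integration question raised about server.go.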

@glycerine
Author

NB: the CI checks will fail because this PR depends on the minhealth branch of glycerine/go-nats (https://github.com/glycerine/go-nats/tree/minhealth, if you want to pull both and build manually). I'll polish the go-nats additions, which are small, and open a separate PR for them so that CI is useful here again.

glycerine commented Feb 12, 2017

The go-nats PR that will let you build this manually: https://github.com/nats-io/go-nats/pull/265

status update:
[x] get HealthRank actually configured
[x] write up the leader election algorithm. Done: https://github.com/glycerine/gnatsd/blob/minhealth/health/ALGORITHM.md
[x] write tests for the election. Done.
[ ] finish publishing the health info that nats-top consumes on a regular NATS topic
[ ] hook up that output to nats-top

@glycerine glycerine force-pushed the minhealth branch 14 times, most recently from 427238f to 7385177 Compare February 14, 2017 04:04

glycerine commented Feb 14, 2017

The dependency, https://github.com/nats-io/go-nats/pull/265, is ready for review. Travis can work on this PR again once that one is merged.

glycerine commented Feb 14, 2017

Status update: this is in good shape now. It doesn't have the nats-top part done yet, but the election and health monitoring agent are tested and polished.

@glycerine glycerine force-pushed the minhealth branch 3 times, most recently from ff74b9c to a4e017e Compare February 14, 2017 04:53
@glycerine glycerine changed the title from "gnatsd: -health and -rank options -- not ready for merge" to "gnatsd: -health and -rank options" Feb 14, 2017
options to control health monitoring.

The InternalClient interface offers
a general plugin interface
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.

The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.

The health agent can also be
run standalone as healthcmd.
See the main method in
gnatsd/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest ranking gnatsd instance wins
the lease on the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes nats-io#433

@derekcollison
Member

Closing this due to age.

@varunpalekar

We need this health check feature.

@derekcollison
Member

Help me understand what you are trying to do and what problem this is solving?

@varunpalekar

Sorry, this got lost in my todos. At the time I was testing the nats-server Helm chart for deployment on Kubernetes without using the NATS operator, and I was facing some issues with the health check.

Successfully merging this pull request may close these issues.

cluster health monitoring - uses for system internal/read-only topics