Skip to content

Commit

Permalink
gnatsd: -health, -rank, -lease, -beat
Browse files Browse the repository at this point in the history
options to control health monitoring.

The InternalClient interface offers
a general plugin interface
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.

The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.

The health agent can also be
run standalone as healthcmd.
See the main method in
gnatsd/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest ranking gnatsd instance wins
the lease on the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes #433
  • Loading branch information
glycerine committed Feb 14, 2017
1 parent 37ce250 commit 7385177
Show file tree
Hide file tree
Showing 22 changed files with 2,893 additions and 17 deletions.
148 changes: 148 additions & 0 deletions health/ALGORITHM.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# The ALLCALL leader election algorithm.

Jason E. Aten

February 2017

definition, with example value.
-----------

Let heartBeat = 3 sec. This is how frequently
we will assess cluster health by
sending out an allcall ping.

givens
--------

* Given: Let each server have a numeric integer rank, that is distinct
and unique to that server. If necessary an extremely long
true random number is used to break ties between server ranks, so
that we may assert, with probability 1, that all ranks are distinct.
Server can be put into a strict total order.

* Rule: The lower the rank is preferred for being the leader.


ALLCALL Algorithm
===========================

### I. In a continuous loop

The server always accepts and respond
to allcall broadcasts from other cluster members.

The allcall() ping asks each member of
the server cluster to reply. Replies
provide the responder's own assigned rank
and identity.

### II. Election

Election are computed locally, after accumulating
responses.

After issuing an allcall, the server
listens for a heartbeat interval.

At the end of the interval, it sorts
the respondents by rank. Since
the replies are on a broadcast
channel, more than one simultaneous
allcall reply may be incorporated into
the set of respondents.

The lowest ranking server is elected
as leader.

## Safety/Convergence: ALLCALL converges to one leader

Suppose two nodes are partitioned and so both are leaders on
their own side of the network. Then suppose the network
is joined again, so the two leaders are brought together
by a healing of the network, or by adding a new link
between the networks. The two nodes exchange Ids and
ranks via responding to allcalls, and compute
who is the new leader.

Hence the two leader situation persists for at
most one heartbeat term after the network join.

## Liveness: a leader will be chosen

Given the total order among nodes, exactly one
will be lowest rank and thus be the preferred
leader. If the leader fails, the next
heartbeat of allcall will omit that
server from the candidate list, and
the next ranking server will be chosen.

Hence, with at least one live
node, the system can run for at most one
heartbeat term before electing a leader.
Since there is a total order on
all live (non-failed) servers, only
one will be chosen.

## commentary

ALLCALL does not guarantee that there will
never be more than one leader. Availability
in the face of network partition is
desirable in many cases, and ALLCALL is
appropriate for these. This is congruent
with Nats design as an always-on system.

ALLCALL does not guarantee that a
leader will always be present, but
with live nodes it does provide
that the cluster will have a leader
after one heartbeat term has
been initiated and completed.

By design, ALLCALL functions well
in a cluster with any number of nodes.
One and two nodes, or an even number
of nodes, will work just fine.

Compared to quorum based elections
like raft and paxos, where an odd
number of at least three
nodes is required to make progress,
this can be very desirable.

ALLCALL is appropriate for AP,
rather than CP, style systems, where
availability is more important
than having a single writer. When
writes are idempotent or deduplicated
downstream, this is typically preferred.

prior art
----------

ALLCALL is a simplified version of
the well known Bully Algorithm[1][2]
for election in a distributed system
of arbitrary graph.

ALLCALL does less bullying, and lets
nodes arrive at their own conclusions.
The essential broadcast and ranking
mechanism, however, is identical.

[1] Hector Garcia-Molina, Elections in a
Distributed Computing System, IEEE
Transactions on Computers,
Vol. C-31, No. 1, January (1982) 48–59

[2] https://en.wikipedia.org/wiki/Bully_algorithm

implementation
------------

ALLCALL is implented on top of
the Nats (https://nats.io) system, see the health/
subdirectory of

https://github.com/nats-io/gnatsd

77 changes: 77 additions & 0 deletions health/agent.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
package health

import (
"fmt"
"net"
"time"

"github.com/nats-io/gnatsd/server"
)

// Agent implements the InternalClient interface.
// It provides health status checks and
// leader election from among the candidate
// gnatsd instances in a cluster.
type Agent struct {
opts *server.Options
mship *Membership
}

// NewAgent makes a new Agent.
func NewAgent(opts *server.Options) *Agent {
return &Agent{
opts: opts,
}
}

// Name should identify the internal client for logging.
func (h *Agent) Name() string {
return "health-agent"
}

// Start makes an internal
// entirely in-process client that monitors
// cluster health and manages group
// membership functions.
//
func (h *Agent) Start(
info server.Info,
opts server.Options,
logger server.Logger,

) (net.Conn, error) {

// To keep the health client fast and its traffic
// internal-only, we use an bi-directional,
// in-memory version of a TCP stream.
//
// The buffers really do have to be of
// sufficient size, or we will
// deadlock/livelock the system.
//
cli, srv, err := NewInternalClientPair()
if err != nil {
return nil, fmt.Errorf("NewInternalClientPair() returned error: %s", err)
}

rank := opts.HealthRank
beat := opts.HealthBeat
lease := opts.HealthLease

cfg := &MembershipCfg{
MaxClockSkew: time.Second,
BeatDur: beat,
LeaseTime: lease,
MyRank: rank,
CliConn: cli,
Log: logger,
}
h.mship = NewMembership(cfg)
go h.mship.Start()
return srv, nil
}

// Stop halts the background goroutine.
func (h *Agent) Stop() {
h.mship.Stop()
}
89 changes: 89 additions & 0 deletions health/aloc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
package health

import (
"encoding/json"
"os"
"time"

"github.com/nats-io/go-nats"
)

// AgentLoc conveys to interested parties
// the Id and location of one gnatsd
// server in the cluster.
type AgentLoc struct {
ID string `json:"serverId"`
Host string `json:"host"`
Port int `json:"port"`

// Are we the leader?
IsLeader bool `json:"leader"`

// LeaseExpires is zero for any
// non-leader. For the leader,
// LeaseExpires tells you when
// the leaders lease expires.
LeaseExpires time.Time `json:"leaseExpires"`

// lower rank is leader until lease
// expires. Ties are broken by ID.
// Rank should be assignable on the
// gnatsd command line with -rank to
// let the operator prioritize
// leadership for certain hosts.
Rank int `json:"rank"`

// Pid or process id is the only
// way to tell apart two processes
// sometimes, if they share the
// same nats server.
//
// Pid is the one difference between
// a nats.ServerLoc and a health.AgentLoc.
//
Pid int `json:"pid"`
}

func (s *AgentLoc) String() string {
by, err := json.Marshal(s)
panicOn(err)
return string(by)
}

func (s *AgentLoc) fromBytes(by []byte) error {
return json.Unmarshal(by, s)
}

func alocEqual(a, b *AgentLoc) bool {
aless := AgentLocLessThan(a, b)
bless := AgentLocLessThan(b, a)
return !aless && !bless
}

func slocEqualIgnoreLease(a, b *AgentLoc) bool {
a0 := *a
b0 := *b
a0.LeaseExpires = time.Time{}
a0.IsLeader = false
b0.LeaseExpires = time.Time{}
b0.IsLeader = false

aless := AgentLocLessThan(&a0, &b0)
bless := AgentLocLessThan(&b0, &a0)
return !aless && !bless
}

// the 2 types should be kept in sync.
// We return a brand new &AgentLoc{}
// with contents filled from loc.
func natsLocConvert(loc *nats.ServerLoc) *AgentLoc {
return &AgentLoc{
ID: loc.ID,
Host: loc.Host,
Port: loc.Port,
IsLeader: loc.IsLeader,
LeaseExpires: loc.LeaseExpires,
Rank: loc.Rank,
Pid: os.Getpid(),
}
}
Loading

0 comments on commit 7385177

Please sign in to comment.