Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Support for Consul Health Checks #1343

Merged
merged 21 commits into from
Oct 27, 2015
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
80ad971
Implemented Docker health checks
diptanu Oct 22, 2015
24ed164
Merge branch 'master' into f-docker-check
diptanu Oct 23, 2015
71ede8a
Implemented Docker health checks
diptanu Oct 22, 2015
83db728
Registering the Exec with Docker Daemon everytime the check is invoked
diptanu Oct 26, 2015
2632bb3
Fixed merge conflicts
diptanu Oct 26, 2015
d695012
Adding a debug log to indicate the exit code of failed check
diptanu Oct 26, 2015
40f72a8
Marking the state of a service as critical if the Docker Daemon doesn…
diptanu Oct 26, 2015
31cdf4f
Added some tests for docker check
diptanu Oct 26, 2015
809e9f5
Extracted the logic of figuring out the shell and fixing the logic to…
diptanu Oct 26, 2015
5f8f531
Defaulting to Monitor check
diptanu Oct 26, 2015
4c1818e
Collect and truncate the output from docker exec
diptanu Oct 26, 2015
b4af7f4
Updated the comment for CheckType
diptanu Oct 26, 2015
f5f5ed0
Making sure the script is not empty if it's a docker check
diptanu Oct 26, 2015
423f7fb
Not adding the docker check if we couldn't create the client
diptanu Oct 26, 2015
9efbd1a
Fixed the Fake Docker client to simulate Exec start failures
diptanu Oct 26, 2015
5827865
Added a test for exit code 1 with docker exec
diptanu Oct 26, 2015
f0c783d
Added a test to check if we are properly truncating docker exec outputs
diptanu Oct 27, 2015
1e240b5
Fixed the tests
diptanu Oct 27, 2015
471442e
Making an explicit check to test whether a check is of type Monitor
diptanu Oct 27, 2015
2fdcf1a
Added a test for selecting shell from env
diptanu Oct 27, 2015
3d68d06
Forcing the Env variable to empty while testing the default shell logic
diptanu Oct 27, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 26 additions & 1 deletion command/agent/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,9 @@ type Agent struct {
// checkTTLs maps the check ID to an associated check TTL
checkTTLs map[string]*CheckTTL

// checkDockers maps the check ID to an associated Docker Exec based check
checkDockers map[string]*CheckDocker

// checkLock protects updates to the check* maps
checkLock sync.Mutex

Expand Down Expand Up @@ -151,6 +154,7 @@ func Create(config *Config, logOutput io.Writer) (*Agent, error) {
checkTTLs: make(map[string]*CheckTTL),
checkHTTPs: make(map[string]*CheckHTTP),
checkTCPs: make(map[string]*CheckTCP),
checkDockers: make(map[string]*CheckDocker),
eventCh: make(chan serf.UserEvent, 1024),
eventBuf: make([]*UserEvent, 256),
shutdownCh: make(chan struct{}),
Expand Down Expand Up @@ -905,7 +909,7 @@ func (a *Agent) AddCheck(check *structs.HealthCheck, chkType *CheckType, persist
tcp.Start()
a.checkTCPs[check.CheckID] = tcp

} else {
} else if chkType.IsMonitor() {
if existing, ok := a.checkMonitors[check.CheckID]; ok {
existing.Stop()
}
Expand All @@ -924,6 +928,27 @@ func (a *Agent) AddCheck(check *structs.HealthCheck, chkType *CheckType, persist
}
monitor.Start()
a.checkMonitors[check.CheckID] = monitor
} else if chkType.IsDocker() {
if existing, ok := a.checkDockers[check.CheckID]; ok {
existing.Stop()
}
if chkType.Interval < MinInterval {
a.logger.Println(fmt.Sprintf("[WARN] agent: check '%s' has interval below minimum of %v",
check.CheckID, MinInterval))
chkType.Interval = MinInterval
}

dockerCheck := &CheckDocker{
Notify: &a.state,
CheckID: check.CheckID,
DockerContainerId: chkType.DockerContainerId,
Shell: chkType.Shell,
Script: chkType.Script,
Interval: chkType.Interval,
Logger: a.logger,
}
dockerCheck.Start()
a.checkDockers[check.CheckID] = dockerCheck
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a final else that returns an error that the type is unknown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slackpad The previous behaviour defaulted to Script check, perhaps we should keep that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be ok, too.

}

Expand Down
151 changes: 145 additions & 6 deletions command/agent/check.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,14 @@ import (
"log"
"net"
"net/http"
"os"
"os/exec"
"sync"
"syscall"
"time"

"github.com/armon/circbuf"
docker "github.com/fsouza/go-dockerclient"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised there's no official client but didn't see one. There's another by a person that says they work for Docker that's popular.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using this in Nomad and Terraform. The code and docs are great and it's well-supported. 👍

"github.com/hashicorp/consul/consul/structs"
"github.com/hashicorp/go-cleanhttp"
)
Expand All @@ -38,10 +40,12 @@ const (
// Only one of the types needs to be provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now a little stale.

// TTL or Script/Interval or HTTP/Interval or TCP/Interval
type CheckType struct {
Script string
HTTP string
TCP string
Interval time.Duration
Script string
HTTP string
TCP string
Interval time.Duration
DockerContainerId string
Shell string

Timeout time.Duration
TTL time.Duration
Expand All @@ -54,7 +58,7 @@ type CheckTypes []*CheckType

// Valid checks if the CheckType is valid
func (c *CheckType) Valid() bool {
return c.IsTTL() || c.IsMonitor() || c.IsHTTP() || c.IsTCP()
return c.IsTTL() || c.IsMonitor() || c.IsHTTP() || c.IsTCP() || c.IsDocker()
}

// IsTTL checks if this is a TTL type
Expand All @@ -64,7 +68,7 @@ func (c *CheckType) IsTTL() bool {

// IsMonitor checks if this is a Monitor type
func (c *CheckType) IsMonitor() bool {
return c.Script != "" && c.Interval != 0
return c.Script != "" && c.DockerContainerId == "" && c.Interval != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this made me realize how sloppy the current else is. We should probably add the script check back to Docker and make an explicit check for IsMonitor and have an else that fails. Right now we will roll into creating a monitor when these requirements aren't met. Sorry for the flip-flop on this one.

}

// IsHTTP checks if this is a HTTP type
Expand All @@ -77,6 +81,10 @@ func (c *CheckType) IsTCP() bool {
return c.TCP != "" && c.Interval != 0
}

func (c *CheckType) IsDocker() bool {
return c.DockerContainerId != "" && c.Shell != "" && c.Interval != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want the shell check here since it's treated as optional down below.

}

// CheckNotifier interface is used by the CheckMonitor
// to notify when a check has a status update. The update
// should take care to be idempotent.
Expand Down Expand Up @@ -493,3 +501,134 @@ func (c *CheckTCP) check() {
c.Logger.Printf("[DEBUG] agent: check '%v' is passing", c.CheckID)
c.Notify.UpdateCheck(c.CheckID, structs.HealthPassing, fmt.Sprintf("TCP connect %s: Success", c.TCP))
}

// A custom interface since go-dockerclient doesn't have one
// We will use this interface in our test to inject a fake client
type DockerClient interface {
CreateExec(docker.CreateExecOptions) (*docker.Exec, error)
StartExec(string, docker.StartExecOptions) error
InspectExec(string) (*docker.ExecInspect, error)
}

// CheckDocker is used to periodically invoke a script to
// determine the health of an application running inside a
// Docker Container. We assume that the script is compatible
// with nagios plugins and expects the output in the same format.
type CheckDocker struct {
Notify CheckNotifier
CheckID string
Script string
DockerContainerId string
Shell string
Interval time.Duration
Logger *log.Logger

dockerClient DockerClient
cmd []string
stop bool
stopCh chan struct{}
stopLock sync.Mutex
}

// Start is used to start checks.
// Docker Checks runs until stop is called
func (c *CheckDocker) Start() {
c.stopLock.Lock()
defer c.stopLock.Unlock()

//create the docker client
if c.dockerClient == nil {
var err error
c.dockerClient, err = docker.NewClientFromEnv()
if err != nil {
c.Logger.Println("[DEBUG] Error creating the Docker Client : %s", err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might crash later because the client is nil. Should probably return.

}
}

//figure out the shell
if c.Shell == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth putting this into a helper function and letting normal monitors support the new Shell config option as well. It'll default to the safe empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add support for specifying shells in normal monitors in a separate PR.

if otherShell := os.Getenv("SHELL"); otherShell != "" {
c.Shell = otherShell
} else {
c.Shell = "/bin/sh"
}
}

c.cmd = []string{c.Shell, "-c", c.Script}

c.stop = false
c.stopCh = make(chan struct{})
go c.run()
}

// Stop is used to stop a docker check.
func (c *CheckDocker) Stop() {
c.stopLock.Lock()
defer c.stopLock.Unlock()
if !c.stop {
c.stop = true
close(c.stopCh)
}
}

// run is invoked by a goroutine to run until Stop() is called
func (c *CheckDocker) run() {
// Get the randomized initial pause time
initialPauseTime := randomStagger(c.Interval)
c.Logger.Printf("[DEBUG] agent: pausing %v before first invocation of %s -c %s in container %s", initialPauseTime, c.Shell, c.Script, c.DockerContainerId)
next := time.After(initialPauseTime)
for {
select {
case <-next:
c.check()
next = time.After(c.Interval)
case <-c.stopCh:
return
}
}
}

func (c *CheckDocker) check() {
//Set up the Exec since
execOpts := docker.CreateExecOptions{
AttachStdin: false,
AttachStdout: true,
AttachStderr: true,
Tty: false,
Cmd: c.cmd,
Container: c.DockerContainerId,
}
var (
exec *docker.Exec
err error
)
if exec, err = c.dockerClient.CreateExec(execOpts); err == nil {
exec = exec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do anything since it got assigned above? Could probably drop this clause and change the logic to err != nil.

} else {
c.Logger.Printf("[DEBUG] agent: Error while creating Exec: %s", err.Error())
c.Notify.UpdateCheck(c.CheckID, structs.HealthCritical, fmt.Sprintf("Unable to create Exec, error: %s", err.Error()))
return
}

err = c.dockerClient.StartExec(exec.ID, docker.StartExecOptions{Detach: false, Tty: false})
if err != nil {
c.Logger.Printf("[DEBUG] Error in executing health checks: %s", err.Error())
c.Notify.UpdateCheck(c.CheckID, structs.HealthCritical, fmt.Sprintf("Unable to start exec: %s", err.Error()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you capitalized "Exec" in the other error messages.

return
}

execInfo, err := c.dockerClient.InspectExec(exec.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a random curiosity question - is this state the Docker holds onto that we need to clear, or is it something maintained by the client layer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slackpad This is something I have been wondering too. Not sure when does Docker GC the Execs that users have created. The API doesn't say anything about it. This is how docker client does it too based on https://github.com/docker/docker/blob/master/api/client/exec.go.

if err != nil {
c.Logger.Printf("[DEBUG] Error in inspecting check result : %s", err.Error())
c.Notify.UpdateCheck(c.CheckID, structs.HealthCritical, fmt.Sprintf("Unable to inspect Exec: %s", err.Error()))
return
}

if execInfo.ExitCode == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the check output? A (truncated) version of the check output is normally passed in to UpdateCheck.

c.Notify.UpdateCheck(c.CheckID, structs.HealthPassing, fmt.Sprintf("Script execution %s: Success", c.Script))
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exit code of 1 should set it to warning, not critical.

c.Logger.Printf("[DEBUG] Check failed with exit code: %d", execInfo.ExitCode)
c.Notify.UpdateCheck(c.CheckID, structs.HealthCritical, fmt.Sprintf("Script execution faied with exit code: %s", execInfo.ExitCode))
}

}
Loading