TLS cert provisioning #5597

Merged
merged 102 commits into from
Jun 27, 2019
102 commits
686e032
add auto_encrypt.tls to configuration
hanshasselberg Apr 1, 2019
db85507
incomingRPCConfig is never nil
hanshasselberg Apr 1, 2019
20c229b
wip
hanshasselberg Apr 1, 2019
19374ba
tlsendpoint without verifyincoming
hanshasselberg Apr 2, 2019
a3f630f
augment reply
hanshasselberg Apr 3, 2019
47e211a
include roots in response
hanshasselberg Apr 3, 2019
76bb350
fix build
hanshasselberg Apr 3, 2019
51fe2e4
send manual ca along
hanshasselberg Apr 3, 2019
4d9a63d
track connect ca on servers
hanshasselberg Apr 4, 2019
4254e67
progress
hanshasselberg Apr 5, 2019
3274fab
progress
hanshasselberg Apr 5, 2019
8bebc3d
lots of progress
hanshasselberg Apr 8, 2019
36ffb49
better error message
hanshasselberg Apr 8, 2019
018f027
:boom:
hanshasselberg Apr 8, 2019
4e144aa
add gossip option, but do not actually send it
hanshasselberg Apr 8, 2019
bda32bf
fix configuration
hanshasselberg Apr 8, 2019
b99f718
copy config to server config
hanshasselberg Apr 8, 2019
39ab392
more tests and better names
hanshasselberg Apr 9, 2019
e6b96ff
small things
hanshasselberg Apr 9, 2019
387b1cb
rename to autoEncrypt
hanshasselberg Apr 9, 2019
d2d3c68
more changes
hanshasselberg Apr 9, 2019
29597e9
refactor
hanshasselberg Apr 10, 2019
7b35545
client and server mode
hanshasselberg Apr 12, 2019
c93d796
more stuff
hanshasselberg Apr 12, 2019
a920533
enable connect on the server when using auto_encrypt
hanshasselberg Apr 12, 2019
1c33719
provide an empty cert to avoid panic.
hanshasselberg Apr 12, 2019
3f251a8
verify_server_hostname
hanshasselberg Apr 12, 2019
e872758
stuff and more stuff
hanshasselberg Apr 12, 2019
87c42ac
remove unused code
hanshasselberg Apr 12, 2019
ea03647
errfn
hanshasselberg Apr 16, 2019
d512e3d
watch certs
hanshasselberg Apr 17, 2019
72aab1f
small things
hanshasselberg Apr 17, 2019
bb2797b
more things
hanshasselberg Apr 17, 2019
a52d184
enable verify_server_hostname if ca set.
hanshasselberg Apr 18, 2019
acddee2
fix some test
hanshasselberg Apr 18, 2019
8606348
Revert "enable verify_server_hostname if ca set."
hanshasselberg Apr 18, 2019
c892618
do not turn on auto encrypt by default in tests
hanshasselberg Apr 18, 2019
0d06c66
configure auto encrypt for test
hanshasselberg Apr 18, 2019
6af7f6c
remove stuff
hanshasselberg Apr 18, 2019
a4266e5
docs
hanshasselberg Apr 18, 2019
4726854
stop tracking caroots when shutting down
hanshasselberg Apr 18, 2019
c43bce7
remove gossip
hanshasselberg Apr 23, 2019
7725b24
wip
hanshasselberg Apr 23, 2019
180937f
stuff
hanshasselberg Apr 24, 2019
34083d1
tests for retryJoin
hanshasselberg Apr 24, 2019
cc8c2f6
some tests
hanshasselberg Apr 24, 2019
3ca6d21
test
hanshasselberg Apr 25, 2019
53127b1
add agent uri
hanshasselberg Apr 25, 2019
26e9e87
better
hanshasselberg Apr 26, 2019
335771f
more docs
hanshasselberg Apr 28, 2019
2e40459
feedback
hanshasselberg Apr 28, 2019
b484e5e
feedback
hanshasselberg Apr 28, 2019
10d1c31
only track when allowtls
hanshasselberg Apr 28, 2019
5770431
fix couple of things
hanshasselberg Apr 29, 2019
ebe5723
remove import
hanshasselberg Apr 29, 2019
58a8a58
return
hanshasselberg Apr 29, 2019
02bdfc2
check error
hanshasselberg Apr 29, 2019
b4a2189
log afterwards
hanshasselberg Apr 29, 2019
53713c1
add test for AutoEncryptAllowTLS
hanshasselberg Apr 29, 2019
c01eea2
undo test changes
hanshasselberg Apr 29, 2019
648fc7f
newline
hanshasselberg Apr 29, 2019
651dbd4
improve test
hanshasselberg Apr 29, 2019
b452843
retry
hanshasselberg Apr 30, 2019
500e135
changing the response to prepare for cache prepopulation.
hanshasselberg Apr 30, 2019
ea8ea33
fix wrong merge.
hanshasselberg May 2, 2019
79745c7
return when shutting down
hanshasselberg May 2, 2019
d6755cc
add ref about our retry implementation
hanshasselberg May 2, 2019
836c5a0
prepopulate
hanshasselberg May 6, 2019
289e168
cancel more
hanshasselberg May 6, 2019
7b4793d
one func is enough
hanshasselberg May 6, 2019
abba6eb
comments
hanshasselberg May 6, 2019
8ab9117
wip
hanshasselberg May 7, 2019
fad1e28
use ResolveTCPAddr to resolve dns entries
hanshasselberg May 7, 2019
ac880d5
also show server auto_encrypt flag
hanshasselberg May 7, 2019
240662e
temp hack to help debug
hanshasselberg May 8, 2019
71ca588
attempt at resolving addr like memberlist does it
hanshasselberg May 9, 2019
a37cbdf
resolving
hanshasselberg May 13, 2019
8c34082
adds tests
hanshasselberg May 20, 2019
873b8b8
forgot certs
hanshasselberg May 20, 2019
4ac3ae9
comments
hanshasselberg May 21, 2019
4427f37
move output
hanshasselberg Jun 1, 2019
dc8dac3
logging changes
hanshasselberg Jun 4, 2019
9d55979
can interrupt auto_encrypt
hanshasselberg Jun 6, 2019
461889d
add warning in case auto_connect.allow_tls is enabled, but RPC TLS is…
hanshasselberg Jun 6, 2019
9fce82f
make it build again!
hanshasselberg Jun 6, 2019
12725cc
fix tests
hanshasselberg Jun 6, 2019
3aa3e14
use NodeWrite instead of AgentWrite
hanshasselberg Jun 17, 2019
147da47
feedback
hanshasselberg Jun 18, 2019
def3a78
continue on error
hanshasselberg Jun 25, 2019
e9a7aa1
restructure a little, so that forwarding works properly.
hanshasselberg Jun 26, 2019
6f84089
recover from expired certs
hanshasselberg Jun 26, 2019
0f68b2c
error when uri is not service or agent
hanshasselberg Jun 26, 2019
8f2186b
fix paragraph
hanshasselberg Jun 26, 2019
7332a40
fix bad merge
hanshasselberg Jun 27, 2019
d21e29b
add docs to options
hanshasselberg Jun 27, 2019
ed5178b
address pr feedback
hanshasselberg Jun 27, 2019
79a6c90
SignedResponse
hanshasselberg Jun 27, 2019
2df1737
SignedResponse
hanshasselberg Jun 27, 2019
1b8e907
Apply suggestions from code review
hanshasselberg Jun 27, 2019
c8d2e12
reorganize options
hanshasselberg Jun 27, 2019
7f6532f
t.run
hanshasselberg Jun 27, 2019
91da2d5
add files
hanshasselberg Jun 27, 2019
2 changes: 1 addition & 1 deletion agent/acl_test.go
@@ -64,7 +64,7 @@ func NewTestACLAgent(name string, hcl string, resolveFn func(string) (acl.Author
config.Source{Name: a.Name + ".data_dir", Format: "hcl", Data: hclDataDir},
)

-agent, err := New(a.Config)
+agent, err := New(a.Config, nil)
if err != nil {
panic(fmt.Sprintf("Error creating agent: %v", err))
}
243 changes: 207 additions & 36 deletions agent/agent.go
@@ -71,6 +71,12 @@ const (
"but no reason was provided. This is a default message."
defaultServiceMaintReason = "Maintenance mode is enabled for this " +
"service, but no reason was provided. This is a default message."

// ID of the roots watch
rootsWatchID = "roots"

// ID of the leaf watch
leafWatchID = "leaf"
)

type configSource int
@@ -202,6 +208,8 @@ type Agent struct {
shutdownCh chan struct{}
shutdownLock sync.Mutex

InterruptStartCh chan struct{}

// joinLANNotifier is called after a successful JoinLAN.
joinLANNotifier notifier

@@ -264,40 +272,48 @@ type Agent struct {
persistedTokensLock sync.RWMutex
}

-func New(c *config.RuntimeConfig) (*Agent, error) {
+func New(c *config.RuntimeConfig, logger *log.Logger) (*Agent, error) {
if c.Datacenter == "" {
return nil, fmt.Errorf("Must configure a Datacenter")
}
if c.DataDir == "" && !c.DevMode {
return nil, fmt.Errorf("Must configure a DataDir")
}

-a := &Agent{
-config: c,
-checkReapAfter: make(map[types.CheckID]time.Duration),
-checkMonitors: make(map[types.CheckID]*checks.CheckMonitor),
-checkTTLs: make(map[types.CheckID]*checks.CheckTTL),
-checkHTTPs: make(map[types.CheckID]*checks.CheckHTTP),
-checkTCPs: make(map[types.CheckID]*checks.CheckTCP),
-checkGRPCs: make(map[types.CheckID]*checks.CheckGRPC),
-checkDockers: make(map[types.CheckID]*checks.CheckDocker),
-checkAliases: make(map[types.CheckID]*checks.CheckAlias),
-eventCh: make(chan serf.UserEvent, 1024),
-eventBuf: make([]*UserEvent, 256),
-joinLANNotifier: &systemd.Notifier{},
-reloadCh: make(chan chan error),
-retryJoinCh: make(chan error),
-shutdownCh: make(chan struct{}),
-endpoints: make(map[string]string),
-tokens: new(token.Store),
-}
-a.serviceManager = NewServiceManager(a)
+a := Agent{
+config: c,
+checkReapAfter: make(map[types.CheckID]time.Duration),
+checkMonitors: make(map[types.CheckID]*checks.CheckMonitor),
+checkTTLs: make(map[types.CheckID]*checks.CheckTTL),
+checkHTTPs: make(map[types.CheckID]*checks.CheckHTTP),
+checkTCPs: make(map[types.CheckID]*checks.CheckTCP),
+checkGRPCs: make(map[types.CheckID]*checks.CheckGRPC),
+checkDockers: make(map[types.CheckID]*checks.CheckDocker),
+checkAliases: make(map[types.CheckID]*checks.CheckAlias),
+eventCh: make(chan serf.UserEvent, 1024),
+eventBuf: make([]*UserEvent, 256),
+joinLANNotifier: &systemd.Notifier{},
+reloadCh: make(chan chan error),
+retryJoinCh: make(chan error),
+shutdownCh: make(chan struct{}),
+InterruptStartCh: make(chan struct{}),
+endpoints: make(map[string]string),
+tokens: new(token.Store),
+logger: logger,
+}
+a.serviceManager = NewServiceManager(&a)

if err := a.initializeACLs(); err != nil {
return nil, err
}

-return a, nil
+// Retrieve or generate the node ID before setting up the rest of the
+// agent, which depends on it.
+if err := a.setupNodeID(c); err != nil {
+return nil, fmt.Errorf("Failed to setup node ID: %v", err)
+}
+
+return &a, nil
}

func LocalConfig(cfg *config.RuntimeConfig) local.Config {
@@ -348,20 +364,6 @@ func (a *Agent) Start() error {

c := a.config

-logOutput := a.LogOutput
-if a.logger == nil {
-if logOutput == nil {
-logOutput = os.Stderr
-}
-a.logger = log.New(logOutput, "", log.LstdFlags)
-}
-
-// Retrieve or generate the node ID before setting up the rest of the
-// agent, which depends on it.
-if err := a.setupNodeID(c); err != nil {
-return fmt.Errorf("Failed to setup node ID: %v", err)
-}
-
// Warn if the node name is incompatible with DNS
if InvalidDnsRe.MatchString(a.config.NodeName) {
a.logger.Printf("[WARN] agent: Node name %q will not be discoverable "+
@@ -433,6 +435,21 @@ func (a *Agent) Start() error {
// populated from above.
a.registerCache()

if a.config.AutoEncryptTLS && !a.config.ServerMode {
reply, err := a.setupClientAutoEncrypt()
if err != nil {
return fmt.Errorf("AutoEncrypt failed: %s", err)
}
rootsReq, leafReq, err := a.setupClientAutoEncryptCache(reply)
if err != nil {
return fmt.Errorf("AutoEncrypt failed: %s", err)
}
if err = a.setupClientAutoEncryptWatching(rootsReq, leafReq); err != nil {
return fmt.Errorf("AutoEncrypt failed: %s", err)
}
a.logger.Printf("[INFO] AutoEncrypt: upgraded to TLS")
}

// Load checks/services/metadata.
if err := a.loadServices(c); err != nil {
return err
@@ -532,6 +549,158 @@ func (a *Agent) Start() error {
return nil
}

func (a *Agent) setupClientAutoEncrypt() (*structs.SignedResponse, error) {
client := a.delegate.(*consul.Client)

addrs := a.config.StartJoinAddrsLAN
disco, err := newDiscover()
if err != nil && len(addrs) == 0 {
return nil, err
}
addrs = append(addrs, retryJoinAddrs(disco, "LAN", a.config.RetryJoinLAN, a.logger)...)

reply, priv, err := client.RequestAutoEncryptCerts(addrs, a.config.ServerPort, a.tokens.AgentToken(), a.InterruptStartCh)
if err != nil {
return nil, err
}

connectCAPems := []string{}
for _, ca := range reply.ConnectCARoots.Roots {
connectCAPems = append(connectCAPems, ca.RootCert)
}
if err := a.tlsConfigurator.UpdateAutoEncrypt(reply.ManualCARoots, connectCAPems, reply.IssuedCert.CertPEM, priv, reply.VerifyServerHostname); err != nil {
return nil, err
}
return reply, nil

}
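
The call above hands `a.InterruptStartCh` to `RequestAutoEncryptCerts`, whose body is not part of this excerpt. Purely as an illustration of what an interruptible retry loop could look like, here is a minimal sketch. `retryUntilInterrupted`, `errInterrupted`, `errNotReady`, and `longWait` are hypothetical names and do not come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical names for illustration only; none of these come from the PR.
var (
	errInterrupted = errors.New("interrupted")
	errNotReady    = errors.New("no server reachable yet")
)

// longWait is a deliberately long delay, handy for showing that an
// interrupt wins over the timer.
const longWait = time.Hour

// retryUntilInterrupted runs attempt until it succeeds. After each failed
// attempt it waits for `wait`, unless interrupt is closed (or receives a
// value) first, in which case it gives up and returns errInterrupted.
func retryUntilInterrupted(interrupt <-chan struct{}, wait time.Duration, attempt func() error) error {
	for {
		if err := attempt(); err == nil {
			return nil
		}
		select {
		case <-interrupt:
			return errInterrupted
		case <-time.After(wait):
			// Timer fired: loop around and try again.
		}
	}
}

func main() {
	calls := 0
	err := retryUntilInterrupted(make(chan struct{}), time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Closing the interrupt channel, rather than sending on it, lets any number of waiters observe the interruption, which is one reason a `chan struct{}` is a common choice for this kind of signal.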

func (a *Agent) setupClientAutoEncryptCache(reply *structs.SignedResponse) (*structs.DCSpecificRequest, *cachetype.ConnectCALeafRequest, error) {
rootsReq := &structs.DCSpecificRequest{
Datacenter: a.config.Datacenter,
QueryOptions: structs.QueryOptions{Token: a.tokens.AgentToken()},
}

// prepopulate roots cache
rootRes := cache.FetchResult{Value: &reply.ConnectCARoots, Index: reply.ConnectCARoots.QueryMeta.Index}
if err := a.cache.Prepopulate(cachetype.ConnectCARootName, rootRes, a.config.Datacenter, a.tokens.AgentToken(), rootsReq.CacheInfo().Key); err != nil {
return nil, nil, err
}

leafReq := &cachetype.ConnectCALeafRequest{
Datacenter: a.config.Datacenter,
Token: a.tokens.AgentToken(),
Agent: a.config.NodeName,
}

// prepopulate leaf cache
certRes := cache.FetchResult{Value: &reply.IssuedCert, Index: reply.ConnectCARoots.QueryMeta.Index}
if err := a.cache.Prepopulate(cachetype.ConnectCALeafName, certRes, a.config.Datacenter, a.tokens.AgentToken(), leafReq.Key()); err != nil {
return nil, nil, err
}
return rootsReq, leafReq, nil
}
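
The two `Prepopulate` calls seed the agent cache with the roots and the leaf that the client just received, presumably so that the first watch can be served without another fetch. The toy cache below illustrates that idea under stated assumptions: `miniCache`, `cacheKey`, and `entry` are hypothetical types, the string arguments are placeholders, and the real agent cache has a different and richer API.

```go
package main

import "fmt"

// entry is a hypothetical cache record: a value plus the index it was seen at.
type entry struct {
	Value interface{}
	Index uint64
}

// cacheKey identifies an entry by cache type, datacenter, token and
// request key, loosely following the arguments visible in the diff.
type cacheKey struct {
	Type, Datacenter, Token, Key string
}

// miniCache is a toy, non-concurrent stand-in for the agent cache.
type miniCache struct {
	entries map[cacheKey]entry
	fetches int // number of times Get had to call fetch
}

func newMiniCache() *miniCache {
	return &miniCache{entries: make(map[cacheKey]entry)}
}

// Prepopulate stores e under k without performing a fetch.
func (c *miniCache) Prepopulate(k cacheKey, e entry) {
	c.entries[k] = e
}

// Get returns the cached entry for k, calling fetch only on a miss.
func (c *miniCache) Get(k cacheKey, fetch func() entry) entry {
	if e, ok := c.entries[k]; ok {
		return e
	}
	c.fetches++
	e := fetch()
	c.entries[k] = e
	return e
}

func main() {
	c := newMiniCache()
	k := cacheKey{Type: "leaf", Datacenter: "dc1", Token: "t", Key: "agent:node1"}
	c.Prepopulate(k, entry{Value: "cert-pem", Index: 7})

	got := c.Get(k, func() entry { return entry{Value: "fetched", Index: 8} })
	fmt.Println(got.Value, got.Index, c.fetches)
}
```

Because the key is a comparable struct, it can be used directly as a map key, which avoids the ambiguity of joining the parts into one delimited string.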

func (a *Agent) setupClientAutoEncryptWatching(rootsReq *structs.DCSpecificRequest, leafReq *cachetype.ConnectCALeafRequest) error {
// setup watches
ch := make(chan cache.UpdateEvent, 10)
ctx, cancel := context.WithCancel(context.Background())

// Watch for root changes
err := a.cache.Notify(ctx, cachetype.ConnectCARootName, rootsReq, rootsWatchID, ch)
if err != nil {
cancel()
return err
}

// Watch the leaf cert
err = a.cache.Notify(ctx, cachetype.ConnectCALeafName, leafReq, leafWatchID, ch)
if err != nil {
cancel()
return err
}

// Setup actions in case the watches are firing.
go func() {
for {
select {
case <-a.shutdownCh:
cancel()
return
case <-ctx.Done():
return
case u := <-ch:
switch u.CorrelationID {
case rootsWatchID:
roots, ok := u.Result.(*structs.IndexedCARoots)
if !ok {
err := fmt.Errorf("invalid type for roots response: %T", u.Result)
a.logger.Printf("[ERR] %s watch error: %s", u.CorrelationID, err)
continue
}
pems := []string{}
for _, root := range roots.Roots {
pems = append(pems, root.RootCert)
}
a.tlsConfigurator.UpdateAutoEncryptCA(pems)
case leafWatchID:
leaf, ok := u.Result.(*structs.IssuedCert)
if !ok {
err := fmt.Errorf("invalid type for leaf response: %T", u.Result)
a.logger.Printf("[ERR] %s watch error: %s", u.CorrelationID, err)
continue
}
a.tlsConfigurator.UpdateAutoEncryptCert(leaf.CertPEM, leaf.PrivateKeyPEM)
}
}
}
}()

// Setup safety net in case the auto_encrypt cert doesn't get renewed
// in time. The agent would be stuck in that case because the watches
// never use the AutoEncrypt.Sign endpoint.
go func() {
for {

// Check 10sec after cert expires. The agent cache
// should be handling the expiration and renew before
// it.
// If there is no cert, AutoEncryptCertNotAfter returns
// a value in the past which immediately triggers the
// renew, but this case shouldn't happen because at
// this point, auto_encrypt was just being setup
// successfully.
interval := a.tlsConfigurator.AutoEncryptCertNotAfter().Sub(time.Now().Add(10 * time.Second))
a.logger.Printf("[DEBUG] AutoEncrypt: client certificate expiration check in %s", interval)
select {
case <-a.shutdownCh:
return
case <-time.After(interval):
// check auto encrypt client cert expiration
if a.tlsConfigurator.AutoEncryptCertExpired() {
a.logger.Printf("[DEBUG] AutoEncrypt: client certificate expired.")
reply, err := a.setupClientAutoEncrypt()
if err != nil {
a.logger.Printf("[ERR] AutoEncrypt: client certificate expired, failed to renew: %s", err)
// in case of an error, try again in one minute
interval = time.Minute
continue
}
_, _, err = a.setupClientAutoEncryptCache(reply)
if err != nil {
a.logger.Printf("[ERR] AutoEncrypt: client certificate expired, failed to populate cache: %s", err)
// in case of an error, try again in one minute
interval = time.Minute
continue
}
}
}
}
}()

return nil
}
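
A note on the safety-net loop above: read literally, `AutoEncryptCertNotAfter().Sub(time.Now().Add(10 * time.Second))` evaluates to a point ten seconds before expiry, while the comment describes a check ten seconds after it. The `interval = time.Minute` assignments also appear to be overwritten when `interval` is recomputed at the top of the next iteration. The sketch below shows one way to compute the wait so that both intentions hold. It is an assumption about the intended behavior, not code from the PR; `nextCheck`, `grace`, and `retryDelay` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical constants for illustration; the values mirror the ten
// seconds and one minute mentioned in the diff's comments.
const (
	grace      = 10 * time.Second
	retryDelay = time.Minute
)

// nextCheck returns how long to wait before the next expiry check.
// untilExpiry is the time remaining until the certificate's NotAfter
// (negative if it has already expired). After a failed renewal the
// fixed retry delay is used instead of the expiry-based wait.
func nextCheck(untilExpiry time.Duration, lastRenewFailed bool) time.Duration {
	if lastRenewFailed {
		return retryDelay
	}
	if d := untilExpiry + grace; d > 0 {
		return d
	}
	// Already past expiry plus grace: check immediately.
	return 0
}

func main() {
	fmt.Println(nextCheck(72*time.Hour, false)) // healthy certificate
	fmt.Println(nextCheck(-time.Hour, false))   // long expired
	fmt.Println(nextCheck(-time.Hour, true))    // expired, last renewal failed
}
```

Keeping the computation in a pure function makes the three cases easy to test without waiting on real timers.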

func (a *Agent) listenAndServeGRPC() error {
if len(a.config.GRPCAddrs) < 1 {
return nil
@@ -1088,6 +1257,8 @@ func (a *Agent) consulConfig() (*consul.Config, error) {
base.TLSCipherSuites = a.config.TLSCipherSuites
base.TLSPreferServerCipherSuites = a.config.TLSPreferServerCipherSuites

base.AutoEncryptAllowTLS = a.config.AutoEncryptAllowTLS

// Copy the Connect CA bootstrap config
if a.config.ConnectEnabled {
base.ConnectEnabled = true
17 changes: 17 additions & 0 deletions agent/agent_test.go
@@ -3871,3 +3871,20 @@ func TestAgent_ReloadConfigTLSConfigFailure(t *testing.T) {
require.Len(t, tlsConf.ClientCAs.Subjects(), 1)
require.Len(t, tlsConf.RootCAs.Subjects(), 1)
}

func TestAgent_consulConfig(t *testing.T) {
t.Parallel()
dataDir := testutil.TempDir(t, "agent") // we manage the data dir
defer os.RemoveAll(dataDir)
hcl := `
data_dir = "` + dataDir + `"
verify_incoming = true
ca_file = "../test/ca/root.cer"
cert_file = "../test/key/ourdomain.cer"
key_file = "../test/key/ourdomain.key"
auto_encrypt { allow_tls = true }
`
a := NewTestAgent(t, t.Name(), hcl)
defer a.Shutdown()
require.True(t, a.consulConfig().AutoEncryptAllowTLS)
}
167 changes: 95 additions & 72 deletions agent/bindata_assetfs.go

Large diffs are not rendered by default.

36 changes: 28 additions & 8 deletions agent/cache-types/connect_ca_leaf.go
@@ -501,12 +501,23 @@ func (c *ConnectCALeaf) generateNewLeaf(req *ConnectCALeafRequest,
return result, errors.New("cluster has no CA bootstrapped yet")
}

-// Build the service ID
-serviceID := &connect.SpiffeIDService{
-Host: roots.TrustDomain,
-Datacenter: req.Datacenter,
-Namespace: "default",
-Service: req.Service,
+// Build the cert uri
+var id connect.CertURI
+if req.Service != "" {
+id = &connect.SpiffeIDService{
+Host: roots.TrustDomain,
+Datacenter: req.Datacenter,
+Namespace: "default",
+Service: req.Service,
+}
+} else if req.Agent != "" {
+id = &connect.SpiffeIDAgent{
+Host: roots.TrustDomain,
+Datacenter: req.Datacenter,
+Agent: req.Agent,
+}
+} else {
+return result, errors.New("URI must be either service or agent")
}

// Create a new private key
@@ -516,7 +527,7 @@ func (c *ConnectCALeaf) generateNewLeaf(req *ConnectCALeafRequest,
}

// Create a CSR.
-csr, err := connect.CreateCSR(serviceID, pk)
+csr, err := connect.CreateCSR(id, pk)
if err != nil {
return result, err
}
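
The hunks above make the leaf identity depend on which field of the request is set: a service identity when `Service` is non-empty, an agent identity when only `Agent` is set, and an error otherwise. The sketch below reproduces that selection with plain strings. `leafRequest` and `certURI` are hypothetical stand-ins, and the URI layouts are an assumption made for illustration, since the excerpt does not show how `SpiffeIDService` or `SpiffeIDAgent` render their URIs.

```go
package main

import (
	"errors"
	"fmt"
)

// leafRequest is a hypothetical, trimmed-down stand-in for the request type.
type leafRequest struct {
	Datacenter string
	Service    string
	Agent      string
}

// certURI picks an identity in the same order as the diff: service first,
// then agent, otherwise an error. The URI layouts are assumed, not taken
// from the excerpt.
func certURI(trustDomain string, req leafRequest) (string, error) {
	switch {
	case req.Service != "":
		return fmt.Sprintf("spiffe://%s/ns/default/dc/%s/svc/%s",
			trustDomain, req.Datacenter, req.Service), nil
	case req.Agent != "":
		return fmt.Sprintf("spiffe://%s/agent/client/dc/%s/id/%s",
			trustDomain, req.Datacenter, req.Agent), nil
	default:
		return "", errors.New("URI must be either service or agent")
	}
}

func main() {
	for _, req := range []leafRequest{
		{Datacenter: "dc1", Service: "web"},
		{Datacenter: "dc1", Agent: "node1"},
		{Datacenter: "dc1"},
	} {
		uri, err := certURI("example.consul", req)
		fmt.Println(uri, err)
	}
}
```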
@@ -606,14 +617,23 @@ type ConnectCALeafRequest struct {
Token string
Datacenter string
Service string // Service name, not ID
Agent string // Agent name, not ID
MinQueryIndex uint64
MaxQueryTime time.Duration
}

func (r *ConnectCALeafRequest) Key() string {
if len(r.Agent) > 0 {
return fmt.Sprintf("agent:%s", r.Agent)
}

return fmt.Sprintf("service:%s", r.Service)
}

func (r *ConnectCALeafRequest) CacheInfo() cache.RequestInfo {
return cache.RequestInfo{
Token: r.Token,
-Key: r.Service,
+Key: r.Key(),
Datacenter: r.Datacenter,
MinIndex: r.MinQueryIndex,
Timeout: r.MaxQueryTime,
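
The cache key change in this hunk replaces the bare service name with a prefixed key, so an agent and a service that share a name no longer map to the same cache entry. A minimal sketch of that behavior follows, using a hypothetical `leafRequest` type in place of `ConnectCALeafRequest`. Note that `Key()` checks `Agent` first, whereas `generateNewLeaf` checks `Service` first; the excerpt does not show a request with both fields set.

```go
package main

import "fmt"

// leafRequest is a hypothetical stand-in carrying only the two fields
// that influence the cache key.
type leafRequest struct {
	Service string
	Agent   string
}

// Key mirrors the prefixing logic shown in the diff: agent requests get
// an "agent:" prefix, everything else a "service:" prefix.
func (r *leafRequest) Key() string {
	if len(r.Agent) > 0 {
		return fmt.Sprintf("agent:%s", r.Agent)
	}
	return fmt.Sprintf("service:%s", r.Service)
}

func main() {
	svc := &leafRequest{Service: "web"}
	agt := &leafRequest{Agent: "web"}
	// Same name, different kinds: the keys must differ.
	fmt.Println(svc.Key(), agt.Key(), svc.Key() != agt.Key())
}
```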