xds controller: setup watches for and compute leaf cert references in ProxyStateTemplate, and wire up leaf cert manager dependency (#18756)

* Refactors the leafcert package to not have a dependency on agent/consul and agent/cache to avoid import cycles. This way the xds controller can just import the leafcert package to use the leafcert manager.

The leaf cert logic in the controller:
* Sets up watches for leaf certs that are referenced in the ProxyStateTemplate (which generates the leaf certs too).
* Gets the leaf cert from the leaf cert cache
* Stores the leaf cert in the ProxyState that's pushed to xds
* For the cert watches, this PR also uses a bimapper + a thin wrapper to map leaf cert events to related ProxyStateTemplates

Since bimapper uses a resource.Reference or resource.ID to map between two resource types, I've created an internal type for a leaf certificate to use for the resource.Reference, since it's not a v2 resource.
The wrapper allows mapping events to resources (as opposed to mapping resources to resources)
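
The mapping idea described above can be sketched as a small two-way index. This is only an illustration under stated assumptions: `certKey`, `templateID`, `biMap`, `Track`, and `TemplatesForCert` are hypothetical names, not the bimapper API or the internal leaf cert reference type from this commit.

```go
package main

import (
	"fmt"
	"sort"
)

// certKey and templateID are hypothetical stand-ins for the internal leaf
// cert reference type and the ProxyStateTemplate resource ID.
type certKey string
type templateID string

// biMap keeps a many-to-many link between templates and the leaf certs
// they reference, indexed in both directions.
type biMap struct {
	certsByTemplate map[templateID]map[certKey]struct{}
	templatesByCert map[certKey]map[templateID]struct{}
}

func newBiMap() *biMap {
	return &biMap{
		certsByTemplate: make(map[templateID]map[certKey]struct{}),
		templatesByCert: make(map[certKey]map[templateID]struct{}),
	}
}

// Track replaces the set of certs referenced by template t.
func (m *biMap) Track(t templateID, certs []certKey) {
	// Drop links left over from the previous call for this template.
	for c := range m.certsByTemplate[t] {
		delete(m.templatesByCert[c], t)
		if len(m.templatesByCert[c]) == 0 {
			delete(m.templatesByCert, c)
		}
	}
	delete(m.certsByTemplate, t)
	if len(certs) == 0 {
		return
	}
	set := make(map[certKey]struct{}, len(certs))
	for _, c := range certs {
		set[c] = struct{}{}
		if m.templatesByCert[c] == nil {
			m.templatesByCert[c] = make(map[templateID]struct{})
		}
		m.templatesByCert[c][t] = struct{}{}
	}
	m.certsByTemplate[t] = set
}

// TemplatesForCert returns, in sorted order, the templates that should be
// reconciled when an event arrives for cert c.
func (m *biMap) TemplatesForCert(c certKey) []templateID {
	out := make([]templateID, 0, len(m.templatesByCert[c]))
	for t := range m.templatesByCert[c] {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	m := newBiMap()
	m.Track("web-proxy", []certKey{"web-identity"})
	m.Track("api-proxy", []certKey{"web-identity", "api-identity"})
	fmt.Println(m.TemplatesForCert("web-identity")) // [api-proxy web-proxy]

	m.Track("web-proxy", nil) // the template no longer references any cert
	fmt.Println(m.TemplatesForCert("web-identity")) // [api-proxy]
}
```

In this sketch a cert update event carries only the cert key, and the reverse index turns that key into the list of templates to reconcile, which is the "events to resources" direction the wrapper is described as providing.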

The controller tests:
Unit: Ensure that we resolve leaf cert references
Lifecycle: Ensure that when the CA is updated, the leaf cert is as well

Also adds a new SPIFFE ID type, and adds the workload identity and workload identity URI to leaf certs, so that certs are generated with the new workload-identity-based SPIFFE ID.

* Pulls out some leaf cert test helpers into a helpers file so they
can be used in the xds controller tests.
* Wires up leaf cert manager dependency
* Support getting token from proxytracker
* Add workload identity spiffe id type to the authorize and sign functions



---------

Co-authored-by: John Murret <john.murret@hashicorp.com>
ndhanushkodi and jmurret authored Sep 12, 2023
1 parent 89e6725 commit 78b170a
Showing 42 changed files with 1,302 additions and 399 deletions.
3 changes: 2 additions & 1 deletion agent/agent.go
@@ -22,6 +22,8 @@ import (
"sync/atomic"
"time"

"github.com/hashicorp/consul/lib/stringslice"

"github.com/armon/go-metrics"
"github.com/armon/go-metrics/prometheus"
"github.com/hashicorp/go-connlimit"
@@ -71,7 +73,6 @@ import (
"github.com/hashicorp/consul/lib/file"
"github.com/hashicorp/consul/lib/mutex"
"github.com/hashicorp/consul/lib/routine"
"github.com/hashicorp/consul/lib/stringslice"
"github.com/hashicorp/consul/logging"
"github.com/hashicorp/consul/proto-public/pbresource"
"github.com/hashicorp/consul/proto/private/pboperator"
3 changes: 2 additions & 1 deletion agent/cache-types/connect_ca_root.go
@@ -8,11 +8,12 @@ import (
"fmt"

"github.com/hashicorp/consul/agent/cache"
"github.com/hashicorp/consul/agent/cacheshim"
"github.com/hashicorp/consul/agent/structs"
)

// Recommended name for registration.
const ConnectCARootName = "connect-ca-root"
const ConnectCARootName = cacheshim.ConnectCARootName

// ConnectCARoot supports fetching the Connect CA roots. This is a
// straightforward cache type since it only has to block on the given
28 changes: 2 additions & 26 deletions agent/cache/cache.go
@@ -32,6 +32,7 @@ import (
"golang.org/x/time/rate"

"github.com/hashicorp/consul/acl"
"github.com/hashicorp/consul/agent/cacheshim"
"github.com/hashicorp/consul/lib"
"github.com/hashicorp/consul/lib/ttlcache"
)
@@ -172,32 +173,7 @@ type typeEntry struct {

// ResultMeta is returned from Get calls along with the value and can be used
// to expose information about the cache status for debugging or testing.
type ResultMeta struct {
// Hit indicates whether or not the request was a cache hit
Hit bool

// Age identifies how "stale" the result is. Its semantics differ based on
// whether or not the cache type performs background refresh or not as defined
// in https://www.consul.io/api/index.html#agent-caching.
//
// For background refresh types, Age is 0 unless the background blocking query
// is currently in a failed state and so not keeping up with the server's
// values. If it is non-zero it represents the time since the first failure to
// connect during background refresh, and is reset after a background request
// does manage to reconnect and either return successfully, or block for at
// least the yamux keepalive timeout of 30 seconds (which indicates the
// connection is OK but blocked as expected).
//
// For simple cache types, Age is the time since the result being returned was
// fetched from the servers.
Age time.Duration

// Index is the internal ModifyIndex for the cache entry. Not all types
// support blocking and all that do will likely have this in their result type
// already but this allows generic code to reason about whether cache values
// have changed.
Index uint64
}
type ResultMeta = cacheshim.ResultMeta

// Options are options for the Cache.
type Options struct {
58 changes: 3 additions & 55 deletions agent/cache/request.go
@@ -4,7 +4,7 @@
package cache

import (
"time"
"github.com/hashicorp/consul/agent/cacheshim"
)

// Request is a cacheable request.
@@ -13,64 +13,12 @@ import (
// the agent/structs package.
//
//go:generate mockery --name Request --inpackage
type Request interface {
// CacheInfo returns information used for caching this request.
CacheInfo() RequestInfo
}
type Request = cacheshim.Request

// RequestInfo represents cache information for a request. The caching
// framework uses this to control the behavior of caching and to determine
// cacheability.
//
// TODO(peering): finish ensuring everything that sets a Datacenter sets or doesn't set PeerName.
// TODO(peering): also make sure the peer name is present in the cache key likely in lieu of the datacenter somehow.
type RequestInfo struct {
// Key is a unique cache key for this request. This key should
// be globally unique to identify this request, since any conflicting
// cache keys could result in invalid data being returned from the cache.
// The Key does not need to include ACL or DC information, since the
// cache already partitions by these values prior to using this key.
Key string

// Token is the ACL token associated with this request.
//
// Datacenter is the datacenter that the request is targeting.
//
// PeerName is the peer that the request is targeting.
//
// All of these values are used to partition the cache. The cache framework
// today partitions data on these values to simplify behavior: by
// partitioning ACL tokens, the cache doesn't need to be smart about
// filtering results. By filtering datacenter/peer results, the cache can
// service the multi-DC/multi-peer nature of Consul. This comes at the expense of
// working set size, but in general the effect is minimal.
Token string
Datacenter string
PeerName string

// MinIndex is the minimum index being queried. This is used to
// determine if we already have data satisfying the query or if we need
// to block until new data is available. If no index is available, the
// default value (zero) is acceptable.
MinIndex uint64

// Timeout is the timeout for waiting on a blocking query. When the
// timeout is reached, the last known value is returned (or maybe nil
// if there was no prior value). This "last known value" behavior matches
// normal Consul blocking queries.
Timeout time.Duration

// MaxAge if set limits how stale a cache entry can be. If it is non-zero and
// there is an entry in cache that is older than specified, it is treated as a
// cache miss and re-fetched. It is ignored for cachetypes with Refresh =
// true.
MaxAge time.Duration

// MustRevalidate forces a new lookup of the cache even if there is an
// existing one that has not expired. It is implied by HTTP requests with
// `Cache-Control: max-age=0` but we can't distinguish that case from the
// unset case for MaxAge. Later we may support revalidating the index without
// a full re-fetch but for now the only option is to refetch. It is ignored
// for cachetypes with Refresh = true.
MustRevalidate bool
}
type RequestInfo = cacheshim.RequestInfo
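
The hunks above replace struct and interface definitions in agent/cache with type aliases (`type X = cacheshim.X`). A minimal sketch of why an alias, rather than a new defined type, keeps existing callers compiling; `shimResultMeta`, `DefinedMeta`, and `describe` are hypothetical names, and a single package stands in for the two real ones:

```go
package main

import "fmt"

// shimResultMeta plays the role of cacheshim.ResultMeta: the one real
// definition, living in the shared package.
type shimResultMeta struct {
	Hit   bool
	Index uint64
}

// ResultMeta is an alias, mirroring `type ResultMeta = cacheshim.ResultMeta`.
// It is the same type under a second name.
type ResultMeta = shimResultMeta

// DefinedMeta is a new defined type with the same fields, shown for
// contrast. It is a different type and needs explicit conversion.
type DefinedMeta shimResultMeta

// describe accepts the shared type only.
func describe(m shimResultMeta) string {
	return fmt.Sprintf("hit=%t index=%d", m.Hit, m.Index)
}

func main() {
	aliased := ResultMeta{Hit: true, Index: 7}
	fmt.Println(describe(aliased)) // no conversion needed

	defined := DefinedMeta{Hit: false, Index: 9}
	fmt.Println(describe(shimResultMeta(defined))) // conversion required

	fmt.Printf("%T\n", aliased) // the alias reports the underlying type name
}
```

Because the alias and the shared type are identical, code that already uses `cache.ResultMeta` and code that imports only the shim package can exchange values freely, which is what lets the leafcert package drop its agent/cache import.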
23 changes: 7 additions & 16 deletions agent/cache/watch.go
@@ -9,26 +9,17 @@ import (
"reflect"
"time"

"github.com/hashicorp/consul/lib"
"google.golang.org/protobuf/proto"

"github.com/hashicorp/consul/agent/cacheshim"
"github.com/hashicorp/consul/lib"
)

// UpdateEvent is a struct summarizing an update to a cache entry
type UpdateEvent struct {
// CorrelationID is used by the Notify API to allow correlation of updates
// with specific requests. We could return the full request object and
// cachetype for consumers to match against the calls they made but in
// practice it's cleaner for them to choose the minimal necessary unique
// identifier given the set of things they are watching. They might even
// choose to assign random IDs for example.
CorrelationID string
Result interface{}
Meta ResultMeta
Err error
}
type UpdateEvent = cacheshim.UpdateEvent

// Callback is the function type accepted by NotifyCallback.
type Callback func(ctx context.Context, event UpdateEvent)
type Callback = cacheshim.Callback

// Notify registers a desire to be updated about changes to a cache result.
//
@@ -126,7 +117,7 @@ func (c *Cache) notifyBlockingQuery(ctx context.Context, r getOptions, correlati
// Check the index of the value returned in the cache entry to be sure it
// changed
if index == 0 || index < meta.Index {
cb(ctx, UpdateEvent{correlationID, res, meta, err})
cb(ctx, UpdateEvent{CorrelationID: correlationID, Result: res, Meta: meta, Err: err})

// Update index for next request
index = meta.Index
@@ -186,7 +177,7 @@ func (c *Cache) notifyPollingQuery(ctx context.Context, r getOptions, correlatio

// Check for a change in the value or an index change
if index < meta.Index || !isEqual(lastValue, res) {
cb(ctx, UpdateEvent{correlationID, res, meta, err})
cb(ctx, UpdateEvent{CorrelationID: correlationID, Result: res, Meta: meta, Err: err})

// Update index and lastValue
lastValue = res
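
Both watch.go hunks switch `UpdateEvent` literals from positional to keyed fields. The commit message does not say why; one plausible reason (an assumption) is that `UpdateEvent` now aliases a struct from another package, and `go vet`'s composites check reports unkeyed fields in literals of imported struct types. A small sketch with local stand-in types; `newEvent` and `shouldNotify` are hypothetical helpers:

```go
package main

import "fmt"

// Local stand-ins for cacheshim.ResultMeta and cacheshim.UpdateEvent.
type ResultMeta struct {
	Hit   bool
	Index uint64
}

type UpdateEvent struct {
	CorrelationID string
	Result        interface{}
	Meta          ResultMeta
	Err           error
}

// newEvent builds the event with keyed fields, so the literal stays
// correct even if fields are later added or reordered.
func newEvent(id string, res interface{}, meta ResultMeta, err error) UpdateEvent {
	return UpdateEvent{CorrelationID: id, Result: res, Meta: meta, Err: err}
}

// shouldNotify mirrors the index check shown in the blocking-query hunk:
// notify on the first result, or when the index has advanced.
func shouldNotify(lastIndex uint64, meta ResultMeta) bool {
	return lastIndex == 0 || lastIndex < meta.Index
}

func main() {
	meta := ResultMeta{Index: 5}
	if shouldNotify(4, meta) {
		ev := newEvent("leaf-cert", "cert-bytes", meta, nil)
		fmt.Println(ev.CorrelationID, ev.Meta.Index)
	}
}
```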
118 changes: 118 additions & 0 deletions agent/cacheshim/cache.go
@@ -0,0 +1,118 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package cacheshim

import (
"context"
"time"
)

// cacheshim defines any shared cache types for any packages that don't want to have a dependency on the agent cache.
// This was created as part of a refactor to remove agent/leafcert package's dependency on agent/cache.

type ResultMeta struct {
// Hit indicates whether or not the request was a cache hit
Hit bool

// Age identifies how "stale" the result is. Its semantics differ based on
// whether or not the cache type performs background refresh or not as defined
// in https://www.consul.io/api/index.html#agent-caching.
//
// For background refresh types, Age is 0 unless the background blocking query
// is currently in a failed state and so not keeping up with the server's
// values. If it is non-zero it represents the time since the first failure to
// connect during background refresh, and is reset after a background request
// does manage to reconnect and either return successfully, or block for at
// least the yamux keepalive timeout of 30 seconds (which indicates the
// connection is OK but blocked as expected).
//
// For simple cache types, Age is the time since the result being returned was
// fetched from the servers.
Age time.Duration

// Index is the internal ModifyIndex for the cache entry. Not all types
// support blocking and all that do will likely have this in their result type
// already but this allows generic code to reason about whether cache values
// have changed.
Index uint64
}

type Request interface {
// CacheInfo returns information used for caching this request.
CacheInfo() RequestInfo
}

type RequestInfo struct {
// Key is a unique cache key for this request. This key should
// be globally unique to identify this request, since any conflicting
// cache keys could result in invalid data being returned from the cache.
// The Key does not need to include ACL or DC information, since the
// cache already partitions by these values prior to using this key.
Key string

// Token is the ACL token associated with this request.
//
// Datacenter is the datacenter that the request is targeting.
//
// PeerName is the peer that the request is targeting.
//
// All of these values are used to partition the cache. The cache framework
// today partitions data on these values to simplify behavior: by
// partitioning ACL tokens, the cache doesn't need to be smart about
// filtering results. By filtering datacenter/peer results, the cache can
// service the multi-DC/multi-peer nature of Consul. This comes at the expense of
// working set size, but in general the effect is minimal.
Token string
Datacenter string
PeerName string

// MinIndex is the minimum index being queried. This is used to
// determine if we already have data satisfying the query or if we need
// to block until new data is available. If no index is available, the
// default value (zero) is acceptable.
MinIndex uint64

// Timeout is the timeout for waiting on a blocking query. When the
// timeout is reached, the last known value is returned (or maybe nil
// if there was no prior value). This "last known value" behavior matches
// normal Consul blocking queries.
Timeout time.Duration

// MaxAge if set limits how stale a cache entry can be. If it is non-zero and
// there is an entry in cache that is older than specified, it is treated as a
// cache miss and re-fetched. It is ignored for cachetypes with Refresh =
// true.
MaxAge time.Duration

// MustRevalidate forces a new lookup of the cache even if there is an
// existing one that has not expired. It is implied by HTTP requests with
// `Cache-Control: max-age=0` but we can't distinguish that case from the
// unset case for MaxAge. Later we may support revalidating the index without
// a full re-fetch but for now the only option is to refetch. It is ignored
// for cachetypes with Refresh = true.
MustRevalidate bool
}

type UpdateEvent struct {
// CorrelationID is used by the Notify API to allow correlation of updates
// with specific requests. We could return the full request object and
// cachetype for consumers to match against the calls they made but in
// practice it's cleaner for them to choose the minimal necessary unique
// identifier given the set of things they are watching. They might even
// choose to assign random IDs for example.
CorrelationID string
Result interface{}
Meta ResultMeta
Err error
}

type Callback func(ctx context.Context, event UpdateEvent)

type Cache interface {
Get(ctx context.Context, t string, r Request) (interface{}, ResultMeta, error)
NotifyCallback(ctx context.Context, t string, r Request, correlationID string, cb Callback) error
Notify(ctx context.Context, t string, r Request, correlationID string, ch chan<- UpdateEvent) error
}

const ConnectCARootName = "connect-ca-root"
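
The `Cache` interface in the new file lets a consumer depend on the shim package instead of the concrete agent cache. A minimal sketch of how such a consumer could be exercised against a fake; the types are trimmed local copies (the interface here omits `Notify` for brevity), and `staticCache`, `keyRequest`, and `rootsFor` are hypothetical names, not code from this commit:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed local copies of the shim types, enough for the sketch.
type ResultMeta struct{ Index uint64 }
type RequestInfo struct{ Key string }
type Request interface{ CacheInfo() RequestInfo }
type UpdateEvent struct {
	CorrelationID string
	Result        interface{}
	Meta          ResultMeta
	Err           error
}
type Callback func(ctx context.Context, event UpdateEvent)

// Cache is a reduced version of the interface above (Notify is omitted).
type Cache interface {
	Get(ctx context.Context, t string, r Request) (interface{}, ResultMeta, error)
	NotifyCallback(ctx context.Context, t string, r Request, correlationID string, cb Callback) error
}

// keyRequest is a hypothetical request whose cache key is the string itself.
type keyRequest string

func (k keyRequest) CacheInfo() RequestInfo { return RequestInfo{Key: string(k)} }

// staticCache is a fake backed by a map keyed on "<type>/<key>".
type staticCache struct{ values map[string]interface{} }

func (c *staticCache) Get(_ context.Context, t string, r Request) (interface{}, ResultMeta, error) {
	v, ok := c.values[t+"/"+r.CacheInfo().Key]
	if !ok {
		return nil, ResultMeta{}, errors.New("not found")
	}
	return v, ResultMeta{Index: 1}, nil
}

// NotifyCallback delivers the current value once; a real cache would keep watching.
func (c *staticCache) NotifyCallback(ctx context.Context, t string, r Request, correlationID string, cb Callback) error {
	v, meta, err := c.Get(ctx, t, r)
	cb(ctx, UpdateEvent{CorrelationID: correlationID, Result: v, Meta: meta, Err: err})
	return nil
}

// Compile-time check that the fake satisfies the interface.
var _ Cache = (*staticCache)(nil)

// rootsFor depends only on the interface, never on a concrete cache.
func rootsFor(c Cache, dc string) (interface{}, error) {
	v, _, err := c.Get(context.Background(), "connect-ca-root", keyRequest(dc))
	return v, err
}

func main() {
	c := &staticCache{values: map[string]interface{}{"connect-ca-root/dc1": "roots-for-dc1"}}
	v, err := rootsFor(c, "dc1")
	fmt.Println(v, err)
}
```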
28 changes: 28 additions & 0 deletions agent/connect/uri.go
@@ -23,6 +23,8 @@ type CertURI interface {
}

var (
spiffeIDWorkloadIdentityRegexp = regexp.MustCompile(
`^(?:/ap/([^/]+))/ns/([^/]+)/identity/([^/]+)$`)
spiffeIDServiceRegexp = regexp.MustCompile(
`^(?:/ap/([^/]+))?/ns/([^/]+)/dc/([^/]+)/svc/([^/]+)$`)
spiffeIDAgentRegexp = regexp.MustCompile(
@@ -94,6 +96,32 @@ func ParseCertURI(input *url.URL) (CertURI, error) {
Datacenter: dc,
Service: service,
}, nil
} else if v := spiffeIDWorkloadIdentityRegexp.FindStringSubmatch(path); v != nil {
// Determine the values. We assume they're reasonable to save cycles,
// but if the raw path is not empty that means that something is
// URL encoded so we go to the slow path.
ap := v[1]
ns := v[2]
workloadIdentity := v[3]
if input.RawPath != "" {
var err error
if ap, err = url.PathUnescape(v[1]); err != nil {
return nil, fmt.Errorf("Invalid admin partition: %s", err)
}
if ns, err = url.PathUnescape(v[2]); err != nil {
return nil, fmt.Errorf("Invalid namespace: %s", err)
}
if workloadIdentity, err = url.PathUnescape(v[3]); err != nil {
return nil, fmt.Errorf("Invalid workload identity: %s", err)
}
}

return &SpiffeIDWorkloadIdentity{
TrustDomain: input.Host,
Partition: ap,
Namespace: ns,
WorkloadIdentity: workloadIdentity,
}, nil
} else if v := spiffeIDAgentRegexp.FindStringSubmatch(path); v != nil {
// Determine the values. We assume they're reasonable to save cycles,
// but if the raw path is not empty that means that something is
28 changes: 4 additions & 24 deletions agent/connect/uri_service.go
@@ -54,33 +54,13 @@ func (id SpiffeIDService) uriPath() string {
return path
}

// SpiffeIDWorkloadIdentity is the structure to represent the SPIFFE ID for a workload identity.
type SpiffeIDWorkloadIdentity struct {
Host string
Partition string
Namespace string
Identity string
}

func (id SpiffeIDWorkloadIdentity) URI() *url.URL {
var result url.URL
result.Scheme = "spiffe"
result.Host = id.Host
result.Path = fmt.Sprintf("/ap/%s/ns/%s/identity/%s",
id.Partition,
id.Namespace,
id.Identity,
)
return &result
}

// SpiffeIDFromIdentityRef creates the SPIFFE ID from a workload identity.
// TODO (ishustava): make sure ref type is workload identity.
func SpiffeIDFromIdentityRef(trustDomain string, ref *pbresource.Reference) string {
return SpiffeIDWorkloadIdentity{
Host: trustDomain,
Partition: ref.Tenancy.Partition,
Namespace: ref.Tenancy.Namespace,
Identity: ref.Name,
TrustDomain: trustDomain,
Partition: ref.Tenancy.Partition,
Namespace: ref.Tenancy.Namespace,
WorkloadIdentity: ref.Name,
}.URI().String()
}
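
The URI format used for workload identities in the two files above (`/ap/<partition>/ns/<namespace>/identity/<name>`) can be exercised as a round trip. This is a simplified sketch, not the code from the commit: `workloadIdentityID` and `parseWorkloadIdentity` are hypothetical names, the trust domain is made up, and the parser matches against the decoded `Path` only, skipping the `RawPath` unescaping branch shown in `ParseCertURI`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
)

// workloadIdentityPath matches /ap/<partition>/ns/<namespace>/identity/<name>.
var workloadIdentityPath = regexp.MustCompile(`^/ap/([^/]+)/ns/([^/]+)/identity/([^/]+)$`)

// workloadIdentityID is a hypothetical stand-in for SpiffeIDWorkloadIdentity.
type workloadIdentityID struct {
	TrustDomain      string
	Partition        string
	Namespace        string
	WorkloadIdentity string
}

// URI renders the ID in the spiffe:// form shown in the diff.
func (id workloadIdentityID) URI() *url.URL {
	return &url.URL{
		Scheme: "spiffe",
		Host:   id.TrustDomain,
		Path: fmt.Sprintf("/ap/%s/ns/%s/identity/%s",
			id.Partition, id.Namespace, id.WorkloadIdentity),
	}
}

// parseWorkloadIdentity is a simplified parser for the same format.
func parseWorkloadIdentity(raw string) (workloadIdentityID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return workloadIdentityID{}, err
	}
	if u.Scheme != "spiffe" {
		return workloadIdentityID{}, errors.New("not a spiffe URI")
	}
	v := workloadIdentityPath.FindStringSubmatch(u.Path)
	if v == nil {
		return workloadIdentityID{}, errors.New("not a workload identity path")
	}
	return workloadIdentityID{
		TrustDomain:      u.Host,
		Partition:        v[1],
		Namespace:        v[2],
		WorkloadIdentity: v[3],
	}, nil
}

func main() {
	id := workloadIdentityID{
		TrustDomain:      "example.consul", // made-up trust domain
		Partition:        "default",
		Namespace:        "default",
		WorkloadIdentity: "web",
	}
	s := id.URI().String()
	fmt.Println(s)

	parsed, err := parseWorkloadIdentity(s)
	fmt.Println(parsed == id, err)
}
```

A service-style path such as `/ns/default/dc/dc1/svc/web` does not match this pattern, so in the sketch it is rejected rather than mistaken for a workload identity.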