Deferred certificate issuance #1315

mcpherrinm · 2019-12-14T02:21:04Z

For some workloads, it might be undesirable to have spire-agents eagerly fetch certificates as soon as they learn about the registration.

Ideally I would like to specify when a registration entry is created via the API that issuance should be deferred until an agent has successfully attested the workload. Then the agent can fetch the certificate. The tradeoff of avoiding issuance seems worthwhile in some scenarios.
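To illustrate the request, here is a minimal sketch of eager versus deferred issuance in an agent-side cache. The `deferred` field, the `AgentCache` class, and its method names are hypothetical and exist only for illustration; they are not part of the SPIRE API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    spiffe_id: str
    # Hypothetical per-entry flag; not an existing registration entry field.
    deferred: bool = False


@dataclass
class AgentCache:
    issued: dict = field(default_factory=dict)
    issue_count: int = 0

    def _issue(self, entry: Entry) -> None:
        # Stand-in for a signing round trip to the server.
        self.issue_count += 1
        self.issued[entry.spiffe_id] = f"svid-for-{entry.spiffe_id}"

    def on_entry_learned(self, entry: Entry) -> None:
        # Eager entries are issued as soon as the agent learns about them.
        if not entry.deferred:
            self._issue(entry)

    def on_workload_attested(self, entry: Entry) -> str:
        # Deferred entries are issued on first successful attestation only.
        if entry.spiffe_id not in self.issued:
            self._issue(entry)
        return self.issued[entry.spiffe_id]
```

In this sketch, a deferred entry costs nothing until a workload actually attests, at the price of one issuance round trip on first use.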

Here are two example use-cases:

We provide human users who log into systems SPIFFE credentials so they may perform administrative tasks by calling services or databases. Most of the time humans do not log into systems, so having certificates always ready to go is not needed. It is sufficient to provision them on-demand.

We run many CI jobs in docker containers. Some small fraction of them needs to call other services, so we want to make sure they have the option of getting a SPIFFE identity. I'd like to avoid issuing certificates until they're requested. Since we use short-lived containers for doing builds, there's significant overhead of re-issuing certificates to each build.

APTy · 2019-12-14T03:25:49Z

This is a cool idea, particularly around the sparse access of humans to production systems.

elee · 2020-01-16T22:57:38Z

This may be compelling for our usage of SPIRE in Kubernetes, as a node compromise would be limited to only the workloads resident on the node at the time of attack, rather than exposing all SVIDs (resident and non-resident).

cc: @gregose @brentjo @gregose as per our call today

mcpherrinm · 2020-01-17T18:19:32Z

I'm not sure this is exactly what you'd want for that property.
An attacker who wanted a non-resident workload's SVID would have to control the local SPIRE agent, and if it can do that, then it could trigger the lazy issuance. Admittedly you would have an issuance log at that point, but it's not the security boundary I'd like.

How to actually get that:
Each node runs a SPIRE agent, and you give each agent a unique SPIFFE ID.
Your Kubernetes integration only registers workloads which are pods actually on the node.
I believe the support/k8s-workload-registrar already does this (but I haven't verified). If it doesn't, I'll add it. We're looking into this soon, to replace some internal integration glue code we have.
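The per-node approach above can be sketched as a small reconciliation step. This is an illustration under stated assumptions, not the registrar's actual behavior: the trust domain, the SPIFFE ID layout, and the `Pod`, `entries_for_node`, and `reconcile` names are all hypothetical.

```python
from dataclasses import dataclass

TRUST_DOMAIN = "example.org"  # assumption for illustration


@dataclass(frozen=True)
class Pod:
    namespace: str
    service_account: str
    node_name: str


def agent_id(node_name: str) -> str:
    # One unique SPIFFE ID per node agent (hypothetical path layout).
    return f"spiffe://{TRUST_DOMAIN}/agent/{node_name}"


def entries_for_node(pods, node_name: str) -> set:
    # Desired (parent ID, workload ID) pairs: only pods scheduled on this node.
    parent = agent_id(node_name)
    return {
        (parent, f"spiffe://{TRUST_DOMAIN}/ns/{p.namespace}/sa/{p.service_account}")
        for p in pods
        if p.node_name == node_name
    }


def reconcile(existing: set, desired: set):
    # Returns the entries to create and the stale entries to delete.
    return desired - existing, existing - desired
```

Because each entry is parented to a single node's agent ID, an agent is only ever authorized for workloads that are resident on its own node.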

azdagron · 2020-01-17T18:48:24Z

k8s-workload-registrar currently registers all workloads against a generic per-cluster node SPIFFE ID.

elee · 2020-01-22T00:04:29Z

That's helpful context @mcpherrinm @azdagron -- having something {de,}register workloads per kubernetes node seems like a more feasible approach. A few things about having the kubernetes integration register workloads that jump out at me:

1. There may be a race condition between the integration creating these registration entries and the workloads requesting them.
2. The SPIRE agent would have to aggressively poll to prune invalid workload registration entries as they are removed or modified (I think it does this already).
3. The availability of the agents is now tightly coupled to the availability of the kube control plane and this registration control loop.

I'm not entirely sure how to mitigate (1) at a glance; the other two challenges seem like design-decision tradeoffs.

azdagron · 2020-01-22T00:18:50Z

#1 seems possible to mitigate by plugging into the Kubernetes Scheduling Framework (https://kubernetes.io/docs/concepts/configuration/scheduling-framework). The "Reserve" and "Unreserve" integration points seem promising. On "reserve", a registration entry could be added. On "unreserve" it could be removed.
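As a rough model of that idea, the sketch below adds an entry on reserve and removes it on unreserve. It is plain Python and does not use the Kubernetes scheduler plugin API; `EntryStore`, `ReservationHooks`, the trust domain, and the ID layout are hypothetical names chosen for illustration.

```python
from collections import Counter


class EntryStore:
    """In-memory stand-in for the server's registration entries."""

    def __init__(self):
        self._refs = Counter()

    def create(self, parent_id: str, spiffe_id: str) -> None:
        # Reference-count so two pods sharing an identity do not clash.
        self._refs[(parent_id, spiffe_id)] += 1

    def delete(self, parent_id: str, spiffe_id: str) -> None:
        key = (parent_id, spiffe_id)
        if self._refs[key] > 1:
            self._refs[key] -= 1
        else:
            self._refs.pop(key, None)

    def for_parent(self, parent_id: str) -> list:
        return sorted(s for (p, s) in self._refs if p == parent_id)


class ReservationHooks:
    def __init__(self, store: EntryStore, trust_domain: str = "example.org"):
        self.store = store
        self.trust_domain = trust_domain

    def _ids(self, namespace, service_account, node_name):
        parent = f"spiffe://{self.trust_domain}/agent/{node_name}"
        workload = f"spiffe://{self.trust_domain}/ns/{namespace}/sa/{service_account}"
        return parent, workload

    def reserve(self, namespace, service_account, node_name) -> None:
        # Called once the scheduler has picked a node, before the pod starts.
        self.store.create(*self._ids(namespace, service_account, node_name))

    def unreserve(self, namespace, service_account, node_name) -> None:
        # Called when the reservation is released.
        self.store.delete(*self._ids(namespace, service_account, node_name))
```

Creating the entry at reservation time means it exists before the workload can start and ask for its SVID, which is what would narrow the race described in (1).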

azdagron · 2020-01-22T00:19:32Z

ReservePlugin interface
https://github.com/kubernetes/kubernetes/blob/edad4bbfc824215fc254096dfbbd1b2ab8ce6781/pkg/scheduler/framework/v1alpha1/interface.go#L347

UnreservePlugin interface
https://github.com/kubernetes/kubernetes/blob/edad4bbfc824215fc254096dfbbd1b2ab8ce6781/pkg/scheduler/framework/v1alpha1/interface.go#L378

evan2645 · 2020-03-11T20:19:22Z

> Ideally I would like to specify when a registration entry is created via the API that issuance should be deferred until an agent has successfully attested the workload. Then the agent can fetch the certificate.

My first impression is that it feels more natural to enable this feature on an agent-by-agent basis, e.g. disable_eager_svid_caching = true. Perhaps that inclination is due to my mental model in which an entry describes a workload and its identity... the behavior in question here is a function of agent logic rather than being anything to do with the workload or its identity itself.

Do you have cases in which exposing this feature as an agent configurable wouldn't quite cut the mustard?
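A minimal sketch of the agent-wide variant, to make the question concrete. The setting name comes from the suggestion above and is a proposal, not an existing agent option; `AgentConfig` and `entries_to_prefetch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    # Proposed setting from this thread; not an existing agent option.
    disable_eager_svid_caching: bool = False


def entries_to_prefetch(config: AgentConfig, authorized_entries: list) -> list:
    """Return the SPIFFE IDs the agent would request SVIDs for at sync time."""
    if config.disable_eager_svid_caching:
        # Nothing is fetched up front; issuance waits for attestation.
        return []
    return list(authorized_entries)
```

With a single agent-wide switch, every entry on the agent is treated the same way. Deferring only some entries on a given agent, for example CI builds but not long-running services, would need per-entry control.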

> I'd like to avoid issuing certificates until they're requested. Since we use short-lived containers for doing builds, there's significant overhead of re-issuing certificates to each build.

These two statements feel conflicting? Or, is this an argument for per-entry control?

azdagron · 2022-05-09T16:45:14Z

I think this is being solved to some extent with #2593. Happy to revisit if needed.

azdagron closed this as completed May 9, 2022
