Deferred certificate issuance #1315

Closed
mcpherrinm opened this issue Dec 14, 2019 · 9 comments

Comments

@mcpherrinm
Contributor

For some workloads, it might be undesirable to have spire-agents eagerly fetch certificates as soon as they learn about the registration.

Ideally I would like to specify when a registration entry is created via the API that issuance should be deferred until an agent has successfully attested the workload. Then the agent can fetch the certificate. The tradeoff of avoiding issuance seems worthwhile in some scenarios.

Here are two example use-cases:

We give human users who log into systems SPIFFE credentials so they can perform administrative tasks by calling services or databases. Most of the time no human is logged in, so there is no need to keep certificates ready at all times. Provisioning them on demand is sufficient.

We run many CI jobs in Docker containers. A small fraction of them need to call other services, so we want them to have the option of getting a SPIFFE identity. I'd like to avoid issuing certificates until they're requested. Since we use short-lived containers for builds, re-issuing certificates for each build carries significant overhead.
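
To make the request concrete, here is a rough sketch of the behavior as a toy in-memory model. It is not SPIRE's agent code: the `defer_issuance` field, the `Agent` class, and the `issue` callback are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entry:
    spiffe_id: str
    defer_issuance: bool = False  # hypothetical per-entry flag, not a real SPIRE field


class Agent:
    """Toy agent cache: eager entries are issued on sync, deferred ones on first request."""

    def __init__(self, issue: Callable[[str], str]) -> None:
        self._issue = issue  # stands in for a signing request to the server
        self._entries: Dict[str, Entry] = {}
        self._svids: Dict[str, str] = {}

    def sync(self, entries: List[Entry]) -> None:
        # Called when the agent learns about its registration entries.
        for entry in entries:
            self._entries[entry.spiffe_id] = entry
            if not entry.defer_issuance:
                self._svids[entry.spiffe_id] = self._issue(entry.spiffe_id)

    def fetch_for_workload(self, spiffe_id: str) -> str:
        # Called only after the workload has been attested successfully.
        if spiffe_id not in self._entries:
            raise KeyError(f"no registration entry for {spiffe_id}")
        if spiffe_id not in self._svids:
            self._svids[spiffe_id] = self._issue(spiffe_id)
        return self._svids[spiffe_id]


issued: List[str] = []


def fake_issue(spiffe_id: str) -> str:
    # Records every issuance so the difference between modes is visible.
    issued.append(spiffe_id)
    return f"svid-for-{spiffe_id}"


agent = Agent(fake_issue)
agent.sync([
    Entry("spiffe://example.org/db"),
    Entry("spiffe://example.org/human/alice", defer_issuance=True),
])
issued_after_sync = list(issued)
svid = agent.fetch_for_workload("spiffe://example.org/human/alice")
```

In this model, only the eager entry costs an issuance at sync time. The deferred entry costs one only if a workload actually asks for it.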

@APTy
Contributor

APTy commented Dec 14, 2019

This is a cool idea, particularly around the sparse access of humans to production systems.

@elee
Contributor

elee commented Jan 16, 2020

This may be compelling for our usage of SPIRE in Kubernetes, as a node compromise would be limited to the workloads resident on the node at the time of the attack, rather than all SVIDs (resident and non-resident) becoming available.

cc: @gregose @brentjo as per our call today

@mcpherrinm
Contributor Author

I'm not sure this is exactly what you'd want for that property.
An attacker who wanted a non-resident workload's SVID would have to control the local SPIRE agent, and if it can do that, then it could trigger the lazy issuance. Admittedly you would have an issuance log at that point, but it's not the security boundary I'd like.

How to actually get that:
Each node runs a SPIRE agent, and you give each agent a unique SPIFFE ID.
Your Kubernetes integration only registers workloads which are pods actually on the node.
I believe support/k8s-workload-registrar already does this, but I haven't verified it. If it doesn't, I'll add it. (We're looking into this soon, to replace some internal integration glue code we have.)
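
A small sketch of that layout, as a simplified model rather than the registrar's real logic. The ID paths, the `Pod` and `Entry` types, and the helper names are all assumptions made up for illustration. The point is that each workload entry is parented to the agent on the node where its pod runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pod:
    name: str
    node: str
    service_account: str


@dataclass(frozen=True)
class Entry:
    parent_id: str  # SPIFFE ID of the agent allowed to obtain this SVID
    spiffe_id: str


TRUST_DOMAIN = "example.org"  # placeholder trust domain


def agent_id(node: str) -> str:
    # Hypothetical ID layout: one unique SPIFFE ID per node agent.
    return f"spiffe://{TRUST_DOMAIN}/agent/{node}"


def entries_for_pods(pods: List[Pod]) -> List[Entry]:
    # Parent each workload entry to the agent on the node where the pod runs.
    return [
        Entry(agent_id(p.node), f"spiffe://{TRUST_DOMAIN}/sa/{p.service_account}")
        for p in pods
    ]


def visible_to(node: str, entries: List[Entry]) -> List[str]:
    # In this model, an agent (or an attacker controlling it) can only
    # request SVIDs for entries parented to its own ID.
    return sorted({e.spiffe_id for e in entries if e.parent_id == agent_id(node)})


pods = [
    Pod("web-1", "node-a", "web"),
    Pod("db-1", "node-b", "db"),
]
entries = entries_for_pods(pods)
```

Here `visible_to("node-a", entries)` contains only the `web` identity, so a compromise of node-a does not expose the `db` identity on node-b.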

@azdagron
Member

k8s-workload-registrar currently registers all workloads against a generic per-cluster node SPIFFE ID.

@elee
Contributor

elee commented Jan 22, 2020

That's helpful context @mcpherrinm @azdagron -- having something {de,}register workloads per kubernetes node seems like a more feasible approach. A few things about having the kubernetes integration register workloads that jump out at me:

  1. there may be a race between the integration creating these registration entries and the workloads requesting them
  2. the SPIRE agent would have to poll aggressively to prune invalid workload registration entries as they are removed or modified (I think it does this already)
  3. the availability of the agents is now tightly coupled to the availability of the kube control plane and this registration control loop

At a glance, I'm not entirely sure how to mitigate (1). The other two challenges seem like design tradeoffs.
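
For reference, one pass of the registration control loop in (3) can be sketched as a set difference between the entries that should exist for a node and the entries that currently do. This is a generic reconcile pattern, not code from any existing integration, and the `reconcile` function and ID strings are made up for illustration. It does nothing about the race in (1).

```python
from typing import Set, Tuple


def reconcile(desired: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_create, to_delete) for one pass of the control loop."""
    to_create = desired - current  # pods on the node that have no entry yet
    to_delete = current - desired  # entries whose pod is gone from the node
    return to_create, to_delete


desired = {"spiffe://example.org/sa/web", "spiffe://example.org/sa/ci"}
current = {"spiffe://example.org/sa/web", "spiffe://example.org/sa/old"}
to_create, to_delete = reconcile(desired, current)
```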

@azdagron
Member

Item (1) seems possible to mitigate by plugging into the Kubernetes Scheduling Framework (https://kubernetes.io/docs/concepts/configuration/scheduling-framework). The "Reserve" and "Unreserve" integration points seem promising. On "Reserve", a registration entry could be added. On "Unreserve", it could be removed.
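
Very roughly, the idea looks like the sketch below. Real scheduler plugins are written in Go against the framework's plugin interfaces, so this Python class only models the ordering. The `Registrar` class and its method names are hypothetical, and the set stands in for calls to the SPIRE registration API.

```python
from typing import Set, Tuple


class Registrar:
    """Toy registry keyed by (node, pod)."""

    def __init__(self) -> None:
        self.entries: Set[Tuple[str, str]] = set()

    def on_reserve(self, pod: str, node: str) -> None:
        # Runs before the pod is bound to the node, so the entry exists
        # before the workload can start and ask for its SVID.
        self.entries.add((node, pod))

    def on_unreserve(self, pod: str, node: str) -> None:
        # Rolls back if a later scheduling phase fails.
        # discard() keeps the hook idempotent.
        self.entries.discard((node, pod))


reg = Registrar()
reg.on_reserve("build-42", "node-a")
reserved = set(reg.entries)
reg.on_unreserve("build-42", "node-a")
reg.on_unreserve("build-42", "node-a")  # a repeated call is harmless
```

This covers only the scheduling-time hooks. Removing entries when a pod terminates would need a separate path.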

@evan2645
Member

> Ideally I would like to specify when a registration entry is created via the API that issuance should be deferred until an agent has successfully attested the workload. Then the agent can fetch the certificate.

My first impression is that it feels more natural to enable this feature on an agent-by-agent basis, e.g. `disable_eager_svid_caching = true`. Perhaps that inclination is due to my mental model in which an entry describes a workload and its identity... the behavior in question here is a function of agent logic rather than being anything to do with the workload or its identity itself.

Do you have cases in which exposing this feature as an agent configurable wouldn't quite cut the mustard?

> I'd like to avoid issuing certificates until they're requested. Since we use short-lived containers for doing builds, there's significant overhead of re-issuing certificates to each build.

These two statements feel conflicting? Or, is this an argument for per-entry control?
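
To illustrate how the two options could coexist, here is a tiny sketch of one possible precedence rule. Both the agent-wide `disable_eager_svid_caching` setting and the per-entry `entry_defer` hint are hypothetical at this point, and letting the per-entry hint win is an assumption for the sake of the example, not a proposal for how SPIRE should behave.

```python
from typing import Optional


def should_prefetch(
    disable_eager_svid_caching: bool,
    entry_defer: Optional[bool] = None,
) -> bool:
    """Decide whether the agent fetches an SVID as soon as it learns of an entry."""
    # A per-entry hint, if present, wins; otherwise the agent-wide setting applies.
    if entry_defer is not None:
        return not entry_defer
    return not disable_eager_svid_caching


agent_only = should_prefetch(disable_eager_svid_caching=True)
entry_overrides = should_prefetch(disable_eager_svid_caching=True, entry_defer=False)
```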

@azdagron
Member

azdagron commented May 9, 2022

I think this is being solved to some extent with #2593. Happy to revisit if needed.

@azdagron azdagron closed this as completed May 9, 2022