Adding enhancement #98 for SPIRE integration #100
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,227 @@ | ||
<!-- | ||
**Note:** When your enhancement is complete, all of these comment blocks should be removed. | ||
|
||
To get started with this template: | ||
|
||
- [ ] **Fill out this file as best you can.** | ||
At minimum, you should fill in the "Summary", and "Motivation" sections. | ||
These should be easy if you've preflighted the idea of the enhancement with the | ||
appropriate SIG(s). | ||
- [ ] **Merge early and iterate.** | ||
Avoid getting hung up on specific details and instead aim to get the goals of | ||
the enhancement clarified and merged quickly. The best way to do this is to just | ||
start with the high-level sections and fill out details incrementally in | ||
subsequent PRs. | ||
--> | ||
# enhancement-98: SPIRE Integration | ||
|
||
<!-- | ||
A table of contents is helpful for quickly jumping to sections of an enhancement and for | ||
highlighting any additional information provided beyond the standard enhancement | ||
template. | ||
--> | ||
|
||
<!-- toc --> | ||
- [Release Signoff Checklist](#release-signoff-checklist) | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [User Stories (optional)](#user-stories-optional) | ||
- [Story 1](#story-1) | ||
- [Story 2](#story-2) | ||
- [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Design Details](#design-details) | ||
- [Test Plan](#test-plan) | ||
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
- [Infrastructure Needed (optional)](#infrastructure-needed-optional) | ||
<!-- /toc --> | ||
|
||
## Release Signoff Checklist | ||
|
||
<!-- | ||
**ACTION REQUIRED:** In order to merge code into a release, there must be an | ||
issue in [keylime/enhancements] referencing this enhancement and targeting a release. | ||
|
||
For enhancements that make changes to code or processes/procedures in core | ||
Keylime i.e., [keylime/keylime], we require the following Release | ||
Signoff checklist to be completed. | ||
|
||
Check these off as they are completed for the Release Team to track. These | ||
checklist items _must_ be updated for the enhancement to be released. | ||
--> | ||
|
||
- [ ] Enhancement issue in release milestone, which links to pull request in [keylime/enhancements] | ||
- [ ] Core members have approved the issue with the label `implementable` | ||
- [ ] Design details are appropriately documented | ||
- [ ] Test plan is in place | ||
- [ ] User-facing documentation has been created in [keylime/keylime-docs] | ||
|
||
<!-- | ||
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone. | ||
--> | ||
|
||
## Summary | ||
|
||
SPIFFE/SPIRE is an elegant solution to workload identity that is pluggable | ||
in its node and workload attestors. Keylime would be a perfect candidate | ||
for node attestation if it had a few extra APIs that would allow the | ||
SPIRE agent and SPIRE server to verify the following: | ||
|
||
* Is the node where the SPIRE agent runs being attested by Keylime? | ||
* Is the attestation passing? | ||
|
||
We propose new APIs on the agent and verifier to allow SPIRE to verify | ||
these things. But since SPIRE will also need plugins (agent and server) | ||
for the node attestation we have some flexibility in how these APIs | ||
appear. If we keep them generic enough, they could theoretically be used | ||
by any system that wants to independently verify the state of a given node | ||
in Keylime. | ||
|
||
## Motivation | ||
|
||
* To expand Keylime's usefulness and reach in the cloud-native landscape. | ||
* To have a better hardware root-of-trust for software identity. | ||
* To have a more complete Zero Trust solution. | ||
|
||
### Goals | ||
|
||
When complete, this proposal will allow SPIRE plugins to be written to | ||
target Keylime as an attestor and provide useful properties in Keylime | ||
as selectors in SPIRE. This will allow a user to craft authentication | ||
and authorization policy that takes into account a machine's boot and | ||
file integrity attestation state. | ||
|
||
### Non-Goals | ||
|
||
Although these APIs will be generic, no direct effort will be made to | ||
support other non-SPIRE entities. | ||
|
||
## Proposal | ||
|
||
|
||
### User Stories (optional) | ||
|
||
#### Story 1 | ||
|
||
A developer will be able to develop a SPIRE agent and server plugin | ||
that communicates with the Keylime agent and verifier to | ||
independently prove that the agent in question is on the same node as the | ||
SPIRE agent and also that the agent is passing its attestation policies | ||
in Keylime. | ||
|
||
This integration will also pull in various properties of the Keylime setup | ||
(agent configuration, policy, etc) to use as selectors for SPIRE. | ||
|
||
### Notes/Constraints/Caveats (optional) | ||
|
||
None | ||
|
||
|
||
### Risks and Mitigations | ||
|
||
Care will need to be taken so that we don't leak any sensitive data in | ||
these APIs and that our verification/signing process is secure and leads | ||
to the guarantees we are making (that the SPIRE and Keylime agents are | ||
on the same node). | ||
|
||
The security of the information flows has been reviewed by several | ||
members of the Keylime development team as well as SPIRE participants. The | ||
implementation will need thorough review as well. | ||
|
||
|
||
## Design Details | ||
|
||
The following flow is anticipated for the full Keylime SPIRE plugins: | ||
|
||
``` | ||
┌───────────────────────────────────────────────┐ ┌───────────────┐ | ||
│ │ │ │ | ||
│ Node #3 │ │ SPIRE │ | ||
│ ┌───────────────────────────────────┼────────────────────► SERVER │ | ||
│ ┌────────┴────┐ │ │ │ | ||
│ │ SPIRE │ #1 │ │ │ | ||
│ │ Agent ◄────────────────┐ │ └─────┬─────────┘ | ||
│ │ │ │ │ │ | ||
│ └─────────┬───┘ │ │ │ #4 | ||
│ │#2 ┌────▼─────┐ │ │ | ||
│ ┌──▼──────┐ │ Keylime │ │ ┌─────▼─────────┐ | ||
│ │ TPM │ │ Agent │ │ │ │ | ||
│ │ │ │ │ │ │ Keylime │ | ||
│ └─────────┘ └──────────┘ │ │ Verifier │ | ||
│ │ │ │ | ||
└───────────────────────────────────────────────┘ └───────────────┘ | ||
``` | ||
|
||
This flow has the following steps: | ||
|
||
1. SPIRE Agent queries the node-local `/info` API on the Keylime agent to get information such as the Keylime UUID. | ||
2. SPIRE Agent creates a nonce and sends it to the TPM to be signed with the AK (created by Keylime). | ||
3. SPIRE Agent sends the information to the SPIRE Server. | ||
4. SPIRE Server queries the Keylime Verifier about the agent: Does it exist? Is it passing attestation? If so, can it verify the signature on this nonce? If all are true, SPIRE attestation passes and an identity is issued. | ||
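
The four steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: every name in it (`FakeTPM`, `FakeVerifier`, `AttestationPayload`, `spire_agent_attest`, `spire_server_attest`) is hypothetical, and an HMAC stands in for the TPM's asymmetric AK signature so that the example runs with the Python standard library alone. It is not the Keylime or SPIRE API.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationPayload:
    """Hypothetical shape of what the SPIRE agent sends in step 3."""
    agent_uuid: str
    nonce: bytes
    signature: bytes


class FakeTPM:
    """Stand-in for the node's TPM. A real AK is an asymmetric key;
    HMAC is used here only to keep the sketch dependency-free."""

    def __init__(self, ak_secret: bytes) -> None:
        self._ak_secret = ak_secret

    def sign(self, data: bytes) -> bytes:
        return hmac.new(self._ak_secret, data, hashlib.sha256).digest()


class FakeVerifier:
    """Stand-in for the Keylime verifier: tracks each agent's AK and state."""

    def __init__(self) -> None:
        self._agents = {}

    def register(self, agent_uuid: str, ak_secret: bytes, passing: bool) -> None:
        self._agents[agent_uuid] = {"ak_secret": ak_secret, "passing": passing}

    def check(self, payload: AttestationPayload) -> bool:
        # Step 4: does the agent exist, is it passing, does the signature match?
        agent = self._agents.get(payload.agent_uuid)
        if agent is None or not agent["passing"]:
            return False
        expected = hmac.new(agent["ak_secret"], payload.nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, payload.signature)


def spire_agent_attest(info: dict, tpm: FakeTPM) -> AttestationPayload:
    # Step 1: `info` is what the node-local /info API returned.
    # Step 2: create a nonce and have the TPM sign it with the AK.
    nonce = secrets.token_bytes(32)
    # Step 3: this payload is what gets sent to the SPIRE server.
    return AttestationPayload(info["agent_uuid"], nonce, tpm.sign(nonce))


def spire_server_attest(payload: AttestationPayload, verifier: FakeVerifier) -> bool:
    # Step 4: the SPIRE server defers the decision to the Keylime verifier.
    return verifier.check(payload)


if __name__ == "__main__":
    ak = secrets.token_bytes(32)
    verifier = FakeVerifier()
    verifier.register("example-uuid", ak, passing=True)
    payload = spire_agent_attest({"agent_uuid": "example-uuid"}, FakeTPM(ak))
    print("identity issued:", spire_server_attest(payload, verifier))
```

In this sketch a payload signed by a different TPM, or one for an agent that is unknown or failing attestation, is rejected, which mirrors the three questions the SPIRE server asks in step 4.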
|
||
|
||
In order to accommodate this flow, this enhancement will consist of the following: | ||
|
||
1. A new node-local, non-TLS API on the Keylime agent responding to the `/info` path. It will return information about the Keylime agent which will be used not only to identify the agent, but also to perform a signature verification. A 3rd party can use the credential created by the agent in the TPM to sign a nonce which can then be verified by the verifier. The new API will return the following information:
Particularly under the light of #60 it would be nice to see a separation between node-local APIs, and APIs that are being called by the verifier. Also, IMHO it would be nice to consider the following things for node-local APIs:

I'm curious why we think they should be separate servers? A single binary which starts multiple processes? Multiple binaries? The latter would be a much bigger lift for packagers, etc.

This is definitely worth doing. I don't know if the initial APIs will have them, but I'll make them versioned so we can add them later.

sorry, that might have not been clear: logical separation in the code (definitely not multiple binaries or processes), so that the server listening on the unix socket serves all the node-local APIs, and the one listening on TCP for the verifier serves all the existing APIs.

I have been reading this carefully, and I'd like to dig a little into Mike's assumptions. During the initial review of the proposal I was not at peace with the spire agent having to talk to both the agent and the TPM device. I think we can drop the requirement that the spire agent talk to the keylime agent -- without affecting security. My apologies for the extremely long writeup, and if I have a reasoning error in here please do point it out. I will then eat crow for having wasted your time :)

(A) we are positing a situation in which the keylime verifier is attesting the target node.

@galmasi @mpeters. Are we assuming the [...] Maybe I am missing something, but it looks like we either do exactly what I describe here or we implement a new HTTP api for the [...]

@galmasi I was reading your reply, and I have some questions/comments. I'm also not entirely happy with the SPIRE agent needing to talk to both the agent and the TPM, but for opposite reasons than you.

From Mike's diagram the part that I don't particularly like is that (2) is going from the SPIRE agent to the TPM. It should be the keylime agent which is always querying the TPM.

My point of view is the exact opposite on this: we should drop the requirement that the SPIRE agent is talking to the TPM. Here is my reasoning behind this: SPIRE already has a TPM integration as of today, and in order to promote and make keylime more valuable to other use cases even apart from SPIRE (and I am actually working on one right now), the barrier for attestation needs to be lowered, and provide more value on top of this. In this case, the keylime agent is the one which interacts with the trust hardware module, and it happens to use the TPM at this point in time. It's the node-level abstraction of how to do these types of actions on the host. Furthermore, keylime does more than what SPIRE is currently doing with its TPM integration, and this is where the particular value add (IMHO) lies.

So I think for your (C.1) I would do the challenge through a node-level API through the keylime agent. It's extremely important though that this API is a host local API (obviously). Admittedly though, your approach can theoretically be considered more secure: as both components independently are talking to the same TPM which is the source of truth after all. However, for all practical purposes a host local API basically does the same thing (and one can control and restrict further access to this socket with additional methods as well). In a nutshell that would provide the following components:

It's a good discussion :)

@mpeters so, accessing the [...]

This was my first design @mheese but after starting it and talking it over with others I noticed it was flawed. The purpose of SPIRE attestation via keylime is twofold:

If the SPIRE agent just talks to the Keylime agent then it can't really prove #1. A compromised keylime agent on node A could accept requests and forward them to some other process on node B which could get its answers either from a Keylime agent on node B or the TPM on node B. And the SPIRE agent on node A wouldn't know the difference as long as node A was registered with Keylime. So by talking to the TPM directly we can independently prove the identity of the node and then prove #2 by talking to the Keylime agent and verifier.

@maugustosilva Yes, I guess it's not clear from my proposal that the SPIRE [...]

as I mentioned to @galmasi already, that's why it is important that this API is node-local and cannot be reached through any other means. Sure, directly going to the TPM fully eliminates all these concerns, but it also makes it so much more impractical (and keeping it node-local is the practical approach of guaranteeing the same things). So as long as this API is node-local, you can prove (1). That's what would keep this approach generic and being easily adoptable by other products for which talking directly to the TPM is a barrier which is just too high to achieve and which is why they would like to integrate with keylime to begin with.

That all said, it seems like you and others feel strongly about this approach. Yet again, while I agree that this is the theoretically safer approach, I disagree that it makes a practical difference. |
||
|
||
* agent_uuid | ||
* tpm_hash_alg | ||
* tpm_encryption_alg | ||
* tpm_signing_alg | ||
* ek_handle | ||
|
||
2. A new API on the verifier that can take a signed payload from a TPM and, given an agent's UUID, verify that it came from the TPM associated with that agent. This will be used to independently verify that the Keylime agent resides on a node with that TPM. | ||
|
||
3. An expansion of the existing `/agents` GET API on the verifier to return enough information for use as selectors in SPIRE. | ||
what information is still necessary/needed for it to be enough for SPIRE?

Right now I was thinking of adding the name(s) of the Keylime policies passed by the node. Right now the only one with a name is the file integrity policy (IMA), but we can look at adding names to the measured boot policy and others in the future. What other keylime data would you like to see as a selector?

no idea, that's kind of why I was asking :) |
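
To make the three API changes above more concrete, here is a minimal sketch of how a SPIRE plugin might model the `/info` response and turn Keylime data into selectors. It rests on several assumptions: the response is taken to be a JSON object keyed by the five field names listed above, the `keylime:<key>:<value>` selector format is invented for illustration, and `AgentInfo`, `parse_info`, and `to_selectors` are hypothetical names. The actual wire format is not defined in this proposal.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentInfo:
    """The five fields the proposed /info API is expected to return."""
    agent_uuid: str
    tpm_hash_alg: str
    tpm_encryption_alg: str
    tpm_signing_alg: str
    ek_handle: str


def parse_info(body: str) -> AgentInfo:
    """Parse an assumed JSON /info response, ignoring unknown keys."""
    data = json.loads(body)
    # A missing field raises KeyError, so an incomplete response fails loudly.
    return AgentInfo(**{f.name: data[f.name] for f in fields(AgentInfo)})


def to_selectors(info: AgentInfo, policy_names: list[str]) -> list[str]:
    """Build hypothetical selector strings from the agent UUID and the
    names of the Keylime policies the node is passing."""
    selectors = [f"keylime:agent_uuid:{info.agent_uuid}"]
    selectors += [f"keylime:policy:{name}" for name in sorted(policy_names)]
    return selectors


if __name__ == "__main__":
    # All values below are made up for the example.
    body = json.dumps({
        "agent_uuid": "example-uuid",
        "tpm_hash_alg": "sha256",
        "tpm_encryption_alg": "rsa",
        "tpm_signing_alg": "rsassa",
        "ek_handle": "example-handle",
    })
    info = parse_info(body)
    for selector in to_selectors(info, ["example-ima-policy"]):
        print(selector)
```

Using policy names as selectors follows the suggestion in the review discussion; which other Keylime properties should become selectors is still an open question.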
||
|
||
|
||
### Test Plan | ||
|
||
The individual new APIs will have tests written for them. And the new | ||
SPIRE plugins written to use those APIs will also have their own CI/CD | ||
tests/pipelines to test against those APIs, targeting specific versions | ||
of the Keylime agent and verifier. | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
These will be net-new APIs and will require a minor bump in the Keylime | ||
API version number. It's not believed that they will require database | ||
schema changes, nor any upgrade migrations. As such, there doesn't need | ||
to be a downgrade strategy. | ||
|
||
### Dependency requirements | ||
|
||
It is not believed that we will require any new dependencies for these | ||
APIs as they will just re-use existing libraries for any cryptographic | ||
signing or verification of those signatures. | ||
|
||
## Drawbacks | ||
|
||
It's possible that these APIs won't be useful outside of the SPIRE | ||
integration, but it's our belief they will be generic enough to be evolved | ||
for any 3rd party that wants to do deep verification of a node's status | ||
in Keylime. | ||
|
||
## Alternatives | ||
It would also be great to compare a keylime integration against the "tpm_devid" plugin here, and what the advantages for a keylime integration would be.

Good idea, I'll add it. |
||
|
||
It's already possible to create an integration of Keylime with SPIRE | ||
by using the x509pop plugin, but there are several limitations with | ||
this approach: | ||
|
||
* You need to have a key management solution for those certs | ||
* It's not very automatic and requires a lot of setup/configuration for the users | ||
* It relies on using the payload delivery mechanism of the keylime tenant which some users turn off for security | ||
* It doesn't propagate any information about the Keylime setup into SPIRE properties for use in auth policy. | ||
* There's no way to revoke the certificate if attestation fails in the future | ||
|
||
This enhancement should allow for full Keylime/SPIRE plugins to fix all | ||
of those problems and make it really easy and convenient for users. | ||
|
||
## Infrastructure Needed (optional) | ||
|
||
This enhancement shouldn't need any additional infrastructure. |
Is SPIRE attestation as 'periodic' as Keylime attestation?
No, especially for node attestation, I believe it only happens once at SPIRE agent startup.
Whatever SPIRE states about the node should only have a short-term validity period since Keylime attestation may detect that the node has gone out-of-policy shortly after ...