Make Keylime easily deployable on Kubernetes/Openshift #1378

maugustosilva · 2023-05-17T23:10:02Z

After several discussions with @mpeters @ansasaki Lukas Vrabec @galmasi and Marcus Hesse, we collectively decided that the time to have Keylime easily deployed on Kubernetes/Openshift has come. I propose we use this issue to concentrate all the relevant discussion on this topic.

I will start by listing some common relevant points, and I do thank Marcus Hesse for starting the discussion on the keylime-operator on CNCF's Slack. I believe I have addressed most of your questions in this writeup.

The main goal is to end up with an "Attestation Operator", which can not only automatically add nodes (i.e., agents) to specific verifiers but can also properly react to administrative activities such as node reboots or cordoning off.

I am not a Kubernetes/Openshift expert by any means, so my proposal here is bound to be incomplete/incorrect, and additions/corrections are welcome. That being said, I see the following set of intermediate steps, in increasing order of complexity, as a good way to achieve our goal.

Ensure that all keylime components can be fully executed in a containerized manner. For this, the following requirements should be satisfied.
a. Unmodified public images. I suggest we expand https://quay.io/organization/keylime (under Red Hat's control), which already offers the "latest" verifier, registrar and tenant, to also include the rust agent image (@ansasaki is pursuing this).
b. Carefully determine the minimum set of (container) privileges required to run the agent.
c. Provide some tool to perform containerized keylime deployments (@maugustosilva and @galmasi have a tool, which is about to be released as open source, to perform this task).
Create a simple Kubernetes application for keylime. At this point, we should be able to start by writing progressively more yaml files.

a. The idea is to start with a very simple deployment with the following objects:
* A StatefulSet (initially of 1) for the Registrar
* A StatefulSet (initially of 1) for the Verifier
* A DaemonSet for the Agents
* Registrar and Verifier both exposed as a Service (type=NodePort)
* mTLS certificates stored as Secrets
* Given the fact keylime can be fully configured via environment variables, we shall use environment dependent variables on our yaml.
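
To make the object list above more concrete, here is a minimal sketch that builds the StatefulSet and NodePort Service for one component as plain Python data and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The image name, port 8891, the Secret name, the certificate directory and the environment variable are all assumptions made for illustration, not settled values.

```python
import json


def component_manifests(name, image, port, cert_secret, cert_dir, env):
    """Return a [StatefulSet, Service] pair for one Keylime server component."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Keylime settings are injected as environment variables.
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
        # Pre-generated mTLS material is mounted read-only from a Secret.
        "volumeMounts": [
            {"name": "certs", "mountPath": cert_dir, "readOnly": True}
        ],
    }
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # serviceName is required; a headless Service is the usual choice,
            # reusing the NodePort Service here is a simplification.
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {"name": "certs", "secret": {"secretName": cert_secret}}
                    ],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "type": "NodePort",
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return [stateful_set, service]


if __name__ == "__main__":
    items = component_manifests(
        name="keylime-registrar",
        image="quay.io/keylime/keylime_registrar:latest",  # assumed image name
        port=8891,  # assumed registrar TLS port
        cert_secret="keylime-certs",  # hypothetical Secret name
        cert_dir="/var/lib/keylime/cv_ca",  # assumed certificate directory
        env={"KEYLIME_EXAMPLE_SETTING": "value"},  # hypothetical variable
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Calling the same function with the Verifier's name, image and port would give the second StatefulSet/Service pair.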

b. Initially, I propose we make the following simplifying boundary conditions
* Given the use of sqlite, we could start without any DB deployment
* mTLS certificates are pre-generated (with keylime_ca commands) and added to the Kubernetes cluster
* Environment variables will also be set and maintained by some external tool
* The tenant will NOT be part of the initial deployment.
* Make use of the "Node Feature Discovery" to mark all the nodes with tpm devices (and make it part of the DaemonSet node selector)
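
As a rough illustration of the last point, the agent DaemonSet could carry a node selector so that pods only land on TPM-equipped nodes. In this sketch the label key, the image name, the Secret name and the mount path are hypothetical placeholders; the label actually published by Node Feature Discovery has to be looked up for the NFD version in use. Device access and container privileges are deliberately left out, since they are still to be determined.

```python
def agent_daemonset(image, tpm_node_label, cert_secret, cert_dir, env):
    """Return a DaemonSet that runs one agent pod per TPM-labelled node."""
    labels = {"app": "keylime-agent"}
    container = {
        "name": "keylime-agent",
        "image": image,
        # Settings come from environment variables kept by an external tool.
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
        "volumeMounts": [
            {"name": "certs", "mountPath": cert_dir, "readOnly": True}
        ],
        # TPM device access and securityContext are intentionally omitted:
        # the minimal privilege set has not been decided yet.
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "keylime-agent", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying the TPM label get an agent pod.
                    "nodeSelector": dict(tpm_node_label),
                    "containers": [container],
                    "volumes": [
                        {"name": "certs", "secret": {"secretName": cert_secret}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    import json

    daemonset = agent_daemonset(
        image="quay.io/keylime/keylime_agent:latest",  # hypothetical image name
        tpm_node_label={"example.com/tpm-present": "true"},  # hypothetical label
        cert_secret="keylime-certs",  # hypothetical Secret name
        cert_dir="/var/lib/keylime/cv_ca",  # assumed certificate directory
        env={"KEYLIME_EXAMPLE_SETTING": "value"},  # hypothetical variable
    )
    print(json.dumps(daemonset, indent=2))
```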

c. From this point we should expand to a "scale-out" deployment.
* Multiple Registrars and Verifiers
* A pre-packaged helm deployment of some SQL database server will be used.
* A Service (type=LoadBalancer)

d. At this point, the following technical considerations should be made.
* I am hoping we can "get away" with a pre-packaged n-way replicated SQL DB server.
* Verifiers are identified by a "verifier ID", which I assume can be taken from the "persistent identifier within a StatefulSet"
* The load balancing algorithm will have to use the URI (which contains the agent UUID) for the selection of the backend (i.e., we cannot use round-robin or source IP, given that presently a single tenant will add all the agents to the set of verifiers)
* Tenant is still considered as a component outside of the whole deployment
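
The two points about verifier IDs and URI-based balancing can be sketched together. The snippet below splits a StatefulSet pod name into its set name and ordinal, extracts the agent UUID from a request path, and picks a backend with rendezvous (highest-random-weight) hashing. Rendezvous hashing is my own choice for illustration, not something keylime or any particular load balancer is known to implement, and the pod names and the request path are hypothetical.

```python
import hashlib
import re

UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
POD_RE = re.compile(r"(.+)-([0-9]+)")


def split_pod_name(pod_name):
    """Split a StatefulSet pod name '<set-name>-<ordinal>' into its parts."""
    match = POD_RE.fullmatch(pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return match.group(1), int(match.group(2))


def agent_uuid_from_path(path):
    """Return the first UUID found in a request path, lower-cased."""
    match = UUID_RE.search(path)
    if match is None:
        raise ValueError(f"no agent UUID in path: {path!r}")
    return match.group(0).lower()


def pick_verifier(agent_uuid, verifier_ids):
    """Choose the verifier with the highest hash score for this agent."""
    if not verifier_ids:
        raise ValueError("no verifiers available")

    def score(verifier_id):
        digest = hashlib.sha256(f"{verifier_id}|{agent_uuid}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(verifier_ids, key=score)


if __name__ == "__main__":
    verifiers = [f"keylime-verifier-{i}" for i in range(3)]  # hypothetical names
    path = "/agents/d432fbb3-d2f1-4a97-9ef7-75bd81c00000"  # hypothetical path
    print(pick_verifier(agent_uuid_from_path(path), verifiers))
```

With this scheme every request for a given agent lands on the same verifier regardless of who sends it, and adding or removing a verifier only moves the agents whose highest-scoring backend changed.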
Create an Operator for keylime. My experience writing operators is fairly limited, but I will point out some of the desirable characteristics:
- Ability to automatically generate all pertinent certificates
- Ability to deal with environment variables
- Ability to automatically add agents to verifiers
- Ability to react to administrative tasks on nodes, such as reboot, draining, cordoning off.
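
One way to picture the last two characteristics is a level-triggered reconcile step that compares the nodes in the cluster with the agents currently under attestation. The sketch below is only one possible policy under my own assumptions: `NodeState` is a hypothetical simplified view of a node, a node that is temporarily not ready (for example while rebooting) is put on hold rather than removed, and cordoning is not modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical, simplified view of a cluster node."""
    name: str
    has_tpm: bool
    ready: bool


def plan(nodes, attested):
    """Compare cluster nodes with the set of agent names under attestation."""
    by_name = {node.name: node for node in nodes}
    wanted = {node.name for node in nodes if node.has_tpm}
    return {
        # TPM nodes that are up but not yet known to a verifier.
        "add": sorted(n for n in wanted - attested if by_name[n].ready),
        # Agents whose node is gone or no longer qualifies.
        "remove": sorted(attested - wanted),
        # Known agents on nodes that are currently not ready.
        "hold": sorted(n for n in attested & wanted if not by_name[n].ready),
    }


if __name__ == "__main__":
    nodes = [
        NodeState("worker-1", has_tpm=True, ready=True),
        NodeState("worker-2", has_tpm=True, ready=False),
        NodeState("worker-3", has_tpm=False, ready=True),
    ]
    print(plan(nodes, attested={"worker-2", "worker-9"}))
```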
Make the Operator more "production-ready"
- How to deal with (measured boot and runtime/IMA) policies?
- How to deal with "scale-out" operations (i.e., if the number of verifier pods increases, should we perform "rebalancing")?
- How to integrate "durable attestation" in this scenario?
The majority of the aforementioned stakeholders (@maugustosilva @mpeters @ansasaki Lukas Vrabec @galmasi and Marcus Hesse) voted for having this work developed in a new repository within the keylime project. I will create such a repository.

maugustosilva · 2023-06-02T16:57:41Z

Copied this issue, as is, to keylime/attestation-operator#1

maugustosilva · 2024-02-28T21:33:59Z

We have reached a point where we have an actually functional "base" Keylime deployment (via helm) at https://github.com/keylime/attestation-operator . This issue can be closed.

maugustosilva mentioned this issue May 23, 2023

Meeting 24/05/23 keylime/meetings#65

maugustosilva closed this as completed Feb 28, 2024
