Make Keylime easily deployable on Kubernetes/Openshift #1378

Closed
maugustosilva opened this issue May 17, 2023 · 2 comments

Comments

@maugustosilva
Contributor

After several discussions with @mpeters @ansasaki Lukas Vrabec @galmasi and Marcus Hesse, we collectively decided that the time to have Keylime easily deployed on Kubernetes/Openshift has come. I propose we use this issue to concentrate all the relevant discussion on this topic.

I will start by listing some common relevant points, and I do thank Marcus Hesse for starting the discussion on the keylime-operator on CNCF's Slack. I believe I have addressed most of your questions in this writeup.

The main goal is to end up with an "Attestation Operator", which can not only automatically add nodes (i.e., agents) to specific verifiers but can also properly react to administrative activities such as node reboots or cordoning off.

I am not a Kubernetes/Openshift expert by any means, so my proposal here is bound to be incomplete/incorrect, and additions/corrections are welcome. That being said, I see the following set of intermediate steps, in increasing order of complexity, as a good way to achieve our goal.

  1. Ensure that all keylime components can be fully executed in a containerized manner. For this, the following requirements should be satisfied.
    a. Unmodified public images. I suggest we expand https://quay.io/organization/keylime (under Red Hat's control), which already offers the "latest" verifier, registrar and tenant, to also include the rust agent image (@ansasaki is pursuing this).
    b. Carefully determine the minimal set of (container) privileges required to run the agent.
    c. Provide some tool to perform containerized keylime deployments (@maugustosilva and @galmasi have a tool, which is about to be released as open source, to perform this task).

  2. Create a simple Kubernetes application for keylime. At this point, we should be able to start by writing progressively more yaml files

    a. The idea is to start with a very simple deployment with the following objects:
    * A StatefulSet (initially of 1) for the Registrar
    * A StatefulSet (initially of 1) for the Verifier
    * A DaemonSet for the Agents
    * Registrar and Verifier both exposed as a Service (type=NodePort)
    * mTLS certificates stored as Secrets
    * Given that keylime can be fully configured via environment variables, we shall use environment-dependent variables in our YAML.
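
A minimal sketch of the objects listed above, written as Python dictionaries that can be serialized and handed to `kubectl apply`. Everything concrete here is an assumption made for illustration and is not taken from this issue: the image names, the port numbers, the `keylime-certs` Secret name, the `/certs` mount path, and the idea of forwarding every variable that starts with `KEYLIME_` into the containers.

```python
import json
import os

# All names, images and ports below are illustrative assumptions.
CERT_SECRET = "keylime-certs"  # hypothetical Secret with the pre-generated mTLS material
IMAGES = {
    "registrar": "quay.io/keylime/keylime_registrar:latest",
    "verifier": "quay.io/keylime/keylime_verifier:latest",
    "agent": "quay.io/keylime/keylime_agent:latest",
}
PORTS = {"registrar": 8891, "verifier": 8881, "agent": 9002}


def keylime_env(environ):
    """Turn the KEYLIME_* entries of a mapping into a container env list."""
    return [
        {"name": key, "value": value}
        for key, value in sorted(environ.items())
        if key.startswith("KEYLIME_")
    ]


def pod_template(name, environ):
    """Pod template shared by the StatefulSet and DaemonSet sketches."""
    return {
        "metadata": {"labels": {"app": name}},
        "spec": {
            "containers": [{
                "name": name,
                "image": IMAGES[name],
                "ports": [{"containerPort": PORTS[name]}],
                "env": keylime_env(environ),
                "volumeMounts": [
                    {"name": "certs", "mountPath": "/certs", "readOnly": True}
                ],
            }],
            "volumes": [{"name": "certs", "secret": {"secretName": CERT_SECRET}}],
        },
    }


def workload(kind, name, environ, replicas=1):
    """Build a StatefulSet or DaemonSet manifest for one keylime component."""
    spec = {
        "selector": {"matchLabels": {"app": name}},
        "template": pod_template(name, environ),
    }
    if kind == "StatefulSet":
        spec["serviceName"] = name
        spec["replicas"] = replicas
    return {"apiVersion": "apps/v1", "kind": kind,
            "metadata": {"name": name}, "spec": spec}


def node_port_service(name):
    """Expose one component through a Service of type NodePort."""
    return {
        "apiVersion": "v1", "kind": "Service", "metadata": {"name": name},
        "spec": {"type": "NodePort", "selector": {"app": name},
                 "ports": [{"port": PORTS[name], "targetPort": PORTS[name]}]},
    }


def build_manifests(environ):
    """Return the five objects of the simple deployment as a list."""
    return [
        workload("StatefulSet", "registrar", environ),
        workload("StatefulSet", "verifier", environ),
        workload("DaemonSet", "agent", environ),
        node_port_service("registrar"),
        node_port_service("verifier"),
    ]


if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": build_manifests(os.environ)}
    print(json.dumps(bundle, indent=2))
```

The output is JSON rather than YAML only to keep the sketch free of third-party dependencies; `kubectl apply -f -` accepts either.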

    b. Initially, I propose we make the following simplifying boundary conditions
    * Given the use of SQLite, we could start without any DB deployment
    * mTLS certificates are pre-generated (with keylime_ca commands) and added to the Kubernetes cluster
    * Environment variables will be also set and maintained by some external tool
    * The tenant will NOT be part of the initial deployment.
    * Make use of the "Node Feature Discovery" to mark all the nodes with tpm devices (and make it part of the DaemonSet node selector)
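
The node-selector idea in the last point can be sketched as follows. The label `example.com/tpm-present` is a placeholder and not a label that Node Feature Discovery is known to emit; the actual key would have to be taken from the NFD configuration in use. The `eligible_nodes` helper only mimics how a `nodeSelector` is matched (every key/value pair must be present on the node) so the effect can be checked without a cluster.

```python
import copy

# Hypothetical label; replace with whatever Node Feature Discovery is configured to set.
TPM_SELECTOR = {"example.com/tpm-present": "true"}


def with_node_selector(daemonset, selector):
    """Return a copy of a DaemonSet manifest restricted to nodes matching selector."""
    restricted = copy.deepcopy(daemonset)
    pod_spec = restricted["spec"]["template"]["spec"]
    pod_spec.setdefault("nodeSelector", {}).update(selector)
    return restricted


def eligible_nodes(nodes, selector):
    """Names of the nodes whose labels contain every key/value pair of selector."""
    return [
        node["metadata"]["name"]
        for node in nodes
        if all(node["metadata"].get("labels", {}).get(key) == value
               for key, value in selector.items())
    ]


# Minimal stand-ins for a DaemonSet manifest and for the cluster's node list.
agent_daemonset = {
    "apiVersion": "apps/v1", "kind": "DaemonSet", "metadata": {"name": "agent"},
    "spec": {"selector": {"matchLabels": {"app": "agent"}},
             "template": {"metadata": {"labels": {"app": "agent"}},
                          "spec": {"containers": []}}},
}
nodes = [
    {"metadata": {"name": "worker-a", "labels": {"example.com/tpm-present": "true"}}},
    {"metadata": {"name": "worker-b", "labels": {}}},
]

tpm_only = with_node_selector(agent_daemonset, TPM_SELECTOR)
tpm_nodes = eligible_nodes(nodes, TPM_SELECTOR)
```

With the sample data above, only `worker-a` would receive an agent pod.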

    c. From this point we should expand to a "scale-out" deployment.
    * Multiple Registrars and Verifiers
    * A pre-packaged helm deployment of some SQL database server will be used.
    * A Service (type=LoadBalancer)

    d. At this point, the following technical considerations should be made.
    * I am hoping we can "get away" with a pre-packaged n-way replicated SQL DB server.
    * Verifiers are identified by a "verifier ID", which I assume can be taken from the "persistent identifier within a StatefulSet"
    * The load balancing algorithm will have to use the URI (which contains the agent UUID) for the selection of the backend (i.e., we cannot use round-robin or source IP, given that presently a single tenant will add all the agents to the set of verifiers)
    * Tenant is still considered as a component outside of the whole deployment
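
Two of the considerations above can be sketched in a few lines: picking a verifier backend from the agent identifier in the request URI, and deriving a numeric verifier ID from a StatefulSet pod name. This is only an illustration under stated assumptions. It assumes request paths of the form `/<api version>/agents/<agent id>`, it uses plain modulo hashing, which is one possible scheme and not something this issue prescribes, and the pod names are made up.

```python
import hashlib
import re

AGENT_PATH = re.compile(r"/agents/([^/?#]+)")
POD_NAME = re.compile(r"(.+)-([0-9]+)")


def agent_id_from_uri(uri):
    """Extract the agent identifier from a path such as /v2.1/agents/<id>."""
    match = AGENT_PATH.search(uri)
    if match is None:
        raise ValueError(f"no agent identifier in {uri!r}")
    return match.group(1)


def pick_verifier(uri, backends):
    """Map a request to a backend using only the agent identifier in the URI.

    The same agent always lands on the same backend, regardless of which
    client sent the request, as long as the backend list is unchanged.
    """
    digest = hashlib.sha256(agent_id_from_uri(uri).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def verifier_id_from_pod_name(pod_name):
    """Return the ordinal that a StatefulSet appends to its pod names."""
    match = POD_NAME.fullmatch(pod_name)
    if match is None:
        raise ValueError(f"{pod_name!r} does not look like a StatefulSet pod name")
    return int(match.group(2))


# Hypothetical pod names for a three-replica verifier StatefulSet.
backends = ["verifier-0", "verifier-1", "verifier-2"]
uri = "/v2.1/agents/d432fbb3-d2f1-4a97-9ef7-75bd81c00000"
chosen = pick_verifier(uri, backends)
```

Modulo hashing remaps most agents whenever the number of backends changes, so a scheme such as consistent hashing may be a better fit once verifiers are scaled out.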

  3. Create an Operator for keylime. My experience writing operators is fairly limited, but I will point out some of the desirable characteristics:

    • Ability to automatically generate all pertinent certificates
    • Ability to deal with environment variables
    • Ability to automatically add agents to verifiers
    • Ability to react to administrative tasks on nodes, such as rebooting, draining, and cordoning off.
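
As a rough illustration of the last point, an operator could compare two snapshots of a Node object and derive the administrative events it needs to react to. The sketch below relies on two fields of the Kubernetes Node object, `spec.unschedulable` (set when a node is cordoned) and `status.nodeInfo.bootID` (which changes across reboots). The event names are made up, and what the operator should do for each event is deliberately left open. Draining is not detected here, since it shows up as a cordon followed by pod evictions and not as a single field on the node.

```python
def node_events(old, new):
    """Compare two snapshots of a Node object and list what changed.

    Both arguments are plain dictionaries shaped like the Node objects
    returned by the Kubernetes API.
    """
    events = []

    was_cordoned = bool(old.get("spec", {}).get("unschedulable", False))
    is_cordoned = bool(new.get("spec", {}).get("unschedulable", False))
    if is_cordoned and not was_cordoned:
        events.append("cordoned")
    elif was_cordoned and not is_cordoned:
        events.append("uncordoned")

    old_boot = old.get("status", {}).get("nodeInfo", {}).get("bootID")
    new_boot = new.get("status", {}).get("nodeInfo", {}).get("bootID")
    if old_boot and new_boot and old_boot != new_boot:
        events.append("rebooted")

    return events


# Made-up snapshots of the same node before and after maintenance.
before = {"spec": {}, "status": {"nodeInfo": {"bootID": "boot-1"}}}
after = {"spec": {"unschedulable": True},
         "status": {"nodeInfo": {"bootID": "boot-2"}}}
observed = node_events(before, after)
```

For the sample snapshots, `observed` contains both a cordon and a reboot event.
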
  4. Make the Operator more "production-ready"

    • How to deal with (measured boot and runtime/IMA) policies?
    • How to deal with "scale-out" operations (i.e., if the number of verifier pods increases, should we perform "rebalancing")?
    • How to integrate "durable attestation" in this scenario?
  5. The majority of the aforementioned stakeholders (@maugustosilva @mpeters @ansasaki Lukas Vrabec @galmasi and Marcus Hesse) voted for having this work developed in a new repository within the keylime project. I will create such a repository.

@maugustosilva
Contributor Author

Copied this issue, as is, to keylime/attestation-operator#1

@maugustosilva
Contributor Author

We have reached a point where we have an actually functional "base" Keylime deployment (via helm) at https://github.com/keylime/attestation-operator . This issue can be closed.
