@ardaguclu
Member

This PR is based on #1872 (changes in enhancements/kube-apiserver/kms-encryption-foundations.md).

There are many aspects that need to be implemented to support KMS in OpenShift. We have decided to open more granular EPs to better track the work.

This EP's main aim is to focus on the encryption controller changes in library-go. This EP defers some concepts to the future in order to start with simpler, manageable iterations.

PoC PR openshift/library-go#2045 (this is just a PoC, original PR will be opened when this EP merges).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 3, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 3, 2025

@ardaguclu: This pull request references CNTRLPLANE-2120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from hasbro17 and yuqi-zhang December 3, 2025 09:35
@openshift-ci
Contributor

openshift-ci bot commented Dec 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jaypoulz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from c091cbc to f734b05 Compare December 3, 2025 09:43
Member

@flavianmissi flavianmissi left a comment

Have to take a short break from reviewing, but leaving the comments I got so far.

@ardaguclu
Member Author

/cc @ibihim @flavianmissi

@openshift-ci openshift-ci bot requested review from flavianmissi and ibihim December 3, 2025 14:08
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 2 times, most recently from 1794054 to 1ddc3d8 Compare December 4, 2025 07:19
@ardaguclu
Member Author

@flavianmissi I was uncomfortable about the disconnects between the sections and the verbosity. So I overhauled the EP to have better clarity. Please let me know your thoughts.

@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 3 times, most recently from e920a9c to 5804b76 Compare December 4, 2025 08:35
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from f39a0d7 to 8f79ed6 Compare December 5, 2025 04:21
@openshift-ci
Contributor

openshift-ci bot commented Dec 5, 2025

@ardaguclu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@flavianmissi
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@ardaguclu
Member Author

/cc @benluddy

@openshift-ci openshift-ci bot requested a review from benluddy December 5, 2025 13:00
@ardaguclu
Member Author

As we agreed with @flavianmissi, in the next iterations there will be another condition that notifies users to delete unused KMS plugins from the cluster when prune_controller prunes them.

Contributor

@ibihim ibihim left a comment

Great work. I have some questions, though, as a beginner to downstream e2ee with KMS.


#### Encryption Controllers

**keyController** manages encryption key lifecycle. Creates encryption key secrets in `openshift-config-managed` namespace. For KMS mode, creates empty secrets with KMS configuration hashes.

Contributor

Based on what I read later, it:

  • hashes the APIServer resource and stores the hash as an annotation in the secret
  • stores the key_id as data.

This confused me and made me jump back and forth, as I only see that it creates Secrets with KMS configuration hashes. Which configuration? The APIServer? The kube-apiserver encryption config?

Member Author

Encryption controllers convert apiserver.config.openshift.io to a Secret. This Secret is used for the bidirectional conversion KeyState <-> Secret <-> kube-apiserver encryption config. So, for instance, we have to carry some data (kmsconfig hash or key_id) somewhere so that each representation can always be converted to the others. Otherwise, we cannot detect whether the apiserver config has changed.

We'll need key_id too, to detect whether the key_id has been rotated. But this is deferred to later releases.


**stateController** generates EncryptionConfiguration for API server consumption. Implements distributed state machine ensuring all API servers converge to same revision. For KMS mode, generates configuration with deterministic Unix socket paths.

Contributor

What is a deterministic Unix Socket path? Are there probabilistic paths?

Why do we want this? To identify if the KMS is out of date and the address changed? To run several KMS plugins somehow in parallel?

Member Author

> To run several KMS plugins somehow in parallel?

This is one of the reasons. Deterministic means that the plugin lifecycle runs the KMS plugin on a Unix socket whose path the encryption controllers can generate on their own in order to communicate with it (plugin lifecycle and encryption controllers are not organically related). The path therefore works as a contract between the two.

However, since we will likely decide to support only external KMS plugins, we won't need this functionality. I proposed the API definition in openshift/api#2622 to directly use whatever the user sets. Thus, this Unix socket generation logic will be removed from the EP.

4. Reuse existing migration controller (no changes needed)

**Tech Preview v2 additions:**
- Poll KMS plugin Status endpoint for `key_id` changes in apiserver operators
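
For illustration only, the polling idea in the bullet above could be sketched as follows. The KMS v2 Status response does carry a `key_id` field, but everything else here is an assumption: `keyIDFetcher`, `pollOnce`, and `watchKeyID` are hypothetical names, the fetcher stands in for whatever mechanism eventually reaches the plugin's Status endpoint (which this thread leaves TBD), and the interval is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// keyIDFetcher is a hypothetical abstraction over "ask the KMS plugin's
// Status endpoint for its current key_id". How an unprivileged operator
// would actually reach the plugin is not decided in the EP.
type keyIDFetcher func() (string, error)

// pollOnce fetches the current key_id and reports whether it differs from
// the last observed one. On error the last known key_id is returned unchanged.
func pollOnce(fetch keyIDFetcher, last string) (current string, rotated bool, err error) {
	id, err := fetch()
	if err != nil {
		return last, false, err
	}
	return id, id != last, nil
}

// watchKeyID polls at a fixed interval until ctx is cancelled and calls
// onChange whenever the observed key_id changes.
func watchKeyID(ctx context.Context, fetch keyIDFetcher, interval time.Duration, last string, onChange func(oldID, newID string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			current, rotated, err := pollOnce(fetch, last)
			if err != nil {
				continue // treat as transient and retry on the next tick
			}
			if rotated {
				onChange(last, current)
				last = current
			}
		}
	}
}

func main() {
	// Fake plugin that cycles through a fixed sequence of key IDs.
	ids := []string{"key-1", "key-1", "key-2"}
	i := 0
	fetch := func() (string, error) {
		id := ids[i%len(ids)]
		i++
		return id, nil
	}

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	watchKeyID(ctx, fetch, 10*time.Millisecond, "key-1", func(oldID, newID string) {
		fmt.Printf("key_id changed: %s -> %s\n", oldID, newID)
	})
}
```

Keeping the comparison in `pollOnce` separate from the ticker loop means the rotation check can be exercised without timers, whichever transport is chosen later.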

Contributor

How do we call the KMS plugin Status endpoint?

It gets exposed by the kube-apiserver endpoint, right?

Member Author

That is a good question. We left it TBD for now. Accessing KMS plugins requires a privileged container. The API servers (kube-apiserver, oauth-apiserver, openshift-apiserver) are privileged, so they can access them. However, the operators (cluster-kube-apiserver-operator, openshift-apiserver-operator, cluster-authentication-operator) are not privileged. We'll have to find a way for the operators.

namespace: openshift-config-managed
annotations:
  encryption.apiserver.operator.openshift.io/mode: "kms"
  encryption.apiserver.operator.openshift.io/kms-config-hash: "a1b2c3d4e5f67890"

Contributor

We hash APIServer.spec.encryption, right? It wouldn't make sense to hash the whole spec as it contains non-encryption configuration.

Member Author

It may or may not, depending on what the user expects. If the configuration needs migration, we'll have to hash it. However, since I proposed to simplify this with openshift/api#2622, we probably don't need this anymore.

endpoint: unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket
apiVersion: v2
```
The deterministic socket path allows KMS plugin lifecycle management to use the same path.

Contributor

What is KMS plugin lifecycle management?

Member Author

This will be handled by another EP (see: #1872). Plugin lifecycle means running the KMS plugins in some way so that the apiservers can communicate with them.

- `kmsConfigHash` (annotation) - Detects admin configuration changes
- `kmsKeyIDHash` (data field) - Detects external key rotation

Separate hashes handle scenarios where config changes without key rotation (updating Vault address) or key rotates without config changes (AWS automatic rotation).
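
A minimal sketch of why two separate hashes are useful, assuming both values are already available as strings. The `kmsFingerprint` type and `detectChange` function are hypothetical names for illustration, not library-go API.

```go
package main

import "fmt"

// kmsFingerprint is a hypothetical pair of the two values described above:
// ConfigHash mirrors the kmsConfigHash annotation and KeyIDHash mirrors the
// kmsKeyIDHash data field.
type kmsFingerprint struct {
	ConfigHash string
	KeyIDHash  string
}

// detectChange compares the stored fingerprint with the currently observed
// one and reports the two kinds of change independently.
func detectChange(stored, current kmsFingerprint) (configChanged, keyRotated bool) {
	return stored.ConfigHash != current.ConfigHash, stored.KeyIDHash != current.KeyIDHash
}

func main() {
	stored := kmsFingerprint{ConfigHash: "a1b2c3d4e5f67890", KeyIDHash: "1111111111111111"}

	// Config changes without key rotation (e.g. updating the Vault address).
	cfg, key := detectChange(stored, kmsFingerprint{ConfigHash: "0f0e0d0c0b0a0908", KeyIDHash: "1111111111111111"})
	fmt.Println(cfg, key)

	// Key rotates without config changes (e.g. AWS automatic rotation).
	cfg, key = detectChange(stored, kmsFingerprint{ConfigHash: "a1b2c3d4e5f67890", KeyIDHash: "2222222222222222"})
	fmt.Println(cfg, key)
}
```

With a single combined hash, the two scenarios would be indistinguishable; keeping them apart lets a controller react differently to each.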

Contributor

This should be at the top of the over-arching chapter, such that it is better to understand what the sub-chapters cover. Maybe even in the EncryptionController section.

Member Author

I'll update accordingly.

**From aescbc to KMS:**
1. Admin updates APIServer: `type: kms` with KMS configuration.
2. keyController creates KMS secret (empty data, with hash).
3. migrationController re-encrypts resources using external KMS.

Contributor

migrationController uses KAS, not KMS directly, right? We don't plug the KAS-O into the KMS plugin via UDS, do we?

Member Author

> We don't plug the KAS-O into the KMS plugin via UDS

That is correct.

KAS-O initiates migration via a storage migration resource, so it is not involved directly. Instead, it creates the corresponding storage migration resource, and the migration is completed asynchronously.

```go
EncryptionSecretKMSConfigHash = "encryption.apiserver.operator.openshift.io/kms-config-hash"
```
Stores truncated hash (16 hex characters, 8 bytes) of KMS configuration for change detection.

Contributor

We use the default hashing mechanism that we use for other resources as well, right?

Member Author

This is fully up to us. It doesn't have to follow the other hashing methods. The entire goal is to compare the hash values to detect any changes and that is all.

> **Note:** The hash is truncated to 16 hex characters (8 bytes) to stay within Unix socket path length limits (typically 108 characters) while maintaining sufficient uniqueness for distinguishing different KMS configurations. This allows deterministic socket paths like `/var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket`.
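
As a rough sketch of the truncation described in the note, the following derives a 16-hex-character hash and the matching socket path. SHA-256 is an assumption (the EP text quoted here does not name the hash function), and `shortConfigHash` and `socketPath` are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// socketDir and the "kms-<hash>.socket" naming follow the example path in the note.
const socketDir = "/var/run/kmsplugin"

// maxSunPath is the typical size of sockaddr_un.sun_path on Linux.
const maxSunPath = 108

// shortConfigHash returns the first 8 bytes of a SHA-256 digest of the
// serialized KMS configuration, encoded as 16 hex characters.
// The choice of SHA-256 is an assumption for this sketch.
func shortConfigHash(serializedConfig []byte) string {
	sum := sha256.Sum256(serializedConfig)
	return hex.EncodeToString(sum[:8])
}

// socketPath builds the deterministic socket path for a given config hash.
func socketPath(hash string) string {
	return fmt.Sprintf("%s/kms-%s.socket", socketDir, hash)
}

func main() {
	// Placeholder input; the real serialization of the KMS config is not shown in the EP.
	hash := shortConfigHash([]byte(`{"type":"kms","example":"placeholder"}`))
	path := socketPath(hash)
	fmt.Println(hash)
	fmt.Println(path)
	fmt.Println("fits in sun_path:", len(path) < maxSunPath)
}
```

Because both sides can compute the same path from the same configuration, no extra coordination channel is needed between the component that starts the plugin and the one that writes the EncryptionConfiguration.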

Contributor

This comes really late in the doc. So we use the config hash in the unix socket path.

What is the benefit of it?
We can have several KMS plugins open at once?

Member Author

This mechanism seems obsolete anyway.

**Reverse Conversion** (stateController reads EncryptionConfiguration from API server pods):
1. Extract hash from socket path: `kms-a1b2c3d4e5f67890.socket` → `a1b2c3d4e5f67890`
2. Look up secret with matching `kms-config-hash` annotation
3. Reconstruct KeyState with original KMS configuration
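
The first two reverse-conversion steps above could look roughly like this. It is a sketch under assumptions: `secretMeta` is a simplified stand-in for a Kubernetes Secret, and `hashFromEndpoint` and `findSecretByHash` are hypothetical helpers rather than existing library-go functions. Reconstructing the KeyState (step 3) is omitted because its fields are not shown in this thread.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Annotation key as quoted in the EP.
const kmsConfigHashAnnotation = "encryption.apiserver.operator.openshift.io/kms-config-hash"

// secretMeta is a simplified, hypothetical stand-in for a Kubernetes Secret.
type secretMeta struct {
	Name        string
	Annotations map[string]string
}

// hashFromEndpoint extracts the config hash from an endpoint such as
// unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket.
func hashFromEndpoint(endpoint string) (string, error) {
	base := path.Base(strings.TrimPrefix(endpoint, "unix://"))
	if !strings.HasPrefix(base, "kms-") || !strings.HasSuffix(base, ".socket") {
		return "", fmt.Errorf("unexpected socket name %q", base)
	}
	hash := strings.TrimSuffix(strings.TrimPrefix(base, "kms-"), ".socket")
	if hash == "" {
		return "", fmt.Errorf("empty hash in socket name %q", base)
	}
	return hash, nil
}

// findSecretByHash returns the first secret whose kms-config-hash annotation
// matches the given hash.
func findSecretByHash(secrets []secretMeta, hash string) (secretMeta, bool) {
	for _, s := range secrets {
		if s.Annotations[kmsConfigHashAnnotation] == hash {
			return s, true
		}
	}
	return secretMeta{}, false
}

func main() {
	// The secret name below is made up for the example.
	secrets := []secretMeta{
		{Name: "example-kms-secret", Annotations: map[string]string{kmsConfigHashAnnotation: "a1b2c3d4e5f67890"}},
	}

	hash, err := hashFromEndpoint("unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if s, ok := findSecretByHash(secrets, hash); ok {
		fmt.Println("matched secret:", s.Name)
	}
}
```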

Contributor

What does that mean?

  1. We read the encryption configuration for APIServer.Encryption.
  2. We hash it.
  3. We get the appropriate secret / configmap that contains:
     • key_id (can be compared with current key_id)
     • config hash (known, used for comparison)

Member Author

Encryption controllers rely on the bidirectional conversion between Secret <-> KeyState <-> kube-apiserver encryption configuration. So we need to find a way to carry the data between them.

However, with openshift/api#2622, we probably won't need to carry the kmsconfig hash in the Unix socket path. The Unix socket path itself will be the discriminator.

@ardaguclu
Member Author

I'll update this EP based on the changes in openshift/api#2622.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 12, 2025