
Conversation

@flavianmissi (Member)

do not review.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2025
@openshift-ci openshift-ci bot requested review from TrilokGeer and hasbro17 October 22, 2025 09:15
@openshift-ci (Contributor)

openshift-ci bot commented Oct 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jaypoulz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2025
@flavianmissi (Member Author)

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2025
also remove foundations EP, taken over by Arda
@ardaguclu left a comment (Member)

I will continue on to "Workflow Description". Before that, I wanted to share my opinions.


## Summary

Provide a lightweight KMS shim architecture that enables users to deploy and manage their own KMS plugins (AWS, Vault, Thales, etc.) while OpenShift handles the complexity of Unix socket communication required by the Kubernetes KMS v2 API. OpenShift provides a socket proxy container image that users deploy alongside their KMS plugins to translate between network communication (used by the shim in API server pods) and Unix socket communication (required by standard KMS v2 plugins). This creates a clear support boundary, reduces Red Hat's support burden, and allows users to deploy KMS plugins anywhere (in-cluster, external infrastructure, or separate clusters) and update them independently of OpenShift's release cycle.
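
For illustration, a minimal sketch of what the user-deployed half could look like, assuming the socket proxy takes a listen address and a target socket as flags; the image names, flags, namespace, and port below are placeholders, not part of this proposal:

```yaml
# Hypothetical user-deployed pod: an unmodified upstream KMS v2 plugin plus the
# OpenShift-provided socket proxy, sharing the plugin's Unix socket via emptyDir.
apiVersion: v1
kind: Pod
metadata:
  name: vault-kms-plugin
  namespace: kms-plugins
  labels:
    app: vault-kms-plugin
spec:
  containers:
  - name: kms-plugin
    image: example.com/vault-kms-plugin:latest        # user-supplied plugin image (placeholder)
    args: ["--listen-addr=unix:///var/run/kms/plugin.sock"]
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  - name: socket-proxy
    image: quay.io/example/kms-socket-proxy:latest    # OpenShift-provided image (placeholder)
    args: ["--listen=:8080", "--target=unix:///var/run/kms/plugin.sock"]
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  volumes:
  - name: kms-socket
    emptyDir: {}
---
# Service that the shim in the API server pods would call over the network.
apiVersion: v1
kind: Service
metadata:
  name: vault-kms-plugin
  namespace: kms-plugins
spec:
  selector:
    app: vault-kms-plugin
  ports:
  - port: 8080
    targetPort: 8080
```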
Member

I know that this EP is not ready for review. Just consider my comments about storing historical context in this discussion. We can discuss the details offline.

allows users to deploy KMS plugins anywhere (in-cluster, external infrastructure, or separate clusters)

I think we should strongly recommend (in documentation) running KMS plugins as static pods that are NOT managed by the API server, to prevent the chicken-and-egg problem and lower latency.
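
For example, a rough sketch of that recommendation as a static pod manifest on each control plane host, written to /etc/kubernetes/manifests and managed by the kubelet rather than the API server (namespace, image names, and flags are hypothetical):

```yaml
# /etc/kubernetes/manifests/kms-plugin.yaml -- static pod, managed by the kubelet
apiVersion: v1
kind: Pod
metadata:
  name: kms-plugin
  namespace: kube-system                 # placeholder
spec:
  hostNetwork: true                      # the shim reaches it on localhost, no Service in the path
  priorityClassName: system-node-critical
  containers:
  - name: kms-plugin
    image: example.com/vault-kms-plugin:latest        # placeholder
    args: ["--listen-addr=unix:///var/run/kms/plugin.sock"]
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  - name: socket-proxy
    image: quay.io/example/kms-socket-proxy:latest    # placeholder
    args: ["--listen=127.0.0.1:8080", "--target=unix:///var/run/kms/plugin.sock"]
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  volumes:
  - name: kms-socket
    emptyDir: {}
```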

Member Author

Agreed.

Member Author

EP updated, PTAL

* Providing KMS plugin images or implementations (only socket proxy image)
* Automatic injection or mutation of user plugin pods
* Prescribing how users deploy socket proxy (sidecar, separate pod, external - all valid)
* Authentication between shim and socket proxy (out of scope for Tech Preview)
Member

Is there any need for authentication at all? Won't they be residing in the same pod as separate containers?

Member Author

The [kube|openshift|oauth]-apiserver and its shim will be different containers in the same pod, but the socket proxy will be in a different pod, together with the KMS plugin.
I'm not sure we can get away with no auth 🤔

@ardaguclu (Member) Dec 8, 2025

If we drop support for using external services as KMS plugins and strictly enforce running KMS plugins as static pods on every control plane host, do we still need authentication?

Maybe offering such flexibility is a foot-gun for us? For instance, if the API server gets an error because it doesn't receive a response from an external KMS plugin within 100ms, customers will complain directly to us. In my opinion, we have a chance to enforce that all KMS plugins run as static pods.

Member

but the socket proxy will be in a different pod, together with the KMS plugin.

Services within the cluster can communicate with each other without relying on any authentication. Am I missing something?

@flavianmissi (Member Author) Dec 8, 2025

For instance, if the API server gets an error because it doesn't receive a response from an external KMS plugin within 100ms

Oh, I wasn't aware this was enforced in the code.

Services within the cluster can communicate with each other without relying on any authentication. Am I missing something?

That's correct. Maybe I'm overthinking this, but with the kms-plugin as a sidecar to the API servers, only the API servers themselves can access the plugin to decrypt resources. By placing the kms-plugin in its own pod, we introduce a route that is available to other workloads in the cluster.
I could be wrong, but an unguarded decryption service doesn't sound that much better than plain text 🤔

Member

I could be wrong, but an unguarded decryption service doesn't sound that much better than plain text

That is a good point.

Member

I wasn't aware this was enforced in the code.

I couldn't find the hard-coded timeout value (probably I'm mistaken). There is a timeout field in the KMS API: https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/#encrypting-your-data-with-the-kms-provider-kms-v2. We'll probably set it to some opinionated value that isn't too high (high values can make the cluster unstable).
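
For reference, the timeout is a per-provider field in the upstream EncryptionConfiguration; a minimal KMS v2 entry looks roughly like this (the provider name and socket path are illustrative):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          apiVersion: v2
          name: my-kms-plugin                        # illustrative
          endpoint: unix:///var/run/kms/plugin.sock
          timeout: 3s                                # upstream default; we would pick an opinionated value
      - identity: {}
```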

1. **KMS Shim** (OpenShift-managed sidecar in API server pods): Translates Unix socket → HTTP/gRPC network calls
2. **Socket Proxy** (OpenShift-provided image, user-deployed): Translates HTTP/gRPC network calls → Unix socket

This architecture solves the **SELinux MCS isolation problem** that prevents different pods from sharing Unix sockets via hostPath, while allowing users to deploy standard upstream KMS v2 plugins without modification. Users deploy the socket proxy alongside their plugin using OpenShift's provided container image, giving them full control over the deployment architecture (in-cluster, external, or hybrid).
Member

full control over the deployment architecture (in-cluster, external, or hybrid).

In my opinion, we should be very restrictive instead of offering full flexibility, because a circular dependency between the KMS plugin and the API server is an unsolvable outage.

Member Author

Good point. I think this mostly applies to in-cluster KMS plugins though, since fully external plugins remove the dependency entirely (but increase latency).
I'll see how I can update this to reflect that.

Member

This solution indeed provides such flexibility, and there is no need to exclude it. But it deserves a mention here and in the documentation that it is strongly not recommended.

Member

Maybe we can get rid of mentioning external services, standard pods, etc. entirely. Customers must (you know I like using must statements :) ) run KMS plugins as static pods.

Member Author

Yeah, that's true. I think sticking to static pods for kms-plugin deployments also gives us more flexibility in what kind of authentication model we choose to use.
I'll make updates to reflect that.

Member

What about this: the kube-apiserver can communicate via the host network or UDS in TP, and after that we support aggregated API servers via a Service resource in TP v2?

**Key Innovation 1: Intelligent Routing in Shim**

The shim maintains multiple endpoint configurations and routes requests intelligently based on operation type:
Member

This intelligent mechanism binds the plugin shim to the API server's encryption configurations, because order matters in encryption configurations. In my opinion, it would be simpler to start one plugin shim container per KMS plugin.

The shim maintains multiple endpoint configurations and routes requests intelligently based on operation type:

- **Encrypt requests**: Always sent to the **primary endpoint** (the `endpoint` field)
- **Decrypt requests**: Try **primary endpoint** first, then fall back to **additional endpoints** (the `additionalEndpoints` field) if decryption fails
Member

This sounds like it is not a duty of the plugin shim; instead it is a duty of the API server, based on the provider order. How can the plugin shim manage this order?

- kms plugin 1
- identity
- kms plugin 2

If the plugin shim falls back to kms plugin 2, that is wrong. The API server should fall back to identity if kms plugin 1 fails.
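
In upstream EncryptionConfiguration terms, the ordering described above maps to roughly the sketch below (names and socket paths are illustrative); which provider handles a given read is decided by the API server from the stored data, not by the shim:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:                    # kms plugin 1: all new writes are encrypted with this provider
          apiVersion: v2
          name: kms-plugin-1
          endpoint: unix:///var/run/kms/plugin-1.sock
          timeout: 3s
      - identity: {}            # lets resources still stored as plaintext be read
      - kms:                    # kms plugin 2: lets resources still encrypted with it be read
          apiVersion: v2
          name: kms-plugin-2
          endpoint: unix:///var/run/kms/plugin-2.sock
          timeout: 3s
```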

Member Author

That's an excellent point. I'm going to remove the "intelligent" bits of the design for now.

**Key Innovation 2: User-Controlled Deployment with OpenShift-Provided Components**

OpenShift provides the socket proxy container image, and users deploy it however they choose:
- **In-cluster as sidecar**: User deploys plugin + socket proxy in same pod (simplest)
Member

Since we are the owners of the platform, we should enforce that customers use this option.


OpenShift provides the socket proxy container image, and users deploy it however they choose:
- **In-cluster as sidecar**: User deploys plugin + socket proxy in same pod (simplest)
- **In-cluster as separate pods**: User deploys plugin and socket proxy in separate pods (if they prefer)
Member

Due to OCP policies, I don't think this will ever work (even we couldn't make it work).

OpenShift provides the socket proxy container image, and users deploy it however they choose:
- **In-cluster as sidecar**: User deploys plugin + socket proxy in same pod (simplest)
- **In-cluster as separate pods**: User deploys plugin and socket proxy in separate pods (if they prefer)
- **External infrastructure**: User deploys plugin + socket proxy on VMs, separate clusters, or cloud provider infrastructure
Member

As far as I recall, every KMS plugin call must finish in under 100ms. I don't think it can work in any case.

@ardaguclu left a comment (Member)

I've completed my review with more comments.

encryption:
  type: KMS
  kms:
    type: External
Member

It is nice to reserve this. It gives us the option to add an internal KMS in the future.

kms:
  type: External
  external:
    endpoint: http://vault-kms-plugin.kms-plugins.svc:8080
Member

I was also thinking about using the service endpoint as a unique identifier.

type: External
external:
  endpoint: http://vault-kms-new.kms-plugins.svc:8080
  additionalEndpoints:
@ardaguclu (Member) Dec 8, 2025

Exposing these additionalEndpoints as configurable can very easily conflict with the encryption configurations. We shouldn't do that. The user can update endpoint to http://vault-kms-new.kms-plugins.svc:8080, and our controllers then need to keep the older KMS plugins active automatically (that leads us to plugin shim lifecycle management; I'm not sure why we avoid that?).

We start migration whenever we see a new encryption configuration; additionalEndpoints breaks that rule too. Old endpoints will in any case be stored in the encryption configurations as a fallback mechanism in the providers, to be used by the encryption controllers. Maybe this extra field can be removed entirely?

EDITED: additionalEndpoints makes almost all encryption controllers redundant.


#### Risk: User Deployment Errors

**Risk:** Users incorrectly deploy plugin + socket proxy (wrong socket path, missing Service, port mismatch, etc.)
Member

If the KMS plugin is not deployed as a static pod, the API server will depend on the KMS plugin to decrypt resources while, at the same time, the KMS plugin will need to be initialized by the API server.


**SELinux MCS (Multi-Category Security) Isolation Problem:**

OpenShift uses SELinux MCS labels to provide pod-to-pod isolation. Each pod gets a unique MCS label (e.g., `s0:c111,c222`). When a container creates a file (including Unix sockets) via hostPath, the file inherits the container's MCS label:
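
A sketch of the hostPath-sharing setup this describes, assuming default (restricted, non-privileged) SCCs; the socket created by one pod carries that pod's MCS categories, so another pod with different categories is denied access even though both mount the same host directory:

```yaml
# Pod A: the KMS plugin creating its socket on the host. Per the description above,
# the socket file ends up labeled with Pod A's MCS categories, e.g. s0:c111,c222.
apiVersion: v1
kind: Pod
metadata:
  name: kms-plugin
spec:
  containers:
  - name: kms-plugin
    image: example.com/vault-kms-plugin:latest   # placeholder
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  volumes:
  - name: kms-socket
    hostPath:
      path: /var/run/kms
      type: DirectoryOrCreate
# Any other pod (for example an API server pod) mounting the same hostPath runs with
# different MCS categories (e.g. s0:c333,c444), so SELinux denies connect() on the socket.
```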
Member

Based on some of my investigations, if we configure the SCCs as privileged, the pods can communicate with each other.
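
A rough sketch of the workaround being investigated, assuming the plugin pod's service account is granted the privileged SCC (e.g. via oc adm policy add-scc-to-user privileged); privileged containers are not confined by per-pod MCS categories, so the shared hostPath socket becomes reachable. Namespace, service account, and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kms-plugin
  namespace: kms-plugins
spec:
  serviceAccountName: kms-plugin    # granted: oc adm policy add-scc-to-user privileged -z kms-plugin -n kms-plugins
  containers:
  - name: kms-plugin
    image: example.com/vault-kms-plugin:latest
    securityContext:
      privileged: true              # container runs unconfined by per-pod MCS categories
    volumeMounts:
    - name: kms-socket
      mountPath: /var/run/kms
  volumes:
  - name: kms-socket
    hostPath:
      path: /var/run/kms
      type: DirectoryOrCreate
```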

Member

However, in my opinion the main goal of this EP (i.e. a plugin shim to convert Unix socket <-> HTTP) gives us a more sustainable and flexible solution that can work with any plugin.

Member Author

Thanks for taking the time to investigate 💪🏼 From our discussion with Ben today, it sounds like we can go ahead with your approach for the early TP if nothing else.
I'll update this PR to reflect that.

- API server reads old secrets, shim routes decrypt to **additional endpoints** (old key)
- API server re-encrypts, shim routes encrypt to **primary endpoint** (new key)
7. Once migration completes, cluster admin removes `additionalEndpoints` from config
8. Shim stops routing to **additional endpoints**, old plugin deployment can be deleted
Member

The prune controller is supposed to clear the old provider configurations. The shim should stop routing only when the prune controller deletes the configurations from the encryption configurations.

also clarify flow when kms config changes
also add note on authentication for GA
@openshift-ci (Contributor)

openshift-ci bot commented Dec 8, 2025

@flavianmissi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | b2703fb | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2026
@ardaguclu (Member)

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2026