-
Notifications
You must be signed in to change notification settings - Fork 526
CM-624: Cert-Manager Network Policy #1816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CM-624: Cert-Manager Network Policy #1816
Conversation
|
@PillaiManish: This pull request references CM-525 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@PillaiManish: This pull request references CM-525 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
761d81b to
7e8c31b
Compare
|
cc: @TrilokGeer, @mytreya-rh |
cec68b6 to
a670ad7
Compare
bharath-b-rh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EP focuses only on the cert-manager operand and I think it should include for istiocsr as well.
|
|
||
| ## Summary | ||
|
|
||
| This document proposes the implementation of specific, fine-grained Kubernetes NetworkPolicy objects for the `cert-manager` operator and its operands. Currently, the operator and its components run without network restrictions, posing a potential security risk. By defining explicit ingress and egress rules, we can enforce the principle of least privilege, securing the `cert-manager` namespaces and ensuring that its components only communicate with necessary services like the Kubernetes API server and Prometheus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should explicitly mention operator and operand namespaces, because operand namespaces will cover cert-manager and istio-csr both.
|
|
||
| ## Motivation | ||
|
|
||
| In a multi-tenant or security-conscious environment, it is crucial to enforce network segregation to limit the potential impact of a compromised pod. The `cert-manager` operator and its components are critical for certificate management within the cluster, but they operate with default-allow network rules. Applying network policies is a standard security best practice that utilizes the platform's own capabilities to secure platform workloads. This enhancement ensures that the `cert-manager` components are not an unintended attack vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think but they operate with default-allow network rules gives out the meaning by default allow all policy is used and could be rephrased to say The cert-manager operator and its components are critical for certificate management within the cluster, but they lack any network traffic filtering or validation
|
|
||
| ### Goals | ||
|
|
||
| - Implement a default-deny policy for all pods in the `cert-manager-operator` and `cert-manager` namespaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operator manages istio-csr operand pods too!!
|
|
||
| - Implement a default-deny policy for all pods in the `cert-manager-operator` and `cert-manager` namespaces. | ||
| - Define specific ingress and egress rules for the `cert-manager` operator pod to allow essential communication. | ||
| - Define specific ingress and egress rules for each `cert-manager` operand (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Define specific ingress and egress rules for each `cert-manager` operand (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic. | |
| - Define specific ingress and egress rules for each `cert-manager` components (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic. |
|
|
||
| ## Proposal | ||
|
|
||
| The proposal is to have the `cert-manager-operator` create and manage a set of `NetworkPolicy` objects across its two namespaces: `cert-manager-operator` for the operator itself, and `cert-manager` for the operands. The strategy is to first apply a default-deny policy and then layer more specific `allow` policies for required traffic flows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better approach would be to have Operator specific NP part of bundle. And for versions prior to 4.20, we have to check whether the suggestion in doc is feasible considering the maintenance part too. This feature will be part of 1.18 release which will be supported in 4.17+.
|
|
||
| ## Proposal | ||
|
|
||
| The proposal is to have the `cert-manager-operator` create and manage a set of `NetworkPolicy` objects across its two namespaces: `cert-manager-operator` for the operator itself, and `cert-manager` for the operands. The strategy is to first apply a default-deny policy and then layer more specific `allow` policies for required traffic flows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the NP manifest is included, does the operator installation fail? If so, is it specific to OLMv0 or OLMv1 also doesn't support it?
| * **For the `cert-manager` controller pod (`app: cert-manager`):** | ||
|
|
||
| * **Allow Egress to API Server:** Permit egress to the API server on port 6443/TCP for its core reconciliation loops. | ||
| * **Allow Egress for Issuers:** Permit all egress traffic to allow communication with various external ACME issuers (e.g., Let's Encrypt) or other services required for certificate challenges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the manifest in Operand Namespace (cert-manager) Policies section we are allowing all egress traffic. How can user update this with their use case specifics?
|
|
||
| ## Alternatives (Not Implemented) | ||
|
|
||
| * **Deny-All at Namespace Level:** An initial approach considered applying a single `podSelector: {}` deny-all policy to the entire namespace. However, this is less explicit. Using a pod selector for each `deny` policy ensures that the denial is clearly associated with the component it is intended to protect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the case with current proposal. At least that's my understanding reading the manifests in Implementation Details/Notes/Constraints section.
| ## Upgrade / Downgrade Strategy | ||
|
|
||
| * **Upgrade:** On upgrade, the operator will apply the new `NetworkPolicy` objects. Since the previous version had no policies, this will be a seamless transition to a more secure state. | ||
| * **Downgrade:** If a user downgrades to a version of the operator that is not aware of network policies, the `NetworkPolicy` objects will be orphaned (left behind). Since older versions operated in a default-allow world, these leftover restrictive policies could break the installation. The downgrade documentation must instruct the user to manually delete the `NetworkPolicy` objects from the `cert-manager-operator` and `cert-manager` namespaces before downgrading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this wouldn't be the case, unless the NP's have been tweaked correct? Could we mention it explicitly?
|
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
|
/remove-lifecycle stale |
|
@PillaiManish: This pull request references CM-624 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
bharath-b-rh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
except for couple of suggestions.
cc: @TrilokGeer @mytreya-rh for the reviews.
| // | ||
| // +kubebuilder:validation:Optional | ||
| // +optional | ||
| NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have an id to map the policy, so that reconciliation can be done effectively. Same is required for istiocsr API as well.
| NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"` | |
| NetworkPolicies []NetworkPolicy `json:"networkPolicies,omitempty"` | |
| } | |
| type NetworkPolicy struct { | |
| // name to assign to the created NetworkPolicy object. | |
| // +required | |
| Name string `json:"name,omitempty"` | |
| NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"` | |
| } |
| // When set to "enabled", the operator will create default network policies to secure | ||
| // communication between cert-manager controller, webhook, and cainjector components. | ||
| // When set to "disabled" or empty, no default network policies are created. | ||
| // Valid values are: "enabled", "disabled", or empty (default: disabled). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a reason for introducing this config like not to interrupt the existing deployments(which have followed the upgrade path) and a tentative timeline when this parameter will be deprecated, or atleast say this will be deprecated is future release and will NetworkPolicy will be created by default and hence it's good for users to enable and define the required policies?
|
/lgtm |
|
/lgtm |
c84693d to
a6aa29d
Compare
lunarwhite
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work. Mostly nitpicks
dff737c to
8016f9e
Compare
8016f9e to
118794f
Compare
|
@lunarwhite fixed all the nitpicks. Thanks for all the suggestions 😄 cc: @mytreya-rh, @bharath-b-rh |
|
/lgtm |
|
@PillaiManish: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mytreya-rh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR implements the proposal to secure
cert-manager-operatorby addingNetworkPolicyresources for both the operator and its operands. The goal is to reduce the attack surface and ensure components operate under the principle of least privilege.The core strategy establishes a baseline of network isolation by creating a "deny-all"
NetworkPolicyfor all ingress and egress traffic. From there, we explicitly allow the minimal required traffic for the components to function correctly.Key allowed flows include:
The operator manages the lifecycle of these network policies for itself and its operands using the
staticResourceController.