
@PillaiManish
Contributor

@PillaiManish PillaiManish commented Jul 3, 2025

This PR implements the proposal to secure cert-manager-operator by adding NetworkPolicy resources for both the operator and its operands. The goal is to reduce the attack surface and ensure components operate under the principle of least privilege.

The core strategy establishes a baseline of network isolation by creating a "deny-all" NetworkPolicy for all ingress and egress traffic. From there, we explicitly allow the minimal required traffic for the components to function correctly.

Key allowed flows include:

  • API Server Access: Egress traffic to the Kubernetes API server is permitted for all components to manage resources.
  • Metrics Scraping: Ingress traffic is allowed on the designated metrics ports to enable monitoring by Prometheus.
  • Webhook Validation: Ingress traffic to the webhook port is allowed to facilitate admission review requests from the API server.

The operator manages the lifecycle of these network policies for itself and its operands using the staticResourceController.

@openshift-ci-robot

openshift-ci-robot commented Jul 3, 2025

@PillaiManish: This pull request references CM-525 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 3, 2025
@openshift-ci openshift-ci bot requested review from deads2k and dhellmann July 3, 2025 11:23
@PillaiManish PillaiManish force-pushed the cert-manager-network-policy branch from 761d81b to 7e8c31b Compare July 3, 2025 12:07
@PillaiManish
Contributor Author

cc: @TrilokGeer, @mytreya-rh

@PillaiManish PillaiManish force-pushed the cert-manager-network-policy branch 2 times, most recently from cec68b6 to a670ad7 Compare July 3, 2025 12:43
Contributor

@bharath-b-rh bharath-b-rh left a comment


The EP focuses only on the cert-manager operand; I think it should cover istio-csr as well.


## Summary

This document proposes the implementation of specific, fine-grained Kubernetes NetworkPolicy objects for the `cert-manager` operator and its operands. Currently, the operator and its components run without network restrictions, posing a potential security risk. By defining explicit ingress and egress rules, we can enforce the principle of least privilege, securing the `cert-manager` namespaces and ensuring that its components only communicate with necessary services like the Kubernetes API server and Prometheus.
Contributor

@bharath-b-rh bharath-b-rh Jul 24, 2025


I think we should explicitly mention the operator and operand namespaces, because the operand namespaces cover both cert-manager and istio-csr.


## Motivation

In a multi-tenant or security-conscious environment, it is crucial to enforce network segregation to limit the potential impact of a compromised pod. The `cert-manager` operator and its components are critical for certificate management within the cluster, but they operate with default-allow network rules. Applying network policies is a standard security best practice that utilizes the platform's own capabilities to secure platform workloads. This enhancement ensures that the `cert-manager` components are not an unintended attack vector.
Contributor


nit: I think "but they operate with default-allow network rules" suggests that a default allow-all policy is in use. It could be rephrased as: "The `cert-manager` operator and its components are critical for certificate management within the cluster, but they lack any network traffic filtering or validation."


### Goals

- Implement a default-deny policy for all pods in the `cert-manager-operator` and `cert-manager` namespaces.
Contributor

@bharath-b-rh bharath-b-rh Jul 24, 2025


The operator manages istio-csr operand pods too!


- Define specific ingress and egress rules for the `cert-manager` operator pod to allow essential communication.
- Define specific ingress and egress rules for each `cert-manager` operand (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic.
Contributor


Suggested change
- Define specific ingress and egress rules for each `cert-manager` operand (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic.
- Define specific ingress and egress rules for each `cert-manager` component (`cert-manager`, `webhook`, `cainjector`) to allow them to function correctly while blocking unnecessary traffic.


## Proposal

The proposal is to have the `cert-manager-operator` create and manage a set of `NetworkPolicy` objects across its two namespaces: `cert-manager-operator` for the operator itself, and `cert-manager` for the operands. The strategy is to first apply a default-deny policy and then layer more specific `allow` policies for required traffic flows.
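NetworkPolicies have no explicit deny rules. A pod becomes isolated for a direction once any policy selecting it lists that direction, and from then on only the union of the selecting policies' rules is allowed. That is what makes "default-deny, then layer allow policies" work. Below is a simplified model of that evaluation. It assumes port-only rules (peers, protocols, and named ports are ignored) and uses hypothetical policy names; only port 6443 comes from this proposal.

```go
package main

import "fmt"

// Direction mirrors the NetworkPolicy policyTypes values.
type Direction string

const (
	Ingress Direction = "Ingress"
	Egress  Direction = "Egress"
)

// Rule is a simplified ingress/egress rule: an empty Ports list matches any port.
type Rule struct {
	Ports []int
}

// Policy is a simplified NetworkPolicy; PodLabels acts like
// podSelector.matchLabels, and an empty map selects every pod.
type Policy struct {
	Name      string
	PodLabels map[string]string
	Types     []Direction
	Ingress   []Rule
	Egress    []Rule
}

func (p Policy) selects(pod map[string]string) bool {
	for k, v := range p.PodLabels {
		if got, ok := pod[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func (p Policy) hasType(d Direction) bool {
	for _, t := range p.Types {
		if t == d {
			return true
		}
	}
	return false
}

func (r Rule) matches(port int) bool {
	if len(r.Ports) == 0 {
		return true
	}
	for _, p := range r.Ports {
		if p == port {
			return true
		}
	}
	return false
}

// allowed reports whether traffic in direction d on port is permitted for a pod.
// The pod is isolated for d once any selecting policy lists d; an isolated pod
// only gets traffic that some rule of some selecting policy allows.
func allowed(policies []Policy, pod map[string]string, d Direction, port int) bool {
	isolated := false
	for _, p := range policies {
		if !p.selects(pod) || !p.hasType(d) {
			continue
		}
		isolated = true
		rules := p.Ingress
		if d == Egress {
			rules = p.Egress
		}
		for _, r := range rules {
			if r.matches(port) {
				return true
			}
		}
	}
	return !isolated
}

func main() {
	policies := []Policy{
		{Name: "deny-all", PodLabels: map[string]string{}, Types: []Direction{Ingress, Egress}},
		{Name: "allow-egress-api-server", PodLabels: map[string]string{"app": "cert-manager"},
			Types: []Direction{Egress}, Egress: []Rule{{Ports: []int{6443}}}},
	}
	controller := map[string]string{"app": "cert-manager"}
	fmt.Println(allowed(policies, controller, Egress, 6443)) // true
	fmt.Println(allowed(policies, controller, Egress, 443))  // false
	fmt.Println(allowed(nil, controller, Egress, 443))       // true: no policy selects the pod
}
```

Because rules only ever add permissions, the order in which the operator creates the deny and allow policies does not change the end result.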
Contributor


A better approach would be to make the operator-specific NP part of the bundle. For versions prior to 4.20, we have to check whether the suggestion in the doc is feasible, considering the maintenance cost too. This feature will be part of the 1.18 release, which will be supported on 4.17+.


Contributor


What happens if the NP manifest is included? Does the operator installation fail? If so, is this specific to OLMv0, or does OLMv1 also not support it?

* **For the `cert-manager` controller pod (`app: cert-manager`):**
    * **Allow Egress to API Server:** Permit egress to the API server on port 6443/TCP for its core reconciliation loops.
    * **Allow Egress for Issuers:** Permit all egress traffic to allow communication with various external ACME issuers (e.g., Let's Encrypt) or other services required for certificate challenges.
Contributor


Based on the manifest in the Operand Namespace (cert-manager) Policies section, we are allowing all egress traffic. How can users tailor this to their specific use cases?


## Alternatives (Not Implemented)

* **Deny-All at Namespace Level:** An initial approach considered applying a single `podSelector: {}` deny-all policy to the entire namespace. However, this is less explicit. Using a pod selector for each `deny` policy ensures that the denial is clearly associated with the component it is intended to protect.
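The difference between the two selector styles comes down to `matchLabels` semantics: an empty selector matches every pod, while `app: webhook` matches only pods carrying that label. The small illustration below shows this. Only `app: cert-manager` appears in the proposal; the `app: webhook` and `app: cainjector` labels and the pod names are assumptions.

```go
package main

import "fmt"

// Pod is a minimal stand-in holding only what a podSelector looks at.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matches applies matchLabels semantics: every key/value in the selector must
// be present on the pod, so an empty selector matches every pod.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectedPods returns the names of pods a policy with this selector would cover.
func selectedPods(selector map[string]string, pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if matches(selector, p.Labels) {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "cert-manager", Labels: map[string]string{"app": "cert-manager"}},
		{Name: "cert-manager-webhook", Labels: map[string]string{"app": "webhook"}},
		{Name: "cert-manager-cainjector", Labels: map[string]string{"app": "cainjector"}},
	}
	fmt.Println(selectedPods(map[string]string{}, pods))                 // [cert-manager cert-manager-webhook cert-manager-cainjector]
	fmt.Println(selectedPods(map[string]string{"app": "webhook"}, pods)) // [cert-manager-webhook]
}
```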
Contributor


Isn't this the case with the current proposal? At least, that's my understanding from reading the manifests in the Implementation Details/Notes/Constraints section.

## Upgrade / Downgrade Strategy

* **Upgrade:** On upgrade, the operator will apply the new `NetworkPolicy` objects. Since the previous version had no policies, this will be a seamless transition to a more secure state.
* **Downgrade:** If a user downgrades to a version of the operator that is not aware of network policies, the `NetworkPolicy` objects will be orphaned (left behind). Since older versions operated in a default-allow world, these leftover restrictive policies could break the installation. The downgrade documentation must instruct the user to manually delete the `NetworkPolicy` objects from the `cert-manager-operator` and `cert-manager` namespaces before downgrading.
Contributor


I think this wouldn't be the case unless the NPs have been tweaked, correct? Could we mention that explicitly?

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2025
@bharath-b-rh
Contributor

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2025
@PillaiManish PillaiManish changed the title CM-525: Cert-Manager Network Policy CM-624: Cert-Manager Network Policy Sep 8, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 8, 2025

@PillaiManish: This pull request references CM-624 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@bharath-b-rh bharath-b-rh left a comment


LGTM, except for a couple of suggestions.

cc: @TrilokGeer @mytreya-rh for the reviews.

```go
//
// +kubebuilder:validation:Optional
// +optional
NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"`
```
Contributor


I think we should have an ID to map each policy, so that reconciliation can be done effectively. The same is required for the istiocsr API as well.

Suggested change
```diff
-	NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"`
+	NetworkPolicies []NetworkPolicy `json:"networkPolicies,omitempty"`
+}
+type NetworkPolicy struct {
+	// name to assign to the created NetworkPolicy object.
+	// +required
+	Name          string            `json:"name,omitempty"`
+	NetworkPolicy *v1.NetworkPolicy `json:"networkPolicy,omitempty"`
+}
```
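Keying each entry by `name` lets the reconciler diff the desired list from the CR against what exists in the cluster. The following is a minimal sketch of that diff. It assumes a hypothetical `UserPolicy` type whose spec is reduced to a string, and it assumes `existing` holds only policies the operator created from this list, so unlisted ones are safe to delete.

```go
package main

import "fmt"

// UserPolicy (hypothetical) stands in for one entry of the suggested list:
// a Name key plus the policy body, reduced here to a string.
type UserPolicy struct {
	Name string
	Spec string
}

// Plan lists what a reconciler would create, update, or delete.
type Plan struct {
	Create, Update, Delete []string
}

// plan diffs desired entries (from the CR) against existing ones (from the
// cluster) by name, preserving input order so the result is deterministic.
func plan(desired, existing []UserPolicy) Plan {
	want := make(map[string]bool, len(desired))
	for _, d := range desired {
		want[d.Name] = true
	}
	have := make(map[string]string, len(existing))
	for _, e := range existing {
		have[e.Name] = e.Spec
	}
	var p Plan
	for _, d := range desired {
		spec, ok := have[d.Name]
		switch {
		case !ok:
			p.Create = append(p.Create, d.Name)
		case spec != d.Spec:
			p.Update = append(p.Update, d.Name)
		}
	}
	for _, e := range existing {
		if !want[e.Name] {
			p.Delete = append(p.Delete, e.Name)
		}
	}
	return p
}

func main() {
	desired := []UserPolicy{{"allow-acme", "egress:443"}, {"allow-vault", "egress:8200"}}
	existing := []UserPolicy{{"allow-vault", "egress:8201"}, {"allow-old", "egress:80"}}
	fmt.Printf("%+v\n", plan(desired, existing)) // {Create:[allow-acme] Update:[allow-vault] Delete:[allow-old]}
}
```

With a single unnamed `*v1.NetworkPolicy` field, there is no stable key to tell an edited policy apart from a removed one plus a new one.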

```go
// When set to "enabled", the operator will create default network policies to secure
// communication between cert-manager controller, webhook, and cainjector components.
// When set to "disabled" or empty, no default network policies are created.
// Valid values are: "enabled", "disabled", or empty (default: disabled).
```
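The documented values map to a simple three-way switch. The sketch below uses a hypothetical function name, because the field's actual name is not shown in this excerpt. In the real API, the values would presumably be constrained by CRD validation markers rather than hand-written code.

```go
package main

import "fmt"

// defaultPoliciesEnabled (hypothetical) maps the documented values of the
// setting to whether the operator should create its default network policies.
func defaultPoliciesEnabled(value string) (bool, error) {
	switch value {
	case "enabled":
		return true, nil
	case "disabled", "":
		return false, nil // empty means the documented default: disabled
	default:
		return false, fmt.Errorf("invalid value %q: must be \"enabled\", \"disabled\", or empty", value)
	}
}

func main() {
	for _, v := range []string{"enabled", "disabled", "", "on"} {
		ok, err := defaultPoliciesEnabled(v)
		fmt.Printf("%q -> %v, %v\n", v, ok, err)
	}
}
```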
Contributor


Should we add the reason for introducing this config, e.g. not interrupting existing deployments (which have followed the upgrade path), along with a tentative timeline for when this parameter will be deprecated? Or at least say that it will be deprecated in a future release, after which NetworkPolicies will be created by default, so it's good for users to enable it and define the required policies?

@bharath-b-rh
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 16, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 25, 2025
@bharath-b-rh
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2025
@PillaiManish PillaiManish force-pushed the cert-manager-network-policy branch from c84693d to a6aa29d Compare October 23, 2025 10:52
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2025
Member

@lunarwhite lunarwhite left a comment


Thanks for your work. Mostly nitpicks.

@PillaiManish PillaiManish force-pushed the cert-manager-network-policy branch from dff737c to 8016f9e Compare October 28, 2025 06:08
@PillaiManish PillaiManish force-pushed the cert-manager-network-policy branch from 8016f9e to 118794f Compare October 28, 2025 06:14
@PillaiManish
Contributor Author

@lunarwhite fixed all the nitpicks. Thanks for all the suggestions 😄

cc: @mytreya-rh, @bharath-b-rh

@lunarwhite
Member

/lgtm
Thanks!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 28, 2025

@PillaiManish: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bharath-b-rh
Contributor

/lgtm

@mytreya-rh
Contributor

/approve

@openshift-ci
Contributor

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mytreya-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 304f5f4 into openshift:master Oct 28, 2025
2 checks passed