@joseorpa commented Oct 27, 2025

Enhancement: Ingress Operator Resource Configuration via v1alpha1 API
This enhancement proposes adding the ability to configure resource limits
and requests for the ingress-operator deployment containers via a new
v1alpha1 API field in the IngressController custom resource.

This addresses the need for:

  • Setting resource limits for QoS guarantees
  • Compliance requirements for resource constraints
  • Scaling operator resources for large deployments

Relates to: RFE-1476

jortizpa and others added 3 commits October 14, 2025 13:45
@openshift-ci bot requested review from Miciah and rfredette October 27, 2025 15:32
@openshift-ci bot added the needs-ok-to-test label Oct 27, 2025

openshift-ci bot commented Oct 27, 2025

Hi @joseorpa. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment on lines 574 to 588
### Alternative 2: Modify v1 API directly

Add `operatorResourceRequirements` field directly to stable v1 API.

**Pros**:
- No need for v1alpha1 version
- Simpler for users (one API version)

**Cons**:
- Changes stable API (breaking compatibility promise)
- Cannot iterate on design easily
- Difficult to remove if issues found
- Against OpenShift API stability guarantees

**Decision**: Rejected - Use v1alpha1 for new features as per OpenShift conventions

Contributor

Where is this v1alpha1 convention coming from? Can we introduce v1alpha1 when we already have v1?

The usual approach is to add the field directly to the existing v1 API:

  1. Define a new featuregate, initially in the TPNU feature set (but not Default).
  2. Add a field to the v1 API, using the new featuregate (as you've done using the // +openshift:enable:FeatureGate marker).
  3. Implement the feature and write tests.
  4. Add the featuregate to the Default feature set when it's ready.
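
For illustration, step 2 of that workflow might look roughly like the sketch below. It is only a sketch: the gate name IngressRouterResourceLimits and the field name resources are taken from later parts of this enhancement, not from a merged API.

```go
// Illustrative only, not the actual openshift/api change: a feature-gated
// field declared on the existing v1 IngressControllerSpec.
package v1

import (
	corev1 "k8s.io/api/core/v1"
)

type IngressControllerSpec struct {
	// ... existing fields elided ...

	// resources defines resource requests and limits for the router container.
	// When omitted, the operator keeps its current built-in defaults.
	// +openshift:enable:FeatureGate=IngressRouterResourceLimits
	// +optional
	Resources *corev1.ResourceRequirements `json:"resources,omitempty"`
}
```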

Author (@joseorpa, Oct 28, 2025)

This v1alpha1 convention comes from openshift/api#2485 (review)

Author

@JoelSpeed can you help here?

Contributor

There is a difference between adding a field to an already stable API (which Miciah has pointed out) and adding a completely new API.

The PR I reviewed, and left feedback on, was introducing a completely new API type, and as such, starting as alpha is correct per our latest guidelines.

If you think this should just be a field on an existing v1 API then that's a different discussion

Comment on lines 105 to 107
Create a new v1alpha1 API version for IngressController in the
`operator.openshift.io` group, following the pattern established, for example, by
[cluster monitoring v1alpha1 configuration](https://github.com/openshift/api/blob/94481d71bb6f3ce6019717ea7900e6f88f42fa2c/config/v1alpha1/types_cluster_monitoring.go#L172-L193).

Contributor

Can we use a shared type for all operators?

Author

You mean the core Kubernetes corev1.ResourceRequirements type? I've seen that there are a lot of types in the operator.openshift.io group.
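
For reference, a shared type could be little more than a named wrapper around the core type; the sketch below is purely illustrative (the type name and json tags are assumptions), and whether such a type is worth having is exactly the question above.

```go
// Illustrative only: an operator-agnostic shape that any operator.openshift.io
// API could embed, pairing a container name with the upstream core type so
// requests, limits, and claims behave exactly as they do on pods.
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
)

type ContainerResourceRequirements struct {
	// name is the container whose resources are being configured.
	Name string `json:"name"`

	// resources reuses the core Kubernetes type directly.
	Resources corev1.ResourceRequirements `json:"resources"`
}
```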

- Maintain backward compatibility with existing IngressController v1 API
- Use v1alpha1 API version for this Tech Preview feature
- Provide sensible defaults that work for most deployments
- Support both the ingress-operator and kube-rbac-proxy containers

Contributor

Why kube-rbac-proxy? Is that only for QoS?

Author

I'm correcting this as well. It was included for QoS, but I agree it's not directly related to the router pods.

Comment on lines 258 to 266
A new controller (`operator-deployment-controller`) in the cluster-ingress-operator
watches the default IngressController CR and reconciles the operator's own deployment
when `operatorResourceRequirements` is specified.

**Controller responsibilities:**
1. Watch IngressController resources (v1alpha1)
2. Reconcile `ingress-operator` Deployment in `openshift-ingress-operator` namespace
3. Update container resource specifications
4. Handle error cases gracefully (invalid values, conflicts, etc.)

Contributor

This won't work; CVO manages the ingress-operator deployment. You can't have cluster-ingress-operator update its own deployment.

Author

I'm updating this as well; the ingress operator would control just the deployment of the router pods.

**Mitigation**:
- Controller reconciliation loop detects and corrects drift
- Document that configuration should be via IngressController CR, not direct deployment edits
- Admission webhooks prevent direct deployment modifications

Contributor

Are you proposing adding an admission webhook to block updates to the ingress-operator deployment?

Author (@joseorpa, Oct 28, 2025)

I'm correcting this and changing it to a conversion webhook for the different API versions.


This enhancement proposes adding the ability to configure resource limits and
requests for the ingress-operator deployment containers via a new v1alpha1 API
field in the IngressController custom resource.

Contributor

If this is for the ingress-operator deployment, it doesn't make sense to put this in the IngressController CRD, which describes configuration for router pods.

Author

I agree, I will update the content of this part of the enhancement as well.

Comment on lines 351 to 352
1. **Q**: Should we support auto-scaling (VPA) in the future?
- **A**: Out of scope for initial implementation, but API should not preclude it

Contributor

Autoscaling the operator?

Author

This should be the router pods for sure; updating this too.

Comment on lines 357 to 358
3. **Q**: Should this apply to all IngressControllers or only the default?
- **A**: Initial implementation only default, but API supports any IngressController

Contributor

Does the configuration apply to IngressControllers (router) pods at all, or only to the ingress-operator pod?

If you mean it applies only to the ingress-operator pod, are you saying that resource requests and limits for the ingress-operator pod are read from the "default" IngressController, and resource requests and limits specified on other IngressController CRs are ignored? Putting configuration for the operator in the IngressController CRD is confusing (see #1877 (comment)).

If you actually mean resource requests and limits for router pods, then it seems to me that it is simplest and least surprising to respect the configuration for all IngressControllers, not only for the default. Does respecting configuration for other IngressControllers pose some problem?

Author

It will be for all router pods.

Comment on lines 360 to 361
4. **Q**: How do we handle the operator modifying its own deployment safely?
- **A**: Use owner references carefully, reconcile loop with backoff

Contributor

Can you elaborate on this point? How do you avoid conflicts with CVO?

Author

Changed it to the router pods controlled by the ingress controller.

Comment on lines 426 to 427
- [ ] Sufficient field testing (2+ minor releases in Tech Preview)
- [ ] No major bugs reported for 2 consecutive releases

Contributor

This is an unusual requirement for OpenShift. For a feature like this, we would usually introduce it as Tech Preview and graduate it to GA in the same release development cycle.

- [ ] No major bugs reported for 2 consecutive releases
- [ ] Performance impact assessed and documented
- [ ] API design validated by diverse user scenarios
- [ ] At least 10 production users providing positive feedback

Contributor

Do you believe you will be able to find 10 production users of this feature?

Comment on lines +562 to +564
- Simpler to implement
- No API version changes needed
- Easy to update without CRD changes

Contributor

You would need a CRD change to add a reference to the ConfigMap... unless you would have the operator just check for a ConfigMap in openshift-config with some hard-coded name?
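
For comparison, the hard-coded-name variant mentioned here is a small amount of code; in the sketch below the ConfigMap name ingress-operator-resources, the openshift-config namespace choice, and the function name are assumptions.

```go
// Illustrative sketch of the "well-known ConfigMap" variant: the operator
// looks for a ConfigMap with a hard-coded name in openshift-config and falls
// back to its built-in defaults when it is absent. Names are hypothetical.
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func lookupResourceOverrides(ctx context.Context, client kubernetes.Interface) (map[string]string, error) {
	cm, err := client.CoreV1().ConfigMaps("openshift-config").Get(ctx, "ingress-operator-resources", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil, nil // no override configured; keep built-in defaults
	}
	if err != nil {
		return nil, err
	}
	return cm.Data, nil
}
```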

Comment on lines 590 to 604
### Alternative 3: Separate CRD for operator configuration

Create a new OperatorConfiguration CRD (similar to how cluster monitoring works).

**Pros**:
- Separation of concerns
- Can configure multiple operators uniformly

**Cons**:
- Increases API surface unnecessarily
- IngressController is the logical place for ingress-operator configuration
- More CRDs to manage
- Inconsistent with how other operators handle self-configuration

**Decision**: Rejected - IngressController CR is the appropriate configuration location

Contributor

If you really do mean for this EP to be specifically for the ingress-operator pod (and not router pods), then I really like this alternative. Have you considered a variant: adding configuration for resource requests and limits to the ClusterVersion CRD (alongside the existing component overrides)? This makes a lot of sense for a few reasons:

  • CVO is the thing that manages the deployment right now; trying to have cluster-ingress-operator update the deployment that CVO manages is asking for trouble.
  • The resource requests and limits configuration logically fits under CVO configuration, not the IngressController API.
  • The configuration logically fits in with component overrides.
  • The resource requests and limits configuration could apply to any operator, not just cluster-ingress-operator; putting the configuration under the ClusterVersion CRD would provide a centralized, consistent way to configure it for multiple operators.
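
To make that variant concrete, a purely hypothetical shape is sketched below; ClusterVersion has no such field today, and the type and field names are invented for illustration only.

```go
// Hypothetical sketch only: it shows how per-component resource configuration
// might sit alongside ClusterVersion's existing component overrides, selecting
// a CVO-managed workload and one of its containers.
package v1

import (
	corev1 "k8s.io/api/core/v1"
)

// ComponentResourceOverride selects a CVO-managed workload and supplies
// resource requirements for one of its containers.
type ComponentResourceOverride struct {
	// kind, group, namespace, and name mirror the selector fields of the
	// existing ComponentOverride type.
	Kind      string `json:"kind"`
	Group     string `json:"group"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`

	// container is the container within the selected workload to configure.
	Container string `json:"container"`

	// resources are the requests and limits CVO would apply when rendering
	// the manifest for the selected workload.
	Resources corev1.ResourceRequirements `json:"resources"`
}
```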

Comment on lines 614 to 620
**Cons**:
- Not GitOps friendly
- Requires direct deployment modification
- Not discoverable via API
- Doesn't follow OpenShift declarative configuration patterns
- Difficult to audit and version control

Contributor

Also, it would require a CVO override.


## Design Details

### Open Questions

Contributor

Can you address this point from https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits?

We do not want cluster components to be restarted based on their resource consumption (for example, being killed due to an out-of-memory condition). We need to detect and handle those cases more gracefully, without degrading cluster performance.

Miciah commented Oct 28, 2025

/assign

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from miciah. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@joseorpa changed the title from "Ingress operator resource config" to "Ingress router resource config" Oct 29, 2025

rikatz commented Nov 5, 2025

/cc @alebedev87

@openshift-ci bot requested a review from alebedev87 November 5, 2025 15:37

candita commented Nov 25, 2025

Sorry, I forgot to assign this.
/assign @alebedev87

@yuqi-zhang left a comment

Some general questions from the API perspective inline

TechPreviewNoUpgrade feature set, and will be promoted to the Default feature set
once the feature graduates to GA.

**Enabling the Feature Gate:**

Contributor

(non-blocking): feature gating and tech preview are generally described in the OpenShift docs and processes. I don't think we need to describe the workflow in detail here (to keep the enhancement more concise) but fine to keep as is as well if you prefer.

// router pods (HAProxy containers). This field allows setting resource limits
// to achieve Guaranteed QoS class for router pods.
//
// When this field is set, it takes precedence over spec.nodePlacement.resources

Contributor

Is this referring to https://github.com/openshift/api/blob/bfa868a224015e94456731c1b5b0c849f901b417/operator/v1/types_ingress.go#L435? I don't see a resources field in nodePlacement. Could you help me understand where that is being set?


// tuning defines parameters for tuning the performance of ingress controller pods.
// +optional
Tuning *IngressControllerTuning `json:"tuning,omitempty"`

// metricsContainer specifies resource requirements for the metrics sidecar
// container in router pods.
//
// If not specified, uses Kubernetes default behavior (no requests or limits).

Contributor

Curious if it's worth considering adding defaults to these as well?
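
If defaults were added, one option is operator-side defaulting when the field is unset; the sketch below is illustrative only, and the helper name and quantities are placeholders rather than proposed values.

```go
// Placeholder sketch: apply built-in defaults for the metrics sidecar when
// the field is unset. The specific quantities here are illustrative only.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func metricsResourcesOrDefault(spec *corev1.ResourceRequirements) corev1.ResourceRequirements {
	if spec != nil {
		return *spec
	}
	return corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("10m"),
			corev1.ResourceMemory: resource.MustParse("40Mi"),
		},
	}
}
```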

5. The ingress-operator reconciles the router deployment with the specified resources
6. Kubernetes performs a rolling restart of the router pods with the new resource configuration
7. Router pods achieve Guaranteed QoS class (when limits == requests)
8. Platform administrator verifies the changes with `oc describe deployment router-default -n openshift-ingress`

Contributor

Alternatively, we could have IngressControllerStatus also introduce a subfield for this to reflect whether it was properly applied, but maybe that's not needed given that it's relatively easy to view the deployment as well.
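
For context, step 5 of the quoted flow reduces to copying the configured requirements onto the router container during reconciliation; the sketch below assumes the function and container names, and Guaranteed QoS follows only when every container's limits equal its requests.

```go
// Minimal sketch (names assumed): during reconciliation the operator copies
// the configured requirements onto the router container of the generated
// deployment. Guaranteed QoS results only when every container's limits
// equal its requests.
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

func applyRouterResources(deploy *appsv1.Deployment, res *corev1.ResourceRequirements) {
	if res == nil {
		return // field unset: keep the operator's existing defaults
	}
	containers := deploy.Spec.Template.Spec.Containers
	for i := range containers {
		if containers[i].Name == "router" {
			containers[i].Resources = *res
		}
	}
}
```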

// limits: none
//
// +optional
RouterContainer *corev1.ResourceRequirements `json:"routerContainer,omitempty"`

Contributor

ResourceRequirements also has a Claims subfield: https://pkg.go.dev/k8s.io/api/core/v1#ResourceRequirements. Would we plan on allowing the user to set that?

1. **Resource limits must be >= requests**: Kubernetes standard validation enforced by API server
2. **Feature gate check**: If `IngressRouterResourceLimits` feature gate is disabled,
the `resources` field will be ignored (with a warning event logged)
3. **Minimum values** (recommendations, not hard limits):

Contributor

Would we be proposing that the API itself has no validation, but the ingress controller would check the spec and emit those events? I think it may be best to have some type of min/max validation on the API itself, which would also help with documentation by listing suggested values for the user.
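
For reference, the operator-side path the quoted text describes (warning events rather than rejection) is a small check; the sketch below uses assumed helper and event-reason names and the recommended minimums quoted above, and also flags limits lower than requests, a check the enhancement currently defers to pod-level validation.

```go
// Hedged sketch (helper and event reason names assumed): warn, but do not
// reject, when router requests fall below the recommended minimums, and flag
// any limit that is lower than its corresponding request.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

var (
	minRouterCPU    = resource.MustParse("100m")
	minRouterMemory = resource.MustParse("128Mi")
)

func warnOnRouterResources(recorder record.EventRecorder, obj runtime.Object, res corev1.ResourceRequirements) {
	if cpu, ok := res.Requests[corev1.ResourceCPU]; ok && cpu.Cmp(minRouterCPU) < 0 {
		recorder.Eventf(obj, corev1.EventTypeWarning, "LowResourceRequest",
			"cpu request %s is below the recommended minimum of 100m", cpu.String())
	}
	if mem, ok := res.Requests[corev1.ResourceMemory]; ok && mem.Cmp(minRouterMemory) < 0 {
		recorder.Eventf(obj, corev1.EventTypeWarning, "LowResourceRequest",
			"memory request %s is below the recommended minimum of 128Mi", mem.String())
	}
	for name, limit := range res.Limits {
		if req, ok := res.Requests[name]; ok && limit.Cmp(req) < 0 {
			recorder.Eventf(obj, corev1.EventTypeWarning, "InvalidResourceLimit",
				"limit for %s (%s) is lower than its request (%s)", name, limit.String(), req.String())
		}
	}
}
```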

3. **Minimum values** (recommendations, not hard limits):
- Router container: cpu >= 100m, memory >= 128Mi recommended for production
- Values below recommendations will generate warning events but not block the request
4. **Precedence validation**: When both `spec.resources` and `spec.nodePlacement.resources`

Contributor

Same question as above, where is this field located and how would we validate it?

Labels: needs-ok-to-test

7 participants