Contributor

@zetxqx zetxqx commented Nov 5, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request introduces the core logic for model rewriting and weighted traffic
splitting within the request control director. It also includes the reconciler logic for InferenceModelRewrite resources.

Key changes:

  • pkg/epp/requestcontrol:
    • Adds the applyWeightedModelRewrite function to handle model rewriting based on
      InferenceModelRewrite rules.
    • Implements weighted selection of target models for traffic splitting.
    • Ensures that the oldest InferenceModelRewrite resource is respected in case of
      duplicate rules
  • pkg/epp/controller:
    • Implements the read-only reconciler logic for InferenceModelRewrite resources

Which issue(s) this PR fixes:

Fixes partially #1811

Does this PR introduce a user-facing change?:

Users can now configure `InferenceModelRewrite` resources to automatically redirect
incoming model requests to different target models. This feature also supports weighted
traffic splitting, allowing you to distribute requests across multiple target models based
on defined percentages.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 5, 2025
netlify bot commented Nov 5, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit a109e90
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6924dfcf6188ed00085620eb
😎 Deploy Preview https://deploy-preview-1820--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 5, 2025
Contributor Author

zetxqx commented Nov 5, 2025

/retest

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 18, 2025
Contributor Author

zetxqx commented Nov 18, 2025

@ahg-g @kfswain rebased and should be ready for review now! thanks in advance!

@zetxqx zetxqx force-pushed the modelrerwiteimpl branch 2 times, most recently from ae26313 to 3e9939f Compare November 18, 2025 04:23
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 19, 2025
Comment on lines +60 to 71
// Across all rules specified on applicable rewrites, precedence MUST be
// given to the match having an "Exact" model match over a generic match
// (a rule with an empty `matches` array).
//
// If ties still exist across multiple InferenceModelRewrite resources (e.g.
// two rewrites both have an exact match for the same model), matching
// precedence MUST be determined by the oldest resource based on
// creation timestamp.
//
// If ties still exist within a single InferenceModelRewrite resource, the
// FIRST matching rule (in list order) is used.
// +required
Contributor Author

@zetxqx zetxqx Nov 19, 2025

@nirrozenbaum @ahg-g @kfswain

I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209

The new precedence order is:

  1. More specific wins: An Exact match always takes precedence over an All match (where the matches array is empty).
  2. Tie-Breaker (Oldest Rule): If the specificity of the rules is the same (a tie), the rule that was created or deployed first (the older rule) wins.

This approach is more intuitive and simplifies the implementation of efficient RewriteRule fetching per request. Specifically, when we find an exact match, we no longer need to compare it against less specific, generic rules.

Collaborator

SG, this matches what we had with InferenceModel also.

Contributor

Great!

Contributor Author

zetxqx commented Nov 19, 2025

@ahg-g @nirrozenbaum @kfswain

This is ready for review now. I didn't split it up because I feel most of the parts are related, but I'm open to splitting it if it is too large to review.

The main changes are:

  1. Modified the API: changed the conflict precedence rule slightly, so that the more specific match wins first and the older resource wins next.
  2. Added a separate modelrewrite datastore that holds the rewrite rules in memory.
  3. Wired up the reconciler logic using the modelrewrite datastore.
  4. Wired up the director logic using the modelrewrite datastore.

}
}

func (rr rewriteRuleWithMetadata) isGeneric() bool {
Collaborator

What's the intended use case here? A flat rewrite for any incoming request?

Contributor Author

isGeneric means the rule matches any request.

This covers rules with an empty matches list. The use case we are targeting is a user who wants to rewrite the model for all requests unconditionally.

func (ms *ModelRewriteStore) GetRule(modelName string) *v1alpha2.InferenceModelRewriteRule {
	// Exact matches have the highest precedence.
	if rulesWithMd, ok := ms.rulesByExactModelMatch[modelName]; ok && len(rulesWithMd) > 0 {
		return &rulesWithMd[0].rule // The list is pre-sorted, so the first element is the oldest.
Collaborator

Do we intend to flag this status in a controller in the future?

Contributor Author

@zetxqx zetxqx Nov 24, 2025

I think we should, if we have a separate controller for this in the future.

For now, we have to make this deterministic even though we don't update the status yet. At the least, users can read the API comment and work out for themselves why some rules do not take effect.

	})

	for model := range ms.rulesByExactModelMatch {
		sort.Slice(ms.rulesByExactModelMatch[model], func(i, j int) bool {
Collaborator

I'm trying to think through when you would want multiple alias models to map to a range of target models. By allowing an array:

Matches []Match `json:"matches,omitempty"`

We increase the chance of a name collision and make our reconciler code more complex, which creates more opportunity for bugs.

Contributor Author

Good call, here we are discussing two parts:

  1. should targets be an array?
  2. should matches be an array?

For 1: the targets are there for traffic splitting, so a list of targets is a must-have. For example:

targets:
  - modelRewrite: "v1"
    weight: 50
  - modelRewrite: "v2"
    weight: 50

For 2: this is really about user experience and use cases. Making matches an array lets users aggregate rules that share the same targets. This may sound like a rare use case, but we at least keep the possibility open. For example, the following redirects both "foodreview" and "spicyfoodreview" to the same traffic-splitting logic.

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModelRewrite
metadata:
  name: food-review-routing
  namespace: llm-d-pfc-cpu
spec:
  poolRef:
    group: inference.networking.k8s.io
    name: llm-d-infpool
  rules:
    - matches:
      - model:
          type: Exact
          value: "foodreview"
      - model:
          type: Exact
          value: "spicyfoodreview"
      split:
        - modelRewrite: "foodreview-v1"
          weight: 50
        - modelRewrite: "foodreview-v2"
          weight: 50

Without matches as an array, people have to declare more rules, as follows, and duplicate the targets.

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModelRewrite
metadata:
  name: food-review-routing
  namespace: llm-d-pfc-cpu
spec:
  poolRef:
    group: inference.networking.k8s.io
    name: llm-d-infpool
  rules:
    - match:
        model:
          type: Exact
          value: "foodreview"
      split:
        - modelRewrite: "foodreview-v1"
          weight: 50
        - modelRewrite: "foodreview-v2"
          weight: 50
    - match:
        model:
          type: Exact
          value: "spicyfoodreview"
      split:
        - modelRewrite: "foodreview-v1"
          weight: 50
        - modelRewrite: "foodreview-v2"
          weight: 50

IMO, I prefer keeping matches as an array. Implementation-wise it's not that complicated: as implemented here, we just need to expand the matches into separate rewriteRuleWithMetadata entries and store them in the datastore.
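
The expansion can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `match`, `rewriteRule`, `flatRule`, and `expand` are hypothetical names, and the real rewriteRuleWithMetadata carries more metadata than is shown here.

```go
package main

import "fmt"

// match is a hypothetical exact-model match.
type match struct{ Model string }

// rewriteRule is a hypothetical rule with a list of matches and shared targets.
type rewriteRule struct {
	Matches []match
	Targets []string
}

// flatRule is one stored entry: a single model (or generic) plus the targets.
type flatRule struct {
	Model   string
	Generic bool // true when the source rule had no matches
	Targets []string
}

// expand turns one rule into one stored entry per match. A rule with no
// matches becomes a single generic entry that applies to every request.
func expand(r rewriteRule) []flatRule {
	if len(r.Matches) == 0 {
		return []flatRule{{Generic: true, Targets: r.Targets}}
	}
	out := make([]flatRule, 0, len(r.Matches))
	for _, m := range r.Matches {
		out = append(out, flatRule{Model: m.Model, Targets: r.Targets})
	}
	return out
}

func main() {
	r := rewriteRule{
		Matches: []match{{"foodreview"}, {"spicyfoodreview"}},
		Targets: []string{"foodreview-v1", "foodreview-v2"},
	}
	for _, fr := range expand(r) {
		fmt.Printf("%s -> %v\n", fr.Model, fr.Targets)
	}
}
```

Each expanded entry shares the same targets, so the user writes the split once while the store can still index by a single model name.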

// It always returns the requestContext even in the error case, as the request context is used in error handling.
func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) {
	logger := log.FromContext(ctx)
	d.applyWeightedModelRewrite(reqCtx)
Collaborator

Have we validated that this works? We are changing the content length of the body, and I think we need to change the corresponding header. I don't remember if we removed that code or not.

Contributor Author

Verified, this is working. From my manual testing, we don't need to change the header, just body mutation is enough.
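
For context on why the question comes up: rewriting the model name generally changes the size of the JSON body. The sketch below is a standalone illustration using `encoding/json`. `rewriteModel` is a hypothetical helper, not the director's implementation, and it assumes the body is a JSON object with a top-level `model` field. It makes no claim about how the proxy treats the Content-Length header.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rewriteModel replaces the top-level "model" field of a JSON request body.
// Other fields are kept as raw JSON so their values are not re-interpreted.
func rewriteModel(body []byte, newModel string) ([]byte, error) {
	var payload map[string]json.RawMessage
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	if _, ok := payload["model"]; !ok {
		return nil, errors.New(`request body has no "model" field`)
	}
	encoded, err := json.Marshal(newModel)
	if err != nil {
		return nil, err
	}
	payload["model"] = encoded
	return json.Marshal(payload)
}

func main() {
	in := []byte(`{"model":"foodreview","max_tokens":5}`)
	out, err := rewriteModel(in, "foodreview-v1")
	if err != nil {
		panic(err)
	}
	// The body length differs after the rewrite because the model name grew.
	fmt.Println(len(in), len(out))
	fmt.Println(string(out))
}
```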

@zetxqx zetxqx force-pushed the modelrerwiteimpl branch 2 times, most recently from 3c2c178 to c6527d8 Compare November 24, 2025 08:18
Contributor Author

zetxqx commented Nov 24, 2025

I believe the failure of the (recently added) CRD CI check is expected:
image

@kfswain @ahg-g @nirrozenbaum this is ready for another review now

Contributor

@ahg-g ahg-g left a comment

Great, any chance we can add integration tests?

@zetxqx zetxqx force-pushed the modelrerwiteimpl branch 2 times, most recently from c7400c0 to 8cc56d9 Compare November 24, 2025 22:33
Contributor Author

zetxqx commented Nov 24, 2025

> Great, any chance we can add integration tests?

Added a test case to the hermetic integration test.

  // +kubebuilder:validation:MinItems=1
  //
- Targets []TargetModel `json:"split,omitempty"`
+ Targets []TargetModel `json:"targets,omitempty"`
Contributor Author

I found that this was missed. Updated the JSON tag from "split" to "targets".

Contributor

ahg-g commented Nov 24, 2025

Thanks, this looks good to me; leaving the approval to Kellen and Nir.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 24, 2025
Contributor Author

zetxqx commented Nov 24, 2025

@ahg-g fixed a typo revealed by the CI. Now only the CRD validation check fails, as expected. Would you mind re-granting the lgtm?

Contributor

ahg-g commented Nov 26, 2025

/approve
/lgtm

how do we get past the validation check?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2025
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, zetxqx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025
Contributor Author

zetxqx commented Nov 27, 2025

> /approve /lgtm
>
> how do we get past the validation check?

Thanks @ahg-g. I thought you could force-merge this PR. I also have #1909, which would let maintainers bypass the check by adding a label.

@nirrozenbaum
Contributor

> > /approve /lgtm
> > how do we get past the validation check?
>
> Thanks @ahg-g. I thought you could force-merge this PR. I also have #1909, which would let maintainers bypass the check by adding a label.

As maintainers, we have the option to manually bypass the validation check, which obviously shouldn't be enforced here because this CRD and its handling are still under development. Of course, this is not a breaking change, because the feature isn't functional yet.

@ahg-g @kfswain I recommend bypassing manually rather than adding automation for it; a manual bypass is much more controlled.
I wouldn't want us to add automation to bypass validation. That sounds risky.

Contributor

ahg-g commented Nov 27, 2025

Sounds good to me

@ahg-g ahg-g merged commit d886b15 into kubernetes-sigs:main Nov 27, 2025
10 of 12 checks passed