feat: Implement Model Rewrite and Traffic Splitting Logic #1820

zetxqx · 2025-11-05T16:21:08Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request introduces the core logic for model rewriting and weighted traffic
splitting within the request control director. It also includes the reconciler logic for InferenceModelRewrite resources.

Key changes:

pkg/epp/requestcontrol:
- Adds the applyWeightedModelRewrite function to handle model rewriting based on
  InferenceModelRewrite rules.
- Implements weighted selection of target models for traffic splitting.
- Ensures that the oldest InferenceModelRewrite resource takes precedence when
  multiple resources define duplicate rules.
pkg/epp/controller:
- Implements the read-only reconciler logic for InferenceModelRewrite resources.
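
To illustrate the weighted selection described above, here is a minimal sketch in Go. It is not the code in this PR: the `Target` type, `pickTarget`, and `pickRandomTarget` are hypothetical names, and the real `applyWeightedModelRewrite` may be structured differently. The sketch assumes that weights are non-negative integers and that a target with zero weight never receives traffic.

```go
package main

import (
	"errors"
	"math/rand"
)

// Target is a hypothetical stand-in for one weighted rewrite target.
type Target struct {
	Model  string
	Weight int32
}

// pickTarget selects one target with probability proportional to its weight.
// intn must return a value in [0, n); it is injected so tests can be deterministic.
func pickTarget(targets []Target, intn func(n int) int) (string, error) {
	total := 0
	for _, t := range targets {
		if t.Weight > 0 {
			total += int(t.Weight)
		}
	}
	if total == 0 {
		return "", errors.New("no target with positive weight")
	}

	// Walk the cumulative weight ranges until the drawn value falls inside one.
	r := intn(total)
	for _, t := range targets {
		if t.Weight <= 0 {
			continue
		}
		if r < int(t.Weight) {
			return t.Model, nil
		}
		r -= int(t.Weight)
	}
	return "", errors.New("unreachable: draw exceeded total weight")
}

// pickRandomTarget wires in the standard library random source.
func pickRandomTarget(targets []Target) (string, error) {
	return pickTarget(targets, rand.Intn)
}
```

With targets weighted 90 and 10, roughly nine out of ten calls to `pickRandomTarget` would return the first model. Injecting `intn` keeps the selection logic testable without depending on random output.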

Which issue(s) this PR fixes:

Fixes partially #1811

Does this PR introduce a user-facing change?:

Users can now configure `InferenceModelRewrite` resources to automatically redirect
incoming model requests to different target models. This feature also supports weighted
traffic splitting, allowing you to distribute requests across multiple target models based
on defined percentages.

netlify · 2025-11-05T16:21:14Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`a109e90`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6924dfcf6188ed00085620eb
😎 Deploy Preview	https://deploy-preview-1820--gateway-api-inference-extension.netlify.app

zetxqx · 2025-11-05T18:01:14Z

/retest

zetxqx · 2025-11-18T01:27:56Z

@ahg-g @kfswain rebased and should be ready for review now! thanks in advance!

pkg/epp/datastore/datastore.go

zetxqx · 2025-11-19T01:14:09Z

apix/v1alpha2/inferencemodelrewrite_types.go

+	// Across all rules specified on applicable rewrites, precedence MUST be
+	// given to the match having an "Exact" model match over a generic match
+	// (a rule with an empty `matches` array).
+	//
+	// If ties still exist across multiple InferenceModelRewrite resources (e.g.
+	// two rewrites both have an exact match for the same model), matching
+	// precedence MUST be determined by the oldest resource based on
+	// creation timestamp.
+	//
+	// If ties still exist within a single InferenceModelRewrite resource, the
+	// FIRST matching rule (in list order) is used.
 	// +required


@nirrozenbaum @ahg-g @kfswain

I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209

The new precedence order is:

1. More specific wins: An Exact match always takes precedence over an All match (where the matches array is empty).
2. Tie-breaker (oldest resource): If two rules have the same specificity, the rule from the InferenceModelRewrite resource with the older creation timestamp wins.

This approach is more intuitive and simplifies the implementation of efficient RewriteRule fetching per request. Specifically, when we find an exact match, we no longer need to compare it against less specific, generic rules.

SG, this matches what we had with InferenceModel also.

zetxqx · 2025-11-19T01:33:03Z

@ahg-g @nirrozenbaum @kfswain

This is ready for review now. I didn't split it up because I feel most of the parts are related, but I'm open to splitting it up if it is too large to review.

The main changes are:

- Modify the API: adjust the conflict precedence rule so that the more specific match wins first, then the older resource wins.
- Add a separate modelrewrite datastore that holds the model rewrites in memory.
- Wire up the reconciler logic using the modelrewrite datastore.
- Wire up the director logic using the modelrewrite datastore.