-
Notifications
You must be signed in to change notification settings - Fork 200
feat: Implement Model Rewrite and Traffic Splitting Logic #1820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
5302959 to
2c27884
Compare
|
/retest |
2c27884 to
48f4afa
Compare
48f4afa to
8485ba6
Compare
ae26313 to
3e9939f
Compare
29119d4 to
bf1ddce
Compare
| // Across all rules specified on applicable rewrites, precedence MUST be | ||
| // given to the match having an "Exact" model match over a generic match | ||
| // (a rule with an empty `matches` array). | ||
| // | ||
| // If ties still exist across multiple InferenceModelRewrite resources (e.g. | ||
| // two rewrites both have an exact match for the same model), matching | ||
| // precedence MUST be determined by the oldest resource based on | ||
| // creation timestamp. | ||
| // | ||
| // If ties still exist within a single InferenceModelRewrite resource, the | ||
| // FIRST matching rule (in list order) is used. | ||
| // +required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209
The new precedence order is:
- More specific wins: An Exact match always takes precedence over an All match (where the matches array is empty).
- Tie-Breaker (Oldest Rule): If the specificity of the rules is the same (a tie), the rule that was created or deployed first (the older rule) wins.
This approach is more intuitive and simplifies the implementation of efficient RewriteRule fetching per request. Specifically, when we find an exact match, we no longer need to compare it against less specific, generic rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, this matches what we had with InferenceModel also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
bf1ddce to
dc0cca4
Compare
|
This is ready for review now. I didn't split things up because I feel most of part is relevant but I'm open to split it up if this is too large for review. The main changes are:
|
| } | ||
| } | ||
|
|
||
| func (rr rewriteRuleWithMetadata) isGeneric() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the intended usecase here? a flat rewrite for any incoming req?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isGeneric means the rule matches any requests.
This is for empty matches, the use case we are targeting is the user wants to rewrite the model from all requests blindly.
| func (ms *ModelRewriteStore) GetRule(modelName string) *v1alpha2.InferenceModelRewriteRule { | ||
| // Exact matches have the highest precedence. | ||
| if rulesWithMd, ok := ms.rulesByExactModelMatch[modelName]; ok && len(rulesWithMd) > 0 { | ||
| return &rulesWithMd[0].rule // The list is pre-sorted, so the first element is the oldest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intent to flag this status in a controller in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should if we have a separate controller for this in the future.
Here, we have to make this deterministic even we don't update the status right now. At least, user can read the API comment and figure it out by themselves if some rules does not take effect.
| }) | ||
|
|
||
| for model := range ms.rulesByExactModelMatch { | ||
| sort.Slice(ms.rulesByExactModelMatch[model], func(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think through when you would want multiple alias models to map to a range of target models. By allowing an array:
| Matches []Match `json:"matches,omitempty"` |
We increase the chance of a name collision and make our reconciler code more complex, which is more opportunity for bugs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, here we are discussing two parts:
- should
targetsbe an array? - should
matchesbe an array?
for 1: The targets here is for traffic splitting. So a list of targets is a must have. for example
targets:
- modelRerwrites: "v1",
weight: 50
- modelRewrites: "v2",
weight: 50
for 2: this is really about the user experience and use case. Making mathces an array will help user to aggregate modelRewrites if they are targeting to the same targets. This may sounds a rare use case. But we at least keep the possibility. E.g. the following redirecting both "foodreview" and "spicyfoodreview" to the same traffic splitting logic.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModelRewrite
metadata:
name: food-review-routing
namespace: llm-d-pfc-cpu
spec:
poolRef:
group: inference.networking.k8s.io
name: llm-d-infpool
rules:
- matches:
- model:
type: Exact
value: "foodreview"
- model:
type: Exact
value: "spicyfoodreview"
split:
- modelRewrite: "foodreview-v1"
weight: 50
- modelRewrite: "foodreview-v2"
weight: 50
w/o matches as an array people has to declare more rules as follow. They have to duplicate the targets.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModelRewrite
metadata:
name: food-review-routing
namespace: llm-d-pfc-cpu
spec:
poolRef:
group: inference.networking.k8s.io
name: llm-d-infpool
rules:
match:
- model:
type: Exact
value: "foodreview"
split:
- modelRewrite: "foodreview-v1"
weight: 50
- modelRewrite: "foodreview-v2"
weight: 50
match
- model
type: Exact
value: "spicyfoodreview". split:
- modelRewrite: "foodreview-v1"
weight: 50
- modelRewrite: "foodreview-v2"
weight: 50
IMO, I prefer keeping match as an array. Implementation wise it's not that complicated, as what i implemented here we just need to expand the matches to different rewriteRuleWithMetadata and store them in the datastore.
| // It always returns the requestContext even in the error case, as the request context is used in error handling. | ||
| func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) { | ||
| logger := log.FromContext(ctx) | ||
| d.applyWeightedModelRewrite(reqCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we validated that this works? We are changing the content length of the body and i think we need to change the corresponding header, I don't remember if we removed that code of not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified, this is working. From my manual testing, we don't need to change the header, just body mutation is enough.
3c2c178 to
c6527d8
Compare
c6527d8 to
30a058a
Compare
30a058a to
19d12d6
Compare
|
I believe the failure of CRD CI(recently added) is as expected: @kfswain @ahg-g @nirrozenbaum this is ready for another review now |
ahg-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, any chance we can add integration tests?
| // Across all rules specified on applicable rewrites, precedence MUST be | ||
| // given to the match having an "Exact" model match over a generic match | ||
| // (a rule with an empty `matches` array). | ||
| // | ||
| // If ties still exist across multiple InferenceModelRewrite resources (e.g. | ||
| // two rewrites both have an exact match for the same model), matching | ||
| // precedence MUST be determined by the oldest resource based on | ||
| // creation timestamp. | ||
| // | ||
| // If ties still exist within a single InferenceModelRewrite resource, the | ||
| // FIRST matching rule (in list order) is used. | ||
| // +required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
c7400c0 to
8cc56d9
Compare
Added a test case in the integration hermetic test. |
| // +kubebuilder:validation:MinItems=1 | ||
| // | ||
| Targets []TargetModel `json:"split,omitempty"` | ||
| Targets []TargetModel `json:"targets,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this is a miss. updated the "split" to "targets"
|
Thanks, this looks good to me; leaving the approval to Kellen and Nir. |
8cc56d9 to
a109e90
Compare
|
/approve how do we get past the validation check? |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, zetxqx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
as maintainers, we have the option to manually bypass the validation check which obviously shouldn't be enforced here cause this CRD and its handling is still under development. of course this is not a breaking change because it's non functional yet. @ahg-g @kfswain I recommend to bypass manually rather than adding automation for that, which is much more controlled. |
|
Sounds good to me |

What type of PR is this?
/kind feature
What this PR does / why we need it:
This pull request introduces the core logic for model rewriting and weighted traffic
splitting within the request control director. It also includes the reconciler logic for
InferenceModelRewriteresources.Key changes:
pkg/epp/requestcontrol:applyWeightedModelRewritefunction to handle model rewriting based onInferenceModelRewriterules.InferenceModelRewriteresource is respected in case ofduplicate rules
pkg/epp/controller:InferenceModelRewriteresourcesWhich issue(s) this PR fixes:
Fixes partially #1811
Does this PR introduce a user-facing change?: