SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment #1839
Hi @BenjaminBraunDev. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test.
|
/ok-to-test
ahg-g left a comment:
it would be great if you can send a separate PR for adding the running requests metric
|
@kaushikmitr @BenjaminBraunDev this is predicted latency, not slo, right? if so, please use
…, add predictor to new 2 phase configuration parser
…n, running routines there, move predictor helm section into new tpl file, rename slo-aware-routing guide and names in docs
…cars not fail immediatly during EPP spinup
|
Here's the issue regarding the
|
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BenjaminBraunDev, kfswain.
|
@kfswain Sorry, forgot to add a small bugfix to the scorer plugin, could you re-approve?
|
Np, approval remains. It's the LGTM that's removed. I'll leave that to @kaushikmitr to stamp.
|
Sry, I won't make you chase down a person again, was just trying to let other people be involved is all.
/lgtm
|
Thanks!
This PR is stage 3/3 for adding in the latency prediction and SLO-Aware Routing functionality to EPP.
New Features:
- Adds the -enable-latency-predictor flag to the EPP args, which tells EPP that the sidecars are present and registers the SLO routing plugins.
- Adds request headers for the TTFT and TPOT SLOs (x-slo-ttft-ms and x-slo-tpot-ms) and a boolean for whether to use the SLO routing scheduling profile with SLO scoring (x-prediction-based-scheduling). If false, EPP uses the default profile and only tracks and trains for future requests.
Plugins
Registers and deploys the plugins added in #1849 via scheduling profiles:
PodMetrics
Adds (back) the totalRunningRequestsMetric Prometheus metric from vLLM, which was removed in the past for being unused but is now a feature of our latency prediction model.
Guide
Added a guide for how to deploy IGW with SLO-Aware Routing in site-src/guides/slo-aware-routing.md
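As a rough illustration of the request headers described under New Features, the sketch below builds the three headers on the client side and mirrors the described choice between the SLO profile and the default profile. Only the header names come from this PR description. The value encoding (integer milliseconds as decimal strings, "true"/"false" for the boolean), the function names, and the profile labels "slo" and "default" are assumptions made for illustration, not the EPP implementation.

```python
from typing import Dict, Mapping, Optional

# Header names as given in the PR description.
TTFT_HEADER = "x-slo-ttft-ms"
TPOT_HEADER = "x-slo-tpot-ms"
PREDICTION_HEADER = "x-prediction-based-scheduling"


def build_slo_headers(
    ttft_ms: Optional[int] = None,
    tpot_ms: Optional[int] = None,
    prediction_based: bool = False,
) -> Dict[str, str]:
    """Build request headers carrying the SLO targets.

    Assumption: values are sent as decimal strings and the boolean
    as "true"/"false". The PR description does not specify the encoding.
    """
    headers = {PREDICTION_HEADER: "true" if prediction_based else "false"}
    if ttft_ms is not None:
        headers[TTFT_HEADER] = str(ttft_ms)
    if tpot_ms is not None:
        headers[TPOT_HEADER] = str(tpot_ms)
    return headers


def select_profile(headers: Mapping[str, str]) -> str:
    """Hypothetical mirror of the described behavior.

    Returns "slo" when prediction-based scheduling is requested and
    "default" otherwise. Both labels are placeholders, not real
    scheduling profile names.
    """
    value = headers.get(PREDICTION_HEADER, "false").strip().lower()
    return "slo" if value == "true" else "default"


if __name__ == "__main__":
    hdrs = build_slo_headers(ttft_ms=200, tpot_ms=50, prediction_based=True)
    print(hdrs, "->", select_profile(hdrs))
```

The resulting dictionary can be passed as the headers of whatever HTTP client sends the inference request. With the boolean left false, the request would still carry the SLO targets, which matches the description's note that EPP then only tracks and trains for future requests.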
Fixes #1323
Does this PR introduce a user-facing change?: