- A POC to prove that semantic routing and LoRA adapters can work together (see the sketch after this list).
- Each request is semantically evaluated against pre-configured example utterances and forwarded to the matching adapter.
- Requests that fail the evaluation, i.e. the router does not know where to route them, are forwarded to the default (Phi-2) base model.
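The routing itself is done with the semantic-router framework. Below is a minimal sketch of what the route definitions could look like; the utterances are illustrative only, the real configuration lives in litellm-config/custom_router.py, and class names may differ slightly between semantic-router releases.

```python
# Sketch of route definitions for the semantic router (illustrative utterances;
# the real ones are configured in litellm-config/custom_router.py).
from semantic_router import Route, RouteLayer
from semantic_router.encoders import HuggingFaceEncoder

# Embedding model the proxy downloads on startup (see local setup below).
encoder = HuggingFaceEncoder(name="BAAI/bge-small-en-v1.5")

routes = [
    Route(
        name="doctor",  # medical-style questions go to the "doctor" LoRA adapter
        utterances=[
            "I have a sore throat and a fever",
            "What are the side effects of ibuprofen?",
        ],
    ),
    Route(
        name="dcot",  # reasoning / chain-of-thought prompts go to the "dcot" adapter
        utterances=[
            "Think step by step and solve this puzzle",
            "Explain your reasoning before answering",
        ],
    ),
]

router = RouteLayer(encoder=encoder, routes=routes)

choice = router("My knee hurts when I run, what should I do?")
# choice.name is "doctor" here; if no route scores above the threshold,
# choice.name is None and the request falls through to the Phi-2 base model.
print(choice)
```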
UI (openwebui.com) -> LiteLLM + Semantic Router -> vLLM with LoRA adapters
oc create secret generic vllm-secrets --from-literal=HUGGING_FACE_HUB_TOKEN=hf_.....
helm install dynamicdemo . -n <your-namespace>
As this is a demo, the models have to be downloaded manually and uploaded to the PVC.
Phi-2 from Microsoft is used as the base model.
It needs to be downloaded and stored in the /models-cache directory on the vLLM pod.
The two LoRA adapters used are:
These need to be downloaded and stored in the /models-cache/lora/ directory on the vLLM pod.
Note
The lora sub-directory will need to be created beforehand.
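One way to do this is from a Python shell inside the vLLM pod (or locally, followed by an oc rsync of the files onto the PVC). A minimal sketch using huggingface_hub is below; the adapter repo IDs are placeholders for the two adapters listed above, and the exact sub-directory layout depends on how your vLLM deployment is configured.

```python
# Sketch: populate /models-cache with the base model and the LoRA adapters.
# The adapter repo IDs are placeholders -- replace them with the adapters above.
import os
from huggingface_hub import snapshot_download

os.makedirs("/models-cache/lora", exist_ok=True)  # create the lora sub-directory

# Base model served by vLLM.
snapshot_download(repo_id="microsoft/phi-2", local_dir="/models-cache/phi-2")

# LoRA adapters (placeholder repo IDs).
for repo_id in ["<org>/dcot-adapter", "<org>/doctor-adapter"]:
    name = repo_id.split("/")[-1]
    snapshot_download(repo_id=repo_id, local_dir=f"/models-cache/lora/{name}")
```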
If you already have a vLLM instance running somewhere, you can experiment with the rest of the setup on your local machine as below:
export BASE_API=https://vllm-noconnor-test.apps.prod.rhoai.rh-aiservices-bu.com/v1/
litellm --config litellm-config/pass_through_config.yaml
LiteLLM will start listening on http://localhost:4000
On startup, the proxy will download the BAAI/bge-small-en-v1.5 embedding model used by the Semantic Router.
podman run -d -p 3000:8080 -e ENABLE_OLLAMA_API=false --net=host -e ENABLE_OPENAI_API=true -e GLOBAL_LOG_LEVEL=DEBUG -e OPENAI_API_KEY=sk-123 -e OPENAI_API_BASE_URL=http://localhost:4000 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
OpenWebUI will be listening on http://localhost:8080
To validate the connection between OpenWebUI and the LiteLLM proxy, click the model selector at the top left; you should see the phi2, dcot and doctor models listed.
To validate that the proxy is working, take a look at the console logs:
INFO: 127.0.0.1:40130 - "GET /v1/models HTTP/1.1" 200 OK
[RouteChoice(name='dcot', function_call=None, similarity_score=0.6253249230126284)]
dcot
INFO: 127.0.0.1:37520 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[RouteChoice(name='doctor', function_call=None, similarity_score=0.8541370129211455), RouteChoice(name='dcot', function_call=None, similarity_score=0.7889642869682834)]
doctor
INFO: 127.0.0.1:44268 - "POST /v1/chat/completions HTTP/1.1" 200 OK
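You can also exercise the routing without the UI by calling the proxy directly. Below is a minimal sketch using the openai Python client; the base URL and sk-123 key match the commands above, and the prompt is only illustrative. A prompt like this one should show up in the proxy logs as a doctor RouteChoice, similar to the output above.

```python
# Send a chat request straight to the LiteLLM proxy; the semantic router
# decides which model/adapter the request is ultimately forwarded to.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-123")

response = client.chat.completions.create(
    model="phi2",  # the router may redirect this to dcot or doctor
    messages=[{"role": "user", "content": "I have a headache and a fever, what should I do?"}],
)
print(response.choices[0].message.content)
```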
The semantic router is invoked by a LiteLLM pre-invoke function and runs before the call to the actual LLM endpoint is made. This function uses the semantic-router framework to decide which model the request should be sent to.
The code is located in litellm-config/custom_router.py
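For reference, here is a minimal sketch of what such a hook could look like, assuming LiteLLM's CustomLogger async_pre_call_hook interface and the RouteLayer from the earlier sketch; the actual implementation in litellm-config/custom_router.py may differ.

```python
# Sketch of a LiteLLM pre-call hook that rewrites the target model based on
# the semantic route. `router` is the RouteLayer built in the earlier sketch.
from litellm.integrations.custom_logger import CustomLogger


class SemanticRouterHook(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # Only touch chat completions; pass everything else through unchanged.
        if call_type != "completion" or not data.get("messages"):
            return data

        user_message = data["messages"][-1]["content"]
        choice = router(user_message)

        # No matching route -> fall back to the Phi-2 base model.
        data["model"] = choice.name if choice.name else "phi2"
        return data


# Referenced from the proxy config, e.g. callbacks: custom_router.proxy_handler_instance
proxy_handler_instance = SemanticRouterHook()
```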
- https://github.com/BerriAI/litellm
- https://github.com/aurelio-labs/semantic-router/tree/main
- https://le.qun.ch/en/blog/2023/09/11/multi-lora-potentials/
- https://le.qun.ch/en/blog/2023/05/13/transformer-batching/
- https://github.com/kserve/kserve/blob/master/ROADMAP.md
Massive thanks to all these projects and the people involved.