LLM Dynamic Model Demo

  • A POC to prove that semantic routing and LoRA adapters can work together.
  • Each request is semantically evaluated against pre-configured examples and forwarded to the matching adapter.
  • Requests that fail the evaluation (i.e. the router doesn't know where to route them) are forwarded to the default Phi-2 base model.

UI (openwebui.com) -> LiteLLM + Semantic Router -> vLLM with LoRA adapters

Deploying and Configuring vLLM

oc create secret generic vllm-secrets --from-literal=HUGGING_FACE_HUB_TOKEN=hf_.....
helm install dynamicdemo . -n <your-namespace>

Downloading the models

As this is a demo, the models have to be downloaded manually and uploaded to the PVC.

Base Model

Phi-2 from Microsoft is used as the base model.

This needs to be downloaded and stored in the /models-cache directory on the vLLM pod.

The two LoRA adapters used are:

These need to be downloaded and stored in the /models-cache/lora/ directory on the vLLM pod.

Note

The lora sub-directory will need to be created beforehand.
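
For reference, here is a minimal sketch of pulling the base model with the huggingface_hub Python library before copying it onto the PVC (for example with oc rsync). The microsoft/phi-2 repo id and the local directory layout are assumptions based on the paths above; the LoRA adapter repo ids are placeholders.

# Sketch: download Phi-2 locally into a directory mirroring the pod's /models-cache layout.
# Assumes the huggingface_hub package is installed and HUGGING_FACE_HUB_TOKEN is set if needed.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/phi-2",        # assumed base model repo
    local_dir="models-cache/phi-2",   # adjust to match your vLLM model path
)

# The two LoRA adapters are fetched the same way into models-cache/lora/, e.g.:
# snapshot_download(repo_id="<lora-adapter-repo>", local_dir="models-cache/lora/<adapter-name>")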

Test locally

If you already have a running vLLM somewhere, you can experiment with the rest of the setup on your local machine as follows:

Running the LiteLLM proxy

export BASE_API=https://vllm-noconnor-test.apps.prod.rhoai.rh-aiservices-bu.com/v1/
litellm --config litellm-config/pass_through_config.yaml

LiteLLM will start listening on http://localhost:4000

On startup, the proxy will download the BAAI/bge-small-en-v1.5 embedding model used by the Semantic Router.
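
Once the proxy is up, a quick way to confirm it is serving the routed models is to list them with the OpenAI Python client. This is a sketch; the base URL and the sk-123 dummy key mirror the values passed to OpenWebUI below, so adjust them if your config differs.

# Sketch: sanity-check the LiteLLM proxy with the openai package (v1+).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-123")

# Should list phi2, dcot and doctor once the proxy has loaded its config.
for model in client.models.list():
    print(model.id)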

Running the OpenWebUI component

podman run -d -p 3000:8080 --net=host \
  -e ENABLE_OLLAMA_API=false \
  -e ENABLE_OPENAI_API=true \
  -e GLOBAL_LOG_LEVEL=DEBUG \
  -e OPENAI_API_KEY=sk-123 \
  -e OPENAI_API_BASE_URL=http://localhost:4000 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

OpenWebUI will be listening on http://localhost:8080

Validating the deployment

To validate the connection between OpenWebUI and the LiteLLM proxy, click the model selector in the top left of the UI; you should see the phi2, dcot and doctor models listed.

To validate that the proxy is working, take a look at the console logs:

INFO:     127.0.0.1:40130 - "GET /v1/models HTTP/1.1" 200 OK
[RouteChoice(name='dcot', function_call=None, similarity_score=0.6253249230126284)]
dcot
INFO:     127.0.0.1:37520 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[RouteChoice(name='doctor', function_call=None, similarity_score=0.8541370129211455), RouteChoice(name='dcot', function_call=None, similarity_score=0.7889642869682834)]
doctor
INFO:     127.0.0.1:44268 - "POST /v1/chat/completions HTTP/1.1" 200 OK
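
The router can also be exercised directly from a script; each request should produce a RouteChoice line like the ones above in the proxy console. This is a sketch: the prompts are illustrative and only need to resemble the example utterances configured for each route, and the model name passed in may be overridden by the router.

# Sketch: send chat requests through the proxy and watch the litellm console for RouteChoice output.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-123")

prompts = [
    "I have a persistent cough and a fever, what should I do?",     # expected to match the doctor route
    "Walk me through your reasoning step by step on this puzzle.",  # expected to match the dcot route
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="phi2",  # the pre-invoke router decides the actual target based on the prompt
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)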

Configuring the Semantic Router

The semantic router is invoked by a LiteLLM pre-invoke function that runs before the call to the actual LLM endpoint is made. This function uses the semantic router framework to decide which model the request should be sent to.

The code is located in litellm-config/custom_router.py
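
Conceptually, the route definitions look something like the sketch below, built with the semantic-router library (0.0.x API) and the same BAAI/bge-small-en-v1.5 encoder the proxy downloads on startup. The utterances here are illustrative placeholders; the real configuration lives in litellm-config/custom_router.py.

# Sketch: semantic-router style route definitions, assuming the semantic-router package.
from semantic_router import Route, RouteLayer
from semantic_router.encoders import HuggingFaceEncoder

doctor = Route(
    name="doctor",
    utterances=[
        "I have a headache and a sore throat",
        "What are the symptoms of the flu?",
    ],
)

dcot = Route(
    name="dcot",
    utterances=[
        "Think through this step by step",
        "Explain your chain of thought",
    ],
)

# Same embedding model that LiteLLM pulls on startup.
encoder = HuggingFaceEncoder(name="BAAI/bge-small-en-v1.5")
router = RouteLayer(encoder=encoder, routes=[doctor, dcot])

choice = router("My stomach hurts after eating")
print(choice.name)  # "doctor", or None when no route scores high enough (falls back to the base model)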

References

Massive thanks to all these projects and the people involved.
