
Commit

Merge pull request #10 from bifrostlab/ollama
Updated the documents and codes on how to use Litellm to create a unified interface for both ChatGPT and Ollama
hungdtrn authored Feb 11, 2024
2 parents dfa8d56 + e85442e commit d078140
Showing 8 changed files with 444 additions and 75 deletions.
23 changes: 0 additions & 23 deletions README_OLLAMA.md

This file was deleted.

9 changes: 0 additions & 9 deletions llm_assistant/config.py

This file was deleted.

29 changes: 0 additions & 29 deletions llm_assistant/demo_ollama.py

This file was deleted.

31 changes: 31 additions & 0 deletions llm_assistant/ollama/README.md
@@ -0,0 +1,31 @@
# README
## SET UP OLLAMA
Please visit [here](https://github.com/ollama/ollama.git) and follow their instructions to install Ollama on your machine.

*Note*: After installing, you can run a small model (Microsoft Phi-2) for testing on your local machine:

```shell
ollama run phi
```
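Besides the CLI, Ollama also exposes a local REST API you can use to sanity-check the server. A minimal sketch, assuming the default endpoint `localhost:11434` and the `/api/generate` route; it degrades gracefully when no server is running:

```python
import json
import urllib.request
import urllib.error

# Assumption: Ollama serves on localhost:11434 with a /api/generate endpoint (its default).
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "phi", "prompt": "Say hi", "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    # Send the request; the non-streaming response body carries a "response" field
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read())["response"])
except (urllib.error.URLError, OSError) as e:
    # No server running (or wrong port): report instead of crashing
    print(f"Ollama not reachable: {e}")
```

The same request can be sent with `curl` if you prefer the shell.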


## Use LiteLLM as a Proxy to Reuse the OpenAI Interface with Both OpenAI Models and Ollama

Edit the proxy server config at `proxy_config.yaml`, adding the desired models and their corresponding parameters. Please visit [here](https://docs.litellm.ai/docs/proxy/quick_start) for the available settings.
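For example, exposing the locally pulled Phi-2 model through the proxy takes an entry like this (a minimal sketch; the `ollama/` prefix tells LiteLLM to route the request to Ollama):

```yaml
model_list:
  - model_name: phi
    litellm_params:
      model: ollama/phi
```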

Run the proxy server:

```shell
# if you use OpenAI models
export OPENAI_API_KEY=<your_key>
litellm --config llm_assistant/ollama/proxy_config.yaml
```


Modify the OpenAI SDK client to talk to our proxy server instead. In `openai_chat.py` we use the OpenAI SDK to interact with the proxy server. Although the SDK requires an API key, we **don't need a real one here**: because we are interacting with the proxy server rather than the OpenAI server, **any string value** works. Please visit [here](https://github.com/BerriAI/litellm), Section "Quick Start Proxy - CLI", for more information.

```shell
python openai_chat.py gpt-3.5-turbo
python openai_chat.py phi
```

21 changes: 21 additions & 0 deletions llm_assistant/ollama/openai_chat.py
@@ -0,0 +1,21 @@
import argparse

from openai import OpenAI

# We don't need a real api_key here. See `ollama/README.md`
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="FAKE")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("model")
    args = parser.parse_args()

    response = client.chat.completions.create(
        model=args.model,
        messages=[{"role": "user", "content": "write a short poem"}],
        stream=True,
    )

    for chunk in response:
        # The final streamed chunk can carry content=None, so guard before printing
        content = chunk.choices[0].delta.content
        if content is not None:
            print(content, end="", flush=True)
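One detail of the streaming loop above: with some backends the last chunk's `delta.content` is `None`, which would print the literal string "None". A stand-alone sketch of the guard, using plain strings plus a trailing `None` as a stand-in for the SDK's chunk objects (an assumption for illustration only):

```python
# Simulated stream: string deltas followed by a final empty (None) delta
chunks = ["Roses ", "are ", "red", None]

parts = []
for content in chunks:
    # Skip the final None delta instead of printing "None"
    if content is not None:
        parts.append(content)

print("".join(parts))  # → Roses are red
```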
12 changes: 12 additions & 0 deletions llm_assistant/ollama/proxy_config.yaml
@@ -0,0 +1,12 @@
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
  - model_name: phi
    litellm_params:
      model: ollama/phi
litellm_params:
  drop_params: True
