
Running with Ollama #110

Open
OriginalGoku opened this issue Nov 30, 2024 · 4 comments
Labels
question Further information is requested

Comments

@OriginalGoku

Thanks for this interesting project.
I learned about it while using Ollama. Since Ollama doesn't support log_prob, I was interested in trying Optillm.

I have spent the last few hours trying to get Optillm to work with a local Ollama instance, but the documentation isn't clear to me.
I am running Ollama on a Mac M3.
I have numerous models in Ollama, and they all give me the same type of error:

2024-11-29 22:46:35,328 - ERROR - Error processing request: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

here is my inference file:
```python
OPENAI_KEY = os.environ.get("OPENAI_API_KEY", "optillm")
OPENAI_BASE_URL = "http://localhost:8000/v1"
OPTILLM_API_KEY = "optillm"

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)
model_name = "llama3.2:1b"
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.2,
    logprobs=True,
    top_logprobs=3,
)
```

Your help is highly appreciated.

@codelion
Owner

To run optillm with ollama:

  1. Just run your model as you would:

     ollama run llama3.2:1b

     This creates an OpenAI-API-compatible server at http://localhost:11434, as mentioned in the docs.

  2. Run optillm with that base_url:

     optillm --base-url http://localhost:11434/v1

     This runs the optillm proxy at http://localhost:8000, with ollama acting as the external inference server.

  3. Use the OpenAI client SDK with the base_url of the proxy (a complete example follows below):

     OPENAI_API_KEY = "sk-no-key"
     OPENAI_BASE_URL = "http://localhost:8000/v1"
     client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)
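
Putting the three steps together, a minimal end-to-end client for the ollama-backed proxy might look like this (a sketch: the model name llama3.2:1b is just whatever you pulled with ollama, and there are no logprobs arguments because ollama does not support them):

```python
from openai import OpenAI

# Point the client at the optillm proxy, which forwards to ollama at http://localhost:11434/v1
client = OpenAI(api_key="sk-no-key", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="llama3.2:1b",  # must match a model already pulled in ollama
    messages=[
        {"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```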

@codelion codelion added the question Further information is requested label Nov 30, 2024
@codelion
Owner

codelion commented Nov 30, 2024

Running with ollama will not give you log_probs, as ollama does not support them. To use log_probs you need to run with the inbuilt inference server, as mentioned in the docs.

  1. Set OPTILLM_API_KEY:

     export OPTILLM_API_KEY=optillm

  2. Run the optillm proxy:

     optillm

     This runs the proxy at http://localhost:8000/

  3. Use the OpenAI client with the proxy (a sketch of reading the returned logprobs follows below):

     OPENAI_API_KEY = "optillm"
     OPENAI_BASE_URL = "http://localhost:8000/v1"
     client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)

     response = client.chat.completions.create(
         model="meta-llama/Llama-3.2-1B-Instruct",
         messages=messages,
         temperature=0.2,
         logprobs=True,
         top_logprobs=3,
     )
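
Once a request like the one above succeeds, the token-level values come back on the response object. A minimal sketch of reading them with the OpenAI Python SDK, assuming the inbuilt server returns logprobs in the standard OpenAI response shape:

```python
# Each entry in response.choices[0].logprobs.content corresponds to one generated token.
for token_info in response.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
    # top_logprobs holds the alternatives requested via top_logprobs=3
    for alt in token_info.top_logprobs:
        print("  candidate:", alt.token, alt.logprob)
```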

@OriginalGoku
Author

Thank you for your prompt reply.
I will look into the inbuilt inference server as well.
I followed your instructions, but I still can't get Ollama to work.

Here is the terminal where I am running Ollama:
[Screenshot of the Ollama terminal, 2024-11-30 at 1:35 PM]

Here is my inference code:

```python
import os
from openai import OpenAI
import openai
from dotenv import load_dotenv
from timer import Timer

load_dotenv()

OPENAI_KEY = os.getenv("OPENAI_API_KEY", "optillm")
HF_TOKEN = os.getenv("HF_TOKEN")

OPENAI_BASE_URL = "http://localhost:8000/v1"
OPTILLM_API_KEY = "optillm"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

model_name = "llama3.2:1b"
with Timer():
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.2,
        logprobs=True,
        top_logprobs=3,
    )
```

and here is the terminal error from the inference code:
```
python local_inference.py
Elapsed time: 1.3044 seconds
Traceback (most recent call last):
  File "/Users/god/vs_code/optillm/local_inference.py", line 20, in <module>
    response = client.chat.completions.create(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 829, in create
    return self._post(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1280, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 957, in request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    return self._retry_request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1095, in _retry_request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    return self._retry_request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1095, in _retry_request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1061, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': "Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub."}
```

Here is the log from optillm.py:
```
python optillm.py --base-url http://localhost:11434/v1
2024-11-30 11:25:11,966 - INFO - Looking for package plugins in: /Users/god/vs_code/optillm/optillm/plugins
2024-11-30 11:25:11,966 - INFO - Found package plugin files: ['/Users/god/vs_code/optillm/optillm/plugins/coc_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/executecode_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/readurls_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/router_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/privacy_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/memory_plugin.py']
2024-11-30 11:25:11,967 - ERROR - Error loading package plugin /Users/god/vs_code/optillm/optillm/plugins/coc_plugin.py: f-string expression part cannot include a backslash (coc_plugin.py, line 119)
2024-11-30 11:25:12,109 - INFO - Loaded package plugin: executecode
2024-11-30 11:25:12,130 - INFO - Loaded package plugin: readurls
2024-11-30 11:25:12,968 - INFO - Loaded package plugin: router
2024-11-30 11:25:13,257 - INFO - Loaded package plugin: privacy
2024-11-30 11:25:13,637 - INFO - Loaded package plugin: memory
2024-11-30 11:25:13,638 - INFO - Starting server with approach: auto
2024-11-30 11:25:13,638 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'n': 1, 'base_url': 'http://localhost:11434/v1', 'optillm_api_key': '[REDACTED]', 'return_full_response': False, 'port': 8000, 'log': 'info'}
 * Serving Flask app 'optillm'
 * Debug mode: off
2024-11-30 11:25:13,644 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.1.12:8000
2024-11-30 11:25:13,644 - INFO - Press CTRL+C to quit
2024-11-30 11:32:01,966 - INFO - Received request to /v1/chat/completions
message = {'role': 'user', 'content': "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}
/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
2024-11-30 11:32:02,039 - INFO - Using device: mps
2024-11-30 11:32:02,040 - INFO - Using approach(es) ['re2'], operation SINGLE, with model llama3.2:1b
2024-11-30 11:32:02,040 - INFO - Using RE2 approach for query processing
2024-11-30 11:32:02,040 - INFO - Loading base model: llama3.2:1b
2024-11-30 11:32:02,040 - INFO - Using device: mps
2024-11-30 11:32:02,040 - ERROR - Error in RE2 approach: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
2024-11-30 11:32:02,040 - ERROR - Error processing request: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
2024-11-30 11:32:02,041 - INFO - 127.0.0.1 - - [30/Nov/2024 11:32:02] "POST /v1/chat/completions HTTP/1.1" 500 -
```

Your support is highly appreciated.

@codelion
Owner

codelion commented Nov 30, 2024

If you are going to use ollama, do not set OPTILLM_API_KEY. Instead, just set your OPENAI_API_KEY to sk-no-key, as I mentioned in the comment above. Also, log_probs won't work with ollama, so you need to remove logprobs=True and top_logprobs=3 from the request (see the sketch below).
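
For reference, a rough sketch of your script with those two changes applied (same local Timer helper, proxy started with --base-url http://localhost:11434/v1):

```python
import os
from openai import OpenAI
from dotenv import load_dotenv
from timer import Timer  # the local timing helper from the original script

load_dotenv()

# No OPTILLM_API_KEY set; any placeholder key works when ollama is the backend.
OPENAI_KEY = os.getenv("OPENAI_API_KEY", "sk-no-key")
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

with Timer():
    # logprobs / top_logprobs removed: ollama does not support them
    response = client.chat.completions.create(
        model="llama3.2:1b",
        messages=messages,
        temperature=0.2,
    )

print(response.choices[0].message.content)
```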
