
Running with Ollama #110

Open
OriginalGoku opened this issue Nov 30, 2024 · 4 comments
Labels
question Further information is requested

Comments

@OriginalGoku

Thanks for this interesting project.
I learned about it while using Ollama. Since Ollama doesn't support log_prob, I was interested in trying Optillm.

I have spent the last few hours trying to get Optillm to work with a local Ollama instance, but the documentation isn't clear to me.
I am running Ollama on a Mac M3.
I have numerous models in Ollama, and they all give me the same type of error:

2024-11-29 22:46:35,328 - ERROR - Error processing request: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

here is my inference file:
```python
OPENAI_KEY = os.environ.get("OPENAI_API_KEY", "optillm")
OPENAI_BASE_URL = "http://localhost:8000/v1"
OPTILLM_API_KEY = "optillm"

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)
model_name = "llama3.2:1b"
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.2,
    logprobs=True,
    top_logprobs=3,
)
```

Your help is highly appreciated.

@codelion
Owner

To run optillm with ollama:

  1. Just run your model as you would:

     ollama run llama3.2:1b

     This creates an OpenAI-API-compatible server at http://localhost:11434, as mentioned in the docs.

  2. Run optillm with that base_url:

     optillm --base-url http://localhost:11434/v1

     This runs the optillm proxy at http://localhost:8000, with ollama acting as the external inference server.

  3. Use the OpenAI client SDK with the base_url of the proxy (a complete example follows below):

     OPENAI_API_KEY = "sk-no-key"
     OPENAI_BASE_URL = "http://localhost:8000/v1"
     client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)
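
Putting the three steps together, a minimal end-to-end client for the ollama-backed proxy might look like this (a sketch: the model name llama3.2:1b is just whatever you pulled with ollama, and there are no logprobs arguments because ollama does not support them):

```python
from openai import OpenAI

# Point the client at the optillm proxy, which forwards to ollama at http://localhost:11434/v1
client = OpenAI(api_key="sk-no-key", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="llama3.2:1b",  # must match a model already pulled in ollama
    messages=[
        {"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```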

@codelion codelion added the question Further information is requested label Nov 30, 2024
@codelion
Owner

codelion commented Nov 30, 2024

Running with ollama will not give you log_probs, as ollama does not support them. To use log_probs you need to run with the inbuilt inference server, as mentioned in the docs.

  1. Set OPTILLM_API_KEY:

     export OPTILLM_API_KEY=optillm

  2. Run the optillm proxy:

     optillm

     This runs the proxy at http://localhost:8000/

  3. Use the OpenAI client with the proxy (a sketch of reading the returned logprobs follows below):

     OPENAI_API_KEY = "optillm"
     OPENAI_BASE_URL = "http://localhost:8000/v1"
     client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_BASE_URL)

     response = client.chat.completions.create(
         model="meta-llama/Llama-3.2-1B-Instruct",
         messages=messages,
         temperature=0.2,
         logprobs=True,
         top_logprobs=3,
     )
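
Once a request like the one above succeeds, the token-level values come back on the response object. A minimal sketch of reading them with the OpenAI Python SDK, assuming the inbuilt server returns logprobs in the standard OpenAI response shape:

```python
# Each entry in response.choices[0].logprobs.content corresponds to one generated token.
for token_info in response.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
    # top_logprobs holds the alternatives requested via top_logprobs=3
    for alt in token_info.top_logprobs:
        print("  candidate:", alt.token, alt.logprob)
```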

@OriginalGoku
Author

Thank you for your prompt reply.
I will look into the inbuilt inference server as well.
I followed your instructions, but I still can't get Ollama to work.

Here is the terminal where I am running Ollama:
[Screenshot of the Ollama terminal, 2024-11-30 at 1:35 PM]

Here is my inference code:

```python
import os
from openai import OpenAI
import openai
from dotenv import load_dotenv
from timer import Timer

load_dotenv()

OPENAI_KEY = os.getenv("OPENAI_API_KEY", "optillm")
HF_TOKEN = os.getenv("HF_TOKEN")

OPENAI_BASE_URL = "http://localhost:8000/v1"
OPTILLM_API_KEY = "optillm"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

model_name = "llama3.2:1b"
with Timer():
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.2,
        logprobs=True,
        top_logprobs=3,
    )
```

and here is the terminal error from the inference code:
```
python local_inference.py
Elapsed time: 1.3044 seconds
Traceback (most recent call last):
  File "/Users/god/vs_code/optillm/local_inference.py", line 20, in <module>
    response = client.chat.completions.create(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 829, in create
    return self._post(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1280, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 957, in request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    return self._retry_request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1095, in _retry_request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    return self._retry_request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1095, in _retry_request
    return self._request(
  File "/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1061, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': "Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub."}
```

Here is the log from optillm.py:
```
python optillm.py --base-url http://localhost:11434/v1
2024-11-30 11:25:11,966 - INFO - Looking for package plugins in: /Users/god/vs_code/optillm/optillm/plugins
2024-11-30 11:25:11,966 - INFO - Found package plugin files: ['/Users/god/vs_code/optillm/optillm/plugins/coc_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/executecode_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/readurls_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/router_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/privacy_plugin.py', '/Users/god/vs_code/optillm/optillm/plugins/memory_plugin.py']
2024-11-30 11:25:11,967 - ERROR - Error loading package plugin /Users/god/vs_code/optillm/optillm/plugins/coc_plugin.py: f-string expression part cannot include a backslash (coc_plugin.py, line 119)
2024-11-30 11:25:12,109 - INFO - Loaded package plugin: executecode
2024-11-30 11:25:12,130 - INFO - Loaded package plugin: readurls
2024-11-30 11:25:12,968 - INFO - Loaded package plugin: router
2024-11-30 11:25:13,257 - INFO - Loaded package plugin: privacy
2024-11-30 11:25:13,637 - INFO - Loaded package plugin: memory
2024-11-30 11:25:13,638 - INFO - Starting server with approach: auto
2024-11-30 11:25:13,638 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'n': 1, 'base_url': 'http://localhost:11434/v1', 'optillm_api_key': '[REDACTED]', 'return_full_response': False, 'port': 8000, 'log': 'info'}
 * Serving Flask app 'optillm'
 * Debug mode: off
2024-11-30 11:25:13,644 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.1.12:8000
2024-11-30 11:25:13,644 - INFO - Press CTRL+C to quit
2024-11-30 11:32:01,966 - INFO - Received request to /v1/chat/completions
message = {'role': 'user', 'content': "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}
/Users/god/vs_code/optillm/.venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
2024-11-30 11:32:02,039 - INFO - Using device: mps
2024-11-30 11:32:02,040 - INFO - Using approach(es) ['re2'], operation SINGLE, with model llama3.2:1b
2024-11-30 11:32:02,040 - INFO - Using RE2 approach for query processing
2024-11-30 11:32:02,040 - INFO - Loading base model: llama3.2:1b
2024-11-30 11:32:02,040 - INFO - Using device: mps
2024-11-30 11:32:02,040 - ERROR - Error in RE2 approach: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
2024-11-30 11:32:02,040 - ERROR - Error processing request: Incorrect path_or_model_id: 'llama3.2:1b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
2024-11-30 11:32:02,041 - INFO - 127.0.0.1 - - [30/Nov/2024 11:32:02] "POST /v1/chat/completions HTTP/1.1" 500 -
```

Your support is highly appreciated.

@codelion
Owner

codelion commented Nov 30, 2024

If you are going to use ollama, do not set OPTILLM_API_KEY. Instead, just set your OPENAI_API_KEY to sk-no-key, as I mentioned in the comment above. Also, log_probs won't work with ollama, so you need to remove logprobs=True and top_logprobs=3 from the request (see the sketch below).
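
For reference, a rough sketch of your script with those two changes applied (same local Timer helper, proxy started with --base-url http://localhost:11434/v1):

```python
import os
from openai import OpenAI
from dotenv import load_dotenv
from timer import Timer  # the local timing helper from the original script

load_dotenv()

# No OPTILLM_API_KEY set; any placeholder key works when ollama is the backend.
OPENAI_KEY = os.getenv("OPENAI_API_KEY", "sk-no-key")
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

messages = [{"role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?"}]

with Timer():
    # logprobs / top_logprobs removed: ollama does not support them
    response = client.chat.completions.create(
        model="llama3.2:1b",
        messages=messages,
        temperature=0.2,
    )

print(response.choices[0].message.content)
```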
