Letta does not see the ollama server (API call got non-200 response code) #2282

Open
hherpa opened this issue Dec 19, 2024 · 1 comment

hherpa commented Dec 19, 2024

Bug description

  • Letta cannot reach the Ollama server. I don't think Ollama itself is the problem, since everything works with llama_index and langchain.

Install

pip install letta

Agent setting

from letta import create_client, LLMConfig, EmbeddingConfig

client = create_client()

agent_state = client.create_agent(
    llm_config=LLMConfig(
        model="qwen2.5:0.5b",
        model_endpoint_type="ollama",
        model_endpoint="http://localhost:11434",
        context_window=128000
    ), 
    embedding_config=EmbeddingConfig(
        embedding_endpoint_type="ollama",
        embedding_endpoint=None,
        embedding_model="all-minilm",
        embedding_dim=1536,
        embedding_chunk_size=300
    )
)

Launch ollama

image

Launch agent

response = client.send_message(
  agent_id=agent_state.id, 
  role="user", 
  message="hello"
)
print("Usage", response.usage)
print("Agent messages", response.messages)

Response

Letta.letta.server.server - ERROR - Error in server._step: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
Traceback (most recent call last):
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py", line 450, in _step
    usage_stats = letta_agent.step(
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 910, in step
    step_response = self.inner_step(
                    ^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 1111, in inner_step
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 1026, in inner_step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 650, in _get_ai_reply
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 613, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 100, in wrapper
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 69, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 389, in create
    return get_chat_completion(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\chat_completion_proxy.py", line 167, in get_chat_completion
    result, usage = get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\ollama\api.py", line 68, in get_ollama_completion
    raise Exception(
Exception: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
None
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[50], line 23
      6 agent_state = client.create_agent(
      7     llm_config=LLMConfig(
      8         model="qwen2.5:0.5b",
   (...)
     19     )
     20 )
     22 # Message an agent
---> 23 response = client.send_message(
     24   agent_id=agent_state.id, 
     25   role="user", 
     26   message="hello"
     27 )
     28 print("Usage", response.usage)
     29 print("Agent messages", response.messages)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\client\client.py:2488, in LocalClient.send_message(self, message, role, name, agent_id, agent_name, stream_steps, stream_tokens)
   2485     raise NotImplementedError
   2486 self.interface.clear()
-> 2488 usage = self.server.send_messages(
   2489     actor=self.user,
   2490     agent_id=agent_id,
   2491     messages=[MessageCreate(role=MessageRole(role), text=message, name=name)],
   2492 )
   2494 ## TODO: need to make sure date/timestamp is propely passed
   2495 ## TODO: update self.interface.to_list() to return actual Message objects
   2496 ##       here, the message objects will have faulty created_by timestamps
   (...)
   2504 
   2505 # format messages
   2506 messages = self.interface.to_list()

File ~\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py:761, in SyncServer.send_messages(self, actor, agent_id, messages, wrap_user_message, wrap_system_message, interface)
    758     raise ValueError(f"All messages must be of type Message or MessageCreate, got {[type(message) for message in messages]}")
    760 # Run the agent state forward
--> 761 return self._step(actor=actor, agent_id=agent_id, input_messages=message_objects, interface=interface)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py:450, in SyncServer._step(self, actor, agent_id, input_messages, interface)
    447 token_streaming = letta_agent.interface.streaming_mode if hasattr(letta_agent.interface, "streaming_mode") else False
    449 logger.debug(f"Starting agent step")
--> 450 usage_stats = letta_agent.step(
    451     messages=input_messages,
    452     chaining=self.chaining,
    453     max_chaining_steps=self.max_chaining_steps,
    454     stream=token_streaming,
    455     skip_verify=True,
    456 )
    458 # save agent after step
    459 save_agent(letta_agent)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:910, in Agent.step(self, messages, chaining, max_chaining_steps, **kwargs)
    908 kwargs["first_message"] = False
    909 kwargs["step_count"] = step_count
--> 910 step_response = self.inner_step(
    911     messages=next_input_message,
    912     **kwargs,
    913 )
    914 heartbeat_request = step_response.heartbeat_request
    915 function_failed = step_response.function_failed

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:1111, in Agent.inner_step(self, messages, first_message, first_message_retry_limit, skip_verify, stream, step_count)
   1109 else:
   1110     printd(f"step() failed with an unrecognized exception: '{str(e)}'")
-> 1111     raise e

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:1026, in Agent.inner_step(self, messages, first_message, first_message_retry_limit, skip_verify, stream, step_count)
   1023             raise Exception(f"Hit first message retry limit ({first_message_retry_limit})")
   1025 else:
-> 1026     response = self._get_ai_reply(
   1027         message_sequence=input_message_sequence,
   1028         first_message=first_message,
   1029         stream=stream,
   1030         step_count=step_count,
   1031     )
   1033 # Step 3: check if LLM wanted to call a function
   1034 # (if yes) Step 4: call the function
   1035 # (if yes) Step 5: send the info on the function call and function response to LLM
   1036 response_message = response.choices[0].message

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:650, in Agent._get_ai_reply(self, message_sequence, function_call, first_message, stream, empty_response_retry_limit, backoff_factor, max_delay, step_count)
    646             time.sleep(delay)
    648     except Exception as e:
    649         # For non-retryable errors, exit immediately
--> 650         raise e
    652 raise Exception("Retries exhausted and no valid response received.")

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:613, in Agent._get_ai_reply(self, message_sequence, function_call, first_message, stream, empty_response_retry_limit, backoff_factor, max_delay, step_count)
    611 for attempt in range(1, empty_response_retry_limit + 1):
    612     try:
--> 613         response = create(
    614             llm_config=self.agent_state.llm_config,
    615             messages=message_sequence,
    616             user_id=self.agent_state.created_by_id,
    617             functions=allowed_functions,
    618             # functions_python=self.functions_python, do we need this?
    619             function_call=function_call,
    620             first_message=first_message,
    621             force_tool_call=force_tool_call,
    622             stream=stream,
    623             stream_interface=self.interface,
    624         )
    626         # These bottom two are retryable
    627         if len(response.choices) == 0 or response.choices[0] is None:

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:100, in retry_with_exponential_backoff.<locals>.wrapper(*args, **kwargs)
     98 # Raise exceptions for any errors not specified
     99 except Exception as e:
--> 100     raise e

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:69, in retry_with_exponential_backoff.<locals>.wrapper(*args, **kwargs)
     67 while True:
     68     try:
---> 69         return func(*args, **kwargs)
     71     except requests.exceptions.HTTPError as http_err:
     73         if not hasattr(http_err, "response") or not http_err.response:

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:389, in create(llm_config, messages, user_id, functions, functions_python, function_call, first_message, force_tool_call, use_tool_naming, stream, stream_interface, max_tokens, model_settings)
    387 if stream:
    388     raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
--> 389 return get_chat_completion(
    390     model=llm_config.model,
    391     messages=messages,
    392     functions=functions,
    393     functions_python=functions_python,
    394     function_call=function_call,
    395     context_window=llm_config.context_window,
    396     endpoint=llm_config.model_endpoint,
    397     endpoint_type=llm_config.model_endpoint_type,
    398     wrapper=llm_config.model_wrapper,
    399     user=str(user_id),
    400     # hint
    401     first_message=first_message,
    402     # auth-related
    403     auth_type=model_settings.openllm_auth_type,
    404     auth_key=model_settings.openllm_api_key,
    405 )

File ~\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\chat_completion_proxy.py:167, in get_chat_completion(model, messages, functions, functions_python, function_call, context_window, user, wrapper, endpoint, endpoint_type, function_correction, first_message, auth_type, auth_key)
    165     result, usage = get_koboldcpp_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
    166 elif endpoint_type == "ollama":
--> 167     result, usage = get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
    168 elif endpoint_type == "vllm":
    169     result, usage = get_vllm_completion(endpoint, auth_type, auth_key, model, prompt, context_window, user)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\ollama\api.py:68, in get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window, grammar)
     66         result = result_full["response"]
     67     else:
---> 68         raise Exception(
     69             f"API call got non-200 response code (code={response.status_code}, msg={response.text}) for address: {URI}."
     70             + f" Make sure that the ollama API server is running and reachable at {URI}."
     71         )
     73 except:
     74     # TODO handle gracefully
     75     raise

Exception: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
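
For reading the exception above: the traceback shows Letta raising it in the branch that handles a non-200 status code, so an HTTP reply with code=500 and a JSON body did come back from the address. A minimal sketch of how such a reply could be told apart from an unreachable server follows; `diagnose_ollama_failure` is a hypothetical helper, not part of Letta or Ollama, and the matched substring is taken from the error text in this report.

```python
import json


def diagnose_ollama_failure(status_code, body):
    """Classify a non-200 HTTP reply (hypothetical helper, not a Letta API).

    Any status code at all means something answered at the address; a server
    that is down would produce a connection error instead of a status code.
    """
    try:
        detail = str(json.loads(body).get("error", ""))
    except (ValueError, TypeError, AttributeError):
        detail = str(body)  # body was not a JSON object; keep the raw text
    if status_code >= 500 and "runner process has terminated" in detail:
        return "server reachable; model runner crashed: " + detail
    return f"server reachable; request failed with HTTP {status_code}: {detail}"


print(diagnose_ollama_failure(
    500, '{"error":"llama runner process has terminated: exit status 2"}'
))
```

Under that reading, the "Make sure that the ollama API server is running" hint is a generic suffix appended to every non-200 reply, and the `msg` field is the part that carries the actual cause.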

Shua1 commented Jan 8, 2025

The error message says: Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate. Did you do that?
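
One way to check that independently of Letta is to call the same endpoint directly. The sketch below uses only the Python standard library and Ollama's documented `/api/generate` request fields (`model`, `prompt`, `stream`, `options.num_ctx`). The function names are made up for illustration, and the idea that Letta forwards `context_window` to Ollama as `num_ctx` is an assumption, not something confirmed in this thread.

```python
import json
import urllib.error
import urllib.request


def build_generate_request(base_url, model, prompt, num_ctx):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context size requested for this call
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_generate(base_url, model, num_ctx, timeout=120):
    """Send one generate call and return (status, body).

    HTTP error statuses are returned rather than raised. A server that is not
    running at all surfaces as urllib.error.URLError, which is left uncaught
    so that it stays distinguishable from an HTTP 500.
    """
    request = build_generate_request(base_url, model, "hello", num_ctx)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")
```

Calling `try_generate("http://localhost:11434", "qwen2.5:0.5b", 2048)` and then again with `128000` would show whether the server answers at all, and whether the 500 only appears with the large context size from the agent settings.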
