
Ollama: Stream Always Fails #667

Closed
YanSte opened this issue Jan 13, 2025 · 9 comments
YanSte commented Jan 13, 2025

Hi,

Streaming a structured result with Ollama always fails.

This happens with any model that supports tools.

Code:

from datetime import date
from typing_extensions import TypedDict
from pydantic_ai import Agent


class UserProfile(TypedDict, total=False):
    name: str
    dob: date
    bio: str


# Any model that supports tools
agent = Agent('ollama:llama3.2', result_type=UserProfile)


async def main():
    user_input = 'My name is Ben, I was born on January 28th 1990, I like the chain the dog and the pyramid.'
    async with agent.run_stream(user_input) as result:
        async for message, last in result.stream():
            ...

Error:

09:22:36.733   preparing model and tools run_step=1
09:22:36.734   model request run_step=1
09:22:38.064   handle model response
09:22:38.066   preparing model and tools run_step=2
09:22:38.067   model request run_step=2
09:22:39.605   handle model response
---> 21 async with agent.run_stream(user_input) as result:
     22     async for message in result.stream():
     23         print(message)

File ~/.pyenv/versions/3.12.0/lib/python3.12/contextlib.py:204, in _AsyncGeneratorContextManager.__aenter__(self)
    202 del self.args, self.kwds, self.func
    203 try:
--> 204     return await anext(self.gen)
    205 except StopAsyncIteration:
    206     raise RuntimeError("generator didn't yield") from None

File ~:538, in Agent.run_stream(self, user_prompt, result_type, message_history, model, deps, model_settings, usage_limits, usage, infer_name)
    535 model_req_span.__exit__(None, None, None)
    537 with _logfire.span('handle model response') as handle_span:
--> 538     maybe_final_result = await self._handle_streamed_model_response(
    539         model_response, run_context, result_schema
    540     )
    542     # Check if we got a final result
    543     if isinstance(maybe_final_result, _MarkFinalResult):

File ~:1202, in Agent._handle_streamed_model_response(self, model_response, run_context, result_schema)
   1200     return _MarkFinalResult(model_response, None)
   1201 else:
-> 1202     self._incr_result_retry(run_context)
   1203     response = _messages.RetryPromptPart(
   1204         content='Plain text responses are not permitted, please call one of the functions instead.',
   1205     )
   1206     # stream the response, so usage is correct

File ~:1270, in Agent._incr_result_retry(self, run_context)
   1268 run_context.retry += 1
   1269 if run_context.retry > self._max_result_retries:
-> 1270     raise exceptions.UnexpectedModelBehavior(
   1271         f'Exceeded maximum retries ({self._max_result_retries}) for result validation'
   1272     )
@YanSte YanSte changed the title Ollama: Impossible to stream Ollama: Stream with Ollama Always Fails Jan 13, 2025
@YanSte YanSte changed the title Ollama: Stream with Ollama Always Fails Ollama: Stream Always Fails Jan 13, 2025

SiddarthNarayanan01 commented Jan 15, 2025

The issue doesn't seem to be with the stream itself, but rather that the model is not outputting the correct UserProfile result type, as shown by the error message: "Exceeded maximum retries for result validation".

A couple of things to note:

  1. Your dob field should be JSON-serializable and deserializable, so make it a separate class.
  2. Try giving a system prompt that tells the model it should extract the data from the input query.
  3. Try annotating your result type with descriptions (see below).

To annotate the result type, you can do the following:

from pydantic import BaseModel, Field

class DOB(BaseModel):
    year: int
    month: int = Field(ge=1, le=12)
    day: int = Field(ge=1, le=31) # Of course would still allow some invalid dates but that can be validated later

class UserProfile(BaseModel):
    name: str = Field(description="user's name")
    dob: DOB = Field(description="user's date of birth, output json format {year: number, month: number, day: number}")
    bio: str = Field(description="Miscellaneous user bio information")

Hope this helps!


SiddarthNarayanan01 commented Jan 15, 2025

Ahh, I was looking at the documentation and I think I realize what's going on. The reason the example you shared would work is that you can construct a datetime.date object from an ISO-formatted string, which is likely what Pydantic is doing under the hood. If you don't mention that the date should be in ISO format, the model will likely try to output a locale-specific date, and the parsing fails.

So the best solution is likely to mention in the annotation of your dob field that it should output "ISO formatted dates":

class UserProfile(BaseModel):
    name: str = Field(description="user's name")
    dob: date = Field(description="user's date of birth, ISO format")
    bio: str = Field(description="Miscellaneous user bio information")
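As a quick sanity check, here is a stdlib-only sketch (independent of pydantic and pydantic_ai) of why the ISO hint matters: `date.fromisoformat` accepts ISO 8601 strings but rejects locale-specific ones, mirroring the coercion pydantic applies to a `date` field.

```python
from datetime import date

# ISO 8601 strings parse cleanly, which is the coercion pydantic
# relies on when validating a `date` field.
parsed = date.fromisoformat("1990-01-28")
print(parsed)  # 1990-01-28

# A locale-specific string like the one in the prompt is rejected,
# which is why the model must be told to emit ISO-formatted dates.
try:
    date.fromisoformat("January 28th 1990")
    iso_required = False
except ValueError:
    iso_required = True
```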


YanSte commented Jan 15, 2025

Thanks, @SiddarthNarayanan01, for getting back to me.

That was just an example.

Here's another one with a similar issue. Neither BaseModel nor TypedDict works:

class UserProfile(BaseModel):
    name: str
    subname: str

async def main():
    user_input = '...'
    async with agent.run_stream(user_input) as result:
        async for message, last in result.stream():
            ...

I've tried everything, but streaming with Ollama doesn't seem to work, even with the examples from Pydantic AI.

The run method, however, works well.


YanSte commented Jan 15, 2025

The error comes from _handle_streamed_model_response, which receives a models.StreamTextResponse while self._allow_text_result returns False.
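A stripped-down sketch of that control flow (illustrative only, not the actual pydantic_ai source — the function name and parameters here are made up for the illustration): when a structured result_type forbids plain text, every text response increments the retry counter until the "Exceeded maximum retries" error is raised.

```python
class UnexpectedModelBehavior(Exception):
    """Stand-in for pydantic_ai's exceptions.UnexpectedModelBehavior."""

def handle_streamed_response(is_text: bool, allow_text_result: bool,
                             retry: int, max_retries: int) -> int:
    """Return the updated retry count; raise once retries are exhausted."""
    if is_text and not allow_text_result:
        retry += 1
        if retry > max_retries:
            raise UnexpectedModelBehavior(
                f'Exceeded maximum retries ({max_retries}) for result validation'
            )
    return retry

# With a structured result_type, allow_text_result is False, so if Ollama
# keeps answering with plain text, each run step consumes one retry.
retry = handle_streamed_response(True, False, 0, 1)  # first text reply: retry -> 1
```

This matches the traceback above: run_step=1 and run_step=2 each hit the text-response branch, and the second one pushes the counter past the limit.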


YanSte commented Jan 15, 2025

Only streaming a structured response doesn't work.

@SiddarthNarayanan01

Ah, I misunderstood your question, sorry about that! Interestingly enough, I too have issues getting streaming with Ollama to work, but instead of the errors you are facing, I get the entire response in one go (no chunking).

Also, the moment the model calls a tool, I get an empty response (still no errors). Not sure why this is happening. I'll look into it further.


YanSte commented Jan 16, 2025

Can you reproduce my bug?

@SiddarthNarayanan01

Yep, I can reproduce your bug when running with structured output. Not sure if this is the source of the bug, but when a model calls a tool, the TextPart is empty. In streaming mode it seems to just stop there and return that empty string. That's probably why you get the validation error: the model returns an empty string where a JSON structure is required.
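A minimal illustration of that failure mode, using stdlib json as a stand-in for the schema validation (the helper name is hypothetical): an empty TextPart carries nothing that can be parsed into the expected structure, so validation fails immediately and a retry is triggered.

```python
import json

def parse_structured(raw: str) -> dict:
    """Stand-in for result validation: the raw model output must be JSON."""
    return json.loads(raw)

# A well-formed structured reply validates fine.
profile = parse_structured('{"name": "Ben", "subname": "X"}')

# An empty streamed TextPart fails right away, consuming a retry.
try:
    parse_structured("")
    empty_ok = True
except json.JSONDecodeError:
    empty_ok = False
```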


YanSte commented Jan 16, 2025

I’ve decided to stop using Ollama and have returned to LMStudio. I’ve also submitted a pull request: #705.
