result_type as List #523

Closed
fils opened this issue Dec 22, 2024 · 8 comments

Labels
question Further information is requested

Comments

@fils

fils commented Dec 22, 2024

I don't know if this is related to #242 or not.

I am trying to replicate this from the Ollama examples (ref: https://ollama.com/blog/structured-outputs)

from openai import OpenAI
import openai
from pydantic import BaseModel

client = OpenAI(base_url="http://192.168.202.137:11434/v1", api_key="ollama")

class Pet(BaseModel):
    name: str
    animal: str
    age: int
    color: str | None
    favorite_toy: str | None

class PetList(BaseModel):
    pets: list[Pet]

try:
    completion = client.beta.chat.completions.parse(
        temperature=0,
        model='llama3.2:3b',
        messages=[
            {"role": "user", "content": '''
                I have two pets.
                A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur.
                I also have a 2 year old black cat named Loki who loves tennis balls.
            '''}
        ],
        response_format=PetList,
    )

    pet_response = completion.choices[0].message
    if pet_response.parsed:
        print(pet_response.parsed)
    elif pet_response.refusal:
        print(pet_response.refusal)
except Exception as e:
    if isinstance(e, openai.LengthFinishReasonError):
        print("Too many tokens: ", e)
    else:
        print(e)

In pydantic-ai I try:

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.ollama import OllamaModel

ollama_model = OllamaModel(
    model_name='llama3.2:3b',
    base_url='http://192.168.202.137:11434/v1'
)

class Pet(BaseModel):
  name: str
  animal: str
  age: int
  color: str | None
  favorite_toy: str | None

class PetList(BaseModel):
  pets: list[Pet]


# Create a system prompt to guide the model
SYSTEM_PROMPT = """
You are a helper that extracts pet information from text and formats it as a list.
For each pet mentioned, extract:
- name
- animal type
- age
- color (if mentioned)
- favorite toy (if mentioned)

Format as a JSON object with a 'pets' array containing each pet's details.
"""

agent3 = Agent(model=ollama_model, result_type=PetList, retries=3)
result3 = agent3.run_sync(('I have two pets. A cat named Luna who is 5 years old'
                           ' and loves playing with yarn. She has grey fur. I also '
                           'have a 2 year old black cat named Loki who loves tennis balls.'))
pet_data = result3.data
print(pet_data)

passing PetList as the result_type. This fails.

If I just pass Pet, i.e. result_type=Pet, it will sometimes work (returning only one cat, of course), but it also fails sometimes.

Any guidance on how to address this would be appreciated.

@IsaaacD

IsaaacD commented Dec 22, 2024

I was getting a similar error with langchain_ollama.ChatOllama and thought it was something underlying in the code. But looking into your issue, I made a few changes and got output:

  1. SYSTEM_PROMPT wasn't assigned to Agent:
    • agent3 = Agent(model=ollama_model, result_type=PetList, retries=3, system_prompt=SYSTEM_PROMPT)
  2. SYSTEM_PROMPT mentioned "animal type", which was throwing an exception when mapping to the animal property, so I changed SYSTEM_PROMPT as follows:
    •    SYSTEM_PROMPT = """
         You are a helper that extracts pet information from text and formats it as a list.
         For each pet mentioned, extract:
         - name
         - animal type
         - age
         - color (if mentioned)
         - favorite toy (if mentioned)
         """
  3. In Pet, renamed animal to animal_type
  4. Used a different model (didn't have llama3.2:3b installed, probably shouldn't be an issue)
    • EDIT: Tried with llama3.2:latest and it fails; not sure why, but it throws "Exception has occurred: UnexpectedModelBehavior: Exceeded maximum retries (3) for result validation"
  5. Changed the prompt to a different string syntax, probably shouldn't matter
  6. Results in pets=[Pet(name='Luna', animal_type='cat', age=5, color='grey', favorite_toy='yarn'), Pet(name='Loki', animal_type='cat', age=2, color='black', favorite_toy='tennis balls')]

Final working code

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.ollama import OllamaModel

ollama_model = OllamaModel(
    model_name="qwen2.5-coder:14b",
)

class Pet(BaseModel):
  name: str | None
  animal_type: str | None
  age: int | None
  color: str | None
  favorite_toy: str | None

class PetList(BaseModel):
  pets: list[Pet]


# Create a system prompt to guide the model
SYSTEM_PROMPT = """
You are a helper that extracts pet information from text and formats it as a list.
For each pet mentioned, extract:
- name
- animal type
- age
- color (if mentioned)
- favorite toy (if mentioned)
"""

agent3 = Agent(model=ollama_model, result_type=PetList, retries=3, system_prompt=SYSTEM_PROMPT)
result3 = agent3.run_sync('I have two pets. A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur. I also have a 2 year old black cat named Loki who loves tennis balls.')
pet_data = result3.data
print(pet_data)

@sydney-runkle
Member

Thanks for the answer @IsaaacD! Marking as resolved :)

@sydney-runkle sydney-runkle added the question Further information is requested label Dec 23, 2024
@fils
Author

fils commented Dec 23, 2024

@IsaaacD just wanted to say thanks.

Also can confirm this code still fails with llama3.2:3b but works with "qwen2.5-coder:14b" as you noted.

Appreciate your response, the update, and the usage approach with the system prompt too. I have no issue using qwen for this work.

Still, the Ollama library succeeds with llama3.2:3b. So there must be some underlying combination of model capability and function/tool-calling support that accounts for the difference in behavior with llama3.2 between the Ollama library and pydantic-ai. A sketch of what that difference might be follows.
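One plausible explanation, based on the structured-outputs blog post linked above (an assumption, not something confirmed in this thread): the Ollama library constrains decoding to a JSON schema via its format parameter, while pydantic-ai drives result_type through tool/function calls, which small models handle less reliably. A minimal sketch of the native Ollama approach, per that blog post:

# Sketch of Ollama's native structured outputs (per the linked blog post).
# `format=` constrains decoding to the schema, so the model itself needs
# no tool-calling support.
from ollama import chat
from pydantic import BaseModel

class Pet(BaseModel):
    name: str
    animal: str
    age: int

class PetList(BaseModel):
    pets: list[Pet]

response = chat(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': 'I have two pets. A cat named Luna ...'}],
    format=PetList.model_json_schema(),  # constrain output to this JSON schema
)
print(PetList.model_validate_json(response.message.content))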

@IsaaacD

IsaaacD commented Dec 23, 2024

I looked into it a bit more and added some debug options to trace what llama3.2 is doing. It appears llama3.2 is returning something that looks like valid JSON, but PydanticAI is rejecting it or parsing it wrong? I'll list the changes I made and then the output afterwards.

I changed the top of the file to add these imports and debug settings, and switched back to llama3.2:

from devtools import debug
import logfire
from logging import basicConfig
from langchain_core.globals import set_debug
set_debug(True)
logfire.configure(send_to_logfire='if-token-present')
logfire.ConsoleOptions.min_log_level = 'trace'
logfire.ConsoleOptions.verbose = True
basicConfig(handlers=[logfire.LogfireLoggingHandler()])
ollama_model = OllamaModel(
    model_name="llama3.2:latest",
)

Then around the call I added a try/except flow to print out the results:

agent3 = Agent(model=ollama_model, result_type=PetList, retries=3, system_prompt=SYSTEM_PROMPT)
try:
  result3 = agent3.run_sync('I have two pets. A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur. I also have a 2 year old black cat named Loki who loves tennis balls.')
  pet_data = result3.data
  print(pet_data)
except Exception as e:
  print("Error:", e)
finally:
  debug(agent3.last_run_messages)

And then finally debug(agent3.last_run_messages) produces this:

pet_examplel.py:41 <module>
    agent3.last_run_messages: [
        SystemPrompt(
            content=(
                '\n'
                'You are a helper that extracts pet information from text and formats it as a list.\n'
                'For each pet mentioned, extract:\n'
                '- name\n'
                '- animal type\n'
                '- age\n'
                '- color (if mentioned)\n'
                '- favorite toy (if mentioned)\n'
            ),
            role='system',
        ),
        UserPrompt(
            content=(
                'I have two pets. A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur. I'
                ' also have a 2 year old black cat named Loki who loves tennis balls.'
            ),
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 5, 637176, tzinfo=datetime.timezone.utc),
            role='user',
        ),
        ModelStructuredResponse(
            calls=[
                ToolCall(
                    tool_name='final_result',
                    args=ArgsJson(
                        args_json=(
                            '{"pets":"[{\\"name\\": \\"Luna\\", \\"animal_type\\": \\"cat\\", \\"age\\": \\"5\\", \\"color\\": \\"gre'
                            'y\\", \\"favorite toy\\": \\"yarn\\"}, {\\"name\\": \\"Loki\\", \\"animal_type\\": \\"cat\\", \\"age\\":'
                            ' \\"2\\", \\"color\\": \\"black\\", \\"favorite toy\\": \\"tennis balls\\"}]"}'
                        ),
                    ),
                    tool_id='call_kiakrf40',
                ),
            ],
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 6, tzinfo=datetime.timezone.utc),
            role='model-structured-response',
        ),
        RetryPrompt(
            content=[
                {
                    'type': 'list_type',
                    'loc': ('pets',),
                    'msg': 'Input should be a valid array',
                    'input': (
                        '[{"name": "Luna", "animal_type": "cat", "age": "5", "color": "grey", "favorite toy": "yarn"},'
                        ' {"name": "Loki", "animal_type": "cat", "age": "2", "color": "black", "favorite toy": "tennis'
                        ' balls"}]'
                    ),
                },
            ],
            tool_name='final_result',
            tool_id='call_kiakrf40',
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 7, 871995, tzinfo=datetime.timezone.utc),
            role='retry-prompt',
        ),
        ModelTextResponse(
            content=(
                'Here is the answer to your question:\n'
                '\n'
                '**Your Pets:**\n'
                '\n'
                '1. Luna (5 years old) - grey cat\n'
                '\t* Favorite Toy: Yarn\n'
                '2. Loki (2 years old) - black cat\n'
                '\t* Favorite Toy: Tennis Balls'
            ),
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 8, tzinfo=datetime.timezone.utc),
            role='model-text-response',
        ),
        RetryPrompt(
            content='Plain text responses are not permitted, please call one of the functions instead.',
            tool_name=None,
            tool_id=None,
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 8, 287356, tzinfo=datetime.timezone.utc),
            role='retry-prompt',
        ),
        ModelStructuredResponse(
            calls=[
                ToolCall(
                    tool_name='final_result',
                    args=ArgsJson(
                        args_json=(
                            '{"pets":"[{\'name\': \'Luna\', \'animal_type\': \'cat\', \'age\': \'5\', \'color\': \'grey\', \'favorite t'
                            "oy': 'yarn'}, {'name': 'Loki', 'animal_type': 'cat', 'age': '2', 'color': 'black', 'favor"
                            'ite toy\': \'tennis balls\'}]"}'
                        ),
                    ),
                    tool_id='call_ssu65p8z',
                ),
            ],
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 8, tzinfo=datetime.timezone.utc),
            role='model-structured-response',
        ),
        RetryPrompt(
            content=[
                {
                    'type': 'list_type',
                    'loc': ('pets',),
                    'msg': 'Input should be a valid array',
                    'input': (
                        "[{'name': 'Luna', 'animal_type': 'cat', 'age': '5', 'color': 'grey', 'favorite toy': 'yarn'},"
                        " {'name': 'Loki', 'animal_type': 'cat', 'age': '2', 'color': 'black', 'favorite toy': 'tennis"
                        " balls'}]"
                    ),
                },
            ],
            tool_name='final_result',
            tool_id='call_ssu65p8z',
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 8, 905979, tzinfo=datetime.timezone.utc),
            role='retry-prompt',
        ),
        ModelTextResponse(
            content=(
                'def format_pet_info(pets):\n'
                '    formatted_pets = []\n'
                '    for pet in pets:\n'
                '        name = pet["name"]\n'
                '        animal_type = pet["animal_type"]\n'
                '        age = pet["age"]\n'
                '        color = pet.get("color")\n'
                '        favorite_toys = f"{pet.get(\'favorite toy\', \'No favorite toy mentioned\')}" if pet.get(\'favorit'
                'e toy\') else "No favorite toy mentioned"\n'
                '        \n'
                '        formatted_pet = f"{name} ({age} years old)"\n'
                '        if color:\n'
                '            formatted_pet += f" - {color}"\n'
                '        \n'
                '        formatted_pets.append(formatted_pet)\n'
                '    \n'
                '    return formatted_pets\n'
                '\n'
                'pets = [\n'
                "    {'name': 'Luna', 'animal_type': 'cat', 'age': '5', 'color': 'grey', 'favorite toy': 'yarn'},\n"
                "    {'name': 'Loki', 'animal_type': 'cat', 'age': '2', 'color': 'black', 'favorite toy': 'tennis ball"
                "s'}\n"
                ']\n'
                '\n'
                'print(format_pet_info(pets))'
            ),
            timestamp=datetime.datetime(2024, 12, 23, 14, 55, 10, tzinfo=datetime.timezone.utc),
            role='model-text-response',
        ),
    ] (list) len=9

Note: I did have to tweak my VS Code launch.json (below), as it didn't produce full output without setting "PYTHONUNBUFFERED": "0":

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "testpython.py",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "env": {
                "PYTHONUNBUFFERED": "0"
            }
        }
    ]
}

It looks like the last couple of responses should've been perfectly valid JSON (the terminal output is word-wrapped, so I don't think that's an issue). Might be something in the JSON parser or the escape characters llama3.2 is producing. I'll run this again right now with Qwen to compare.
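Reading the trace closely, a likely culprit (my reading, not confirmed): in the args_json above, the value of "pets" is a JSON string containing an array ("pets": "[...]") rather than an array itself, which is exactly what the list_type "Input should be a valid array" retry error is complaining about. A minimal plain-Pydantic sketch of the failure, with simplified fields and a hypothetical pre-validator that would coerce the stringified array (not part of the thread's final solution):

# Minimal reproduction: llama3.2 double-encodes the array as a string,
# which Pydantic's list_type validation rejects unless we decode it first.
import json
from pydantic import BaseModel, field_validator

class Pet(BaseModel):
    name: str
    animal_type: str
    age: int

class PetList(BaseModel):
    pets: list[Pet]

    @field_validator('pets', mode='before')
    @classmethod
    def parse_stringified_list(cls, v):
        # If the model returned the array as a JSON *string*, decode it.
        if isinstance(v, str):
            return json.loads(v)
        return v

# What llama3.2 actually sent: note the value of "pets" is a quoted string.
raw = '{"pets": "[{\\"name\\": \\"Luna\\", \\"animal_type\\": \\"cat\\", \\"age\\": 5}]"}'
print(PetList.model_validate_json(raw))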

@IsaaacD

IsaaacD commented Dec 23, 2024

This is the resulting output from Qwen:

pet_examplel.py:49 <module>
    agent3.last_run_messages: [
        SystemPrompt(
            content=(
                '\n'
                'You are a helper that extracts pet information from text and formats it as a list.\n'
                'For each pet mentioned, extract:\n'
                '- name\n'
                '- animal type\n'
                '- age\n'
                '- color (if mentioned)\n'
                '- favorite toy (if mentioned)\n'
            ),
            role='system',
        ),
        UserPrompt(
            content=(
                'I have two pets. A cat named Luna who is 5 years old and loves playing with yarn. She has grey fur. I'
                ' also have a 2 year old black cat named Loki who loves tennis balls.'
            ),
            timestamp=datetime.datetime(2024, 12, 23, 15, 10, 0, 191338, tzinfo=datetime.timezone.utc),
            role='user',
        ),
        ModelStructuredResponse(
            calls=[
                ToolCall(
                    tool_name='final_result',
                    args=ArgsJson(
                        args_json=(
                            '{"pets":[{"age":"5 years old","animal_type":"cat","color":"grey","favorite_toy":"yarn","n'
                            'ame":"Luna"},{"age":"2 years old","animal_type":"cat","color":"black","favorite_toy":"ten'
                            'nis balls","name":"Loki"}]}'
                        ),
                    ),
                    tool_id='call_1mb6236j',
                ),
            ],
            timestamp=datetime.datetime(2024, 12, 23, 15, 10, 35, tzinfo=datetime.timezone.utc),
            role='model-structured-response',
        ),
        RetryPrompt(
            content=[
                {
                    'type': 'int_parsing',
                    'loc': (
                        'pets',
                        0,
                        'age',
                    ),
                    'msg': 'Input should be a valid integer, unable to parse string as an integer',
                    'input': '5 years old',
                },
                {
                    'type': 'int_parsing',
                    'loc': (
                        'pets',
                        1,
                        'age',
                    ),
                    'msg': 'Input should be a valid integer, unable to parse string as an integer',
                    'input': '2 years old',
                },
            ],
            tool_name='final_result',
            tool_id='call_1mb6236j',
            timestamp=datetime.datetime(2024, 12, 23, 15, 10, 37, 427123, tzinfo=datetime.timezone.utc),
            role='retry-prompt',
        ),
        ModelStructuredResponse(
            calls=[
                ToolCall(
                    tool_name='final_result',
                    args=ArgsJson(
                        args_json=(
                            '{"pets":[{"age":5,"animal_type":"cat","color":"grey","favorite_toy":"yarn","name":"Luna"}'
                            ',{"age":2,"animal_type":"cat","color":"black","favorite_toy":"tennis balls","name":"Loki"'
                            '}]}'
                        ),
                    ),
                    tool_id='call_cb69k61m',
                ),
            ],
            timestamp=datetime.datetime(2024, 12, 23, 15, 10, 41, tzinfo=datetime.timezone.utc),
            role='model-structured-response',
        ),
        ToolReturn(
            tool_name='final_result',
            content='Final result processed.',
            tool_id='call_cb69k61m',
            timestamp=datetime.datetime(2024, 12, 23, 15, 10, 41, 200655, tzinfo=datetime.timezone.utc),
            role='tool-return',
        ),
    ] (list) len=6

The main thing I see is that llama3.2 gives "favorite toy" with a space instead of favorite_toy with an underscore... Maybe if we switch to Pascal case instead of snake case llama3.2 could get it? I'll try and report back 😊

EDIT: I tried changing "- favorite toy" and "- animal type" in the prompt to use "_", and llama3.2 got it more correct, but it still wasn't adding [] around the output. And when prompted again by the retry logic, it just fails to produce valid JSON. So I'm going to try prompting it better, showing it what the output should be. There's also the fact that llama3.2 doesn't wrap the list in an enclosing JSON object like pets: []. A hypothetical alias-based workaround for the key mismatch is sketched below.
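Another option for the space-vs-underscore mismatch (hypothetical and untested here, and it would depend on how pydantic-ai serializes aliases into the tool schema): declare Pydantic validation aliases so both spellings parse:

# Hypothetical workaround: accept both "favorite_toy" and "favorite toy"
# via validation aliases, so the model's space-separated keys still parse.
from pydantic import AliasChoices, BaseModel, Field

class Pet(BaseModel):
    name: str | None = None
    animal_type: str | None = Field(
        default=None,
        validation_alias=AliasChoices('animal_type', 'animal type'),
    )
    age: int | None = None
    favorite_toy: str | None = Field(
        default=None,
        validation_alias=AliasChoices('favorite_toy', 'favorite toy'),
    )

print(Pet.model_validate(
    {'name': 'Luna', 'animal type': 'cat', 'age': 5, 'favorite toy': 'yarn'}
))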

@fils
Author

fils commented Dec 24, 2024

@IsaaacD very nice investigation. Thanks!

I'm new to using LLMs for function calling and workflow approaches. Is it true that models work best in these cases when they've been trained or fine-tuned from the start to work in this manner? And that combining that with good prompt engineering is what's needed to return results that parse into pydantic data structures?

I'll dig around, but I assume there is a way in pydantic to catch that a data struct wasn't fully populated after n tries. Or maybe it's just a try / except (something like the sketch below).
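For reference, a minimal sketch of that try/except, assuming pydantic-ai exports the UnexpectedModelBehavior exception seen earlier in this thread ("Exceeded maximum retries (3) for result validation"), and reusing ollama_model, PetList, and SYSTEM_PROMPT from the working example above:

# Sketch: catch the exception pydantic-ai raises once retries are exhausted
# without the model producing a valid result_type.
from pydantic_ai import Agent, UnexpectedModelBehavior

agent = Agent(model=ollama_model, result_type=PetList, retries=3,
              system_prompt=SYSTEM_PROMPT)
try:
    result = agent.run_sync('I have two pets. A cat named Luna who is 5 years old '
                            'and loves playing with yarn. I also have a 2 year old '
                            'black cat named Loki who loves tennis balls.')
    print(result.data)
except UnexpectedModelBehavior as e:
    # e.g. "Exceeded maximum retries (3) for result validation"
    print('Model never produced a valid PetList:', e)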

Either way, appreciate your CSI skills here. I'll use those to interrogate future events for sure!

@IsaaacD

IsaaacD commented Dec 24, 2024

@fils I'm not likely the person to ask about LLMs, just a humble software developer. I do believe the training set matters, and Qwen was made to read and write code so it's what I use in my tasks.

It's very weird though. I found a Jupyter notebook (can't find the source now; I'm on my phone, but I'll share it later) that was trying to use Grok for natural-language web scraping. I rewrote it with Qwen 2.5 and Python (outside Jupyter) and was having issues with it in code. I had one scenario where I fed the data to Qwen in a terminal with ollama and it spit out the results 100% accurately with one prompt, but that happened only once.

I tried recreating that in code and then in the terminal afterwards and didn't have any success. That's when I came here to see about wrapping it in Pydantic AI and saw that you were experiencing something similar. I didn't know if it was a downstream dependency, but it doesn't appear so.

So, long story short: training data matters. You won't generate PNGs with most LLMs, and that makes sense because they weren't trained on them. I think code is of a similar vein: if llama3.2 didn't have much code or JSON in its training data, then it makes sense that it struggles with it.

EDIT: Here's the notebook link, it's the last one where it was trying to parse the cars into a table https://github.com/curiousily/AI-Bootcamp/blob/master/20.scraping-with-llm.ipynb

@IsaaacD

IsaaacD commented Dec 27, 2024

I was just reading the issues on the Pydantic AI repo and ran into the issue that @fils referenced in his initial post (#242); I didn't notice it until I reached the bottom of that thread and saw his name. I think that's likely what's going on here, so I'm not sure there's much to expect for all use cases until something is implemented for Ollama models.
