
[Feature]: Add support for reusable subschemas in tool requests (PydanticAI) #15035

@theobjectivedad

Description


🚀 The feature, motivation and pitch

Currently PydanticAI clients leverage tools for structured response mapping. Consider the following tools definition in the request:

[
    {
        "type": "function",
        "function": {
            "name": "final_result",
            "description": "The final response which ends this conversation",
            "parameters": {
                "$defs": {
                    "Chapter": {
                        "properties": {
                            "chapter_name": {
                                "description": "Name the chapter",
                                "title": "Chapter Name",
                                "type": "string"
                            },
                            "content": {
                                "description": "Content of the chapter",
                                "title": "Content",
                                "type": "string"
                            }
                        },
                        "required": [
                            "chapter_name",
                            "content"
                        ],
                        "title": "Chapter",
                        "type": "object"
                    }
                },
                "properties": {
                    "title": {
                        "description": "Title of the story",
                        "title": "Title",
                        "type": "string"
                    },
                    "summary": {
                        "description": "Short summary of the story",
                        "title": "Summary",
                        "type": "string"
                    },
                    "chapters": {
                        "description": "List of chapters",
                        "items": {
                            "$ref": "#/$defs/Chapter"
                        },
                        "title": "Chapters",
                        "type": "array"
                    }
                },
                "required": [
                    "title",
                    "summary",
                    "chapters"
                ],
                "title": "Story",
                "type": "object"
            }
        }
    }
]

Here, parameters contains the reusable subschema Chapter under "$defs". This is valid JSON Schema as rendered by a Pydantic BaseModel; however, it results in an HTTP 400 error in vLLM.

Alternatives

For PydanticAI clients there are a few options available:

  • Don't use response schemas with nested BaseModels
  • Update PydanticAI so that subschemas are de-normalized (inlined) before calling the vLLM completions endpoint
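The second option amounts to inlining every local "#/$defs/..." reference before the schema reaches vLLM. A minimal sketch of that de-normalization (a hypothetical helper, not PydanticAI's or vLLM's actual code; it does not handle recursive references):

```python
# Inline local "#/$defs/..." references so the schema no longer relies on
# "$defs". Assumes non-recursive, same-document references only.
def inline_defs(schema: dict) -> dict:
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref", "")
            if ref.startswith("#/$defs/"):
                # Replace the reference with a resolved copy of the definition.
                return resolve(dict(defs[ref[len("#/$defs/"):]]))
            # Rebuild the dict, dropping the now-unneeded "$defs" section.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(v) for v in node]
        return node

    return resolve(schema)
```

Running this over the Story schema above would replace the {"$ref": "#/$defs/Chapter"} under "chapters.items" with the full Chapter object schema.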

Additional context

Code to Reproduce

The PydanticAI agent example below reproduces the issue. Note that it needs to be run against PR13483 to work properly with PydanticAI:

import logging
from typing import List

from pydantic import BaseModel, Field
from pydantic_ai import Agent

LOG = logging.getLogger(__name__)

class Chapter(BaseModel):
    chapter_name: str = Field(..., description="Name the chapter")
    content: str = Field(..., description="Content of the chapter")

class Story(BaseModel):
    title: str = Field(..., description="Title of the story")
    summary: str = Field(..., description="Short summary of the story")
    chapters: List[Chapter] = Field(..., description="List of chapters")

# Create a PydanticAI agent ("llm" is an OpenAI-compatible model pointed at
# the vLLM endpoint)
agent = Agent(
    name="test_tools3_agent",
    model=llm,
    system_prompt="You are a creative novelist and helpful assistant.",
)

# Fails with the error described.
result = await agent.run(
    "Generate a short story about cats.",
    result_type=Story,
)

LOG.info("Results: %s", result.data)

If I change chapters: List[Chapter] to chapters: List[str], the example runs perfectly, since no subschema is passed.
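The "$defs" section comes straight from Pydantic's schema generation: any nested BaseModel is emitted as a reusable subschema referenced via "$ref". A quick way to confirm this, assuming Pydantic v2:

```python
from typing import List

from pydantic import BaseModel, Field

class Chapter(BaseModel):
    chapter_name: str = Field(..., description="Name the chapter")
    content: str = Field(..., description="Content of the chapter")

class Story(BaseModel):
    title: str = Field(..., description="Title of the story")
    summary: str = Field(..., description="Short summary of the story")
    chapters: List[Chapter] = Field(..., description="List of chapters")

schema = Story.model_json_schema()
print("$defs" in schema)                          # True: nested model becomes a subschema
print(schema["properties"]["chapters"]["items"])  # {'$ref': '#/$defs/Chapter'}
```

This is exactly the parameters object shown in the request above, so any Pydantic model with a nested model will trigger the 400.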

vLLM Startup Command

Note that I am running the version of vLLM from PR13483, which adds support for tool_choice=required (needed by PydanticAI). This is my vLLM run command:

docker run -it -d \
    --name=eleanor-vLLM \
    --restart=unless-stopped \
    --shm-size=15g \
    --ulimit memlock=-1 \
    --ipc=host \
    --entrypoint=python3 \
    --gpus="device=0,1,2,3" \
    --publish=7800:8000 \
    --volume=/models:/models:ro \
    --health-cmd="timeout 5 bash -c 'cat < /dev/null > /dev/tcp/localhost/8000'" \
    --health-start-period=240s \
    --health-interval=15s \
    --health-timeout=8s \
    --health-retries=3 \
    --env=OMP_NUM_THREADS=1 \
    harbor.k8s.wm.k8slab/eleanor-ai/vllm-openai:tool-req-patch \
        -m vllm.entrypoints.openai.api_server \
        --model /models/Llama-3.3-70B-Instruct \
        --served-model-name Llama-3.3-70B-Instruct \
        --response-role auto \
        --load-format safetensors \
        --tokenizer-mode auto \
        --enable-chunked-prefill=True \
        --max-num-batched-tokens=4096 \
        --dtype bfloat16 \
        --kv-cache-dtype auto \
        --gpu-memory-utilization 0.90 \
        --enable-auto-tool-choice \
        --tool-call-parser llama3_json \
        --enable-prefix-caching \
        --device=cuda \
        --task=generate \
        --scheduler-delay-factor=0.25 \
        --uvicorn-log-level=debug \
        --distributed-executor-backend=mp \
        --max-logprobs=100 \
        --enable-prompt-tokens-details \
        --generation-config=auto \
        --override-generation-config='{"logprobs": 1}' \
        --guided-decoding-backend=outlines \
        --disable_custom_all_reduce \
        --max-model-len 65535 \
        --tensor-parallel-size 4 \
        --port 8000 \
        --host 0.0.0.0

vLLM Logs

Request:

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nEnvironment: ipython\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nYou are a creative novelist and helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nGiven the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.\n\nRespond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.\n\n{\n    "type": "function",\n    "function": {\n        "name": "final_result",\n        "description": "The final response which ends this conversation",\n        "parameters": {\n            "$defs": {\n                "Chapter": {\n                    "properties": {\n                        "chapter_name": {\n                            "description": "Name the chapter",\n                            "title": "Chapter Name",\n                            "type": "string"\n                        },\n                        "content": {\n                            "description": "Content of the chapter",\n                            "title": "Content",\n                            "type": "string"\n                        }\n                    },\n                    "required": [\n                        "chapter_name",\n                        "content"\n                    ],\n                    "title": "Chapter",\n                    "type": "object"\n                }\n            },\n            "properties": {\n                "title": {\n                    "description": "Title of the story",\n                    "title": "Title",\n                    "type": "string"\n                },\n                "summary": {\n                    "description": "Short summary of the story",\n                    "title": "Summary",\n                    "type": "string"\n                },\n                "chapters": {\n                    
"description": "List of chapters",\n                    "items": {\n                        "$ref": "#/$defs/Chapter"\n                    },\n                    "title": "Chapters",\n                    "type": "array"\n                }\n            },\n            "required": [\n                "title",\n                "summary",\n                "chapters"\n            ],\n            "title": "Story",\n            "type": "object"\n        }\n    }\n}\n\nGenerate a short story about cats.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=12000, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['final_result']}, 'parameters': {'$defs': {'Chapter': {'properties': {'chapter_name': {'description': 'Name the chapter', 'title': 'Chapter Name', 'type': 'string'}, 'content': {'description': 'Content of the chapter', 'title': 'Content', 'type': 'string'}}, 'required': ['chapter_name', 'content'], 'title': 'Chapter', 'type': 'object'}}, 'properties': {'title': {'description': 'Title of the story', 'title': 'Title', 'type': 'string'}, 'summary': {'description': 'Short summary of the story', 'title': 'Summary', 'type': 'string'}, 'chapters': {'description': 'List of chapters', 'items': {'$ref': '#/$defs/Chapter'}, 'title': 'Chapters', 'type': 'array'}}, 'required': ['title', 'summary', 'chapters'], 'title': 'Story', 'type': 'object'}}, 'required': ['name', 'parameters']}]}}, regex=None, choice=None, grammar=None, json_object=None, backend=None, 
whitespace_pattern=None), extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 03-18 12:20:23 [async_llm_engine.py:549] Building guided decoding logits processor. guided_decoding: GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['final_result']}, 'parameters': {'$defs': {'Chapter': {'properties': {'chapter_name': {'description': 'Name the chapter', 'title': 'Chapter Name', 'type': 'string'}, 'content': {'description': 'Content of the chapter', 'title': 'Content', 'type': 'string'}}, 'required': ['chapter_name', 'content'], 'title': 'Chapter', 'type': 'object'}}, 'properties': {'title': {'description': 'Title of the story', 'title': 'Title', 'type': 'string'}, 'summary': {'description': 'Short summary of the story', 'title': 'Summary', 'type': 'string'}, 'chapters': {'description': 'List of chapters', 'items': {'$ref': '#/$defs/Chapter'}, 'title': 'Chapters', 'type': 'array'}}, 'required': ['title', 'summary', 'chapters'], 'title': 'Story', 'type': 'object'}}, 'required': ['name', 'parameters']}]}}, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)
INFO:     172.17.0.1:21179 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
