
[Feature]: Add support for reusable subschemas in tool requests (PydanticAI) #15035

@theobjectivedad

Description


🚀 The feature, motivation and pitch

Currently PydanticAI clients leverage tools for structured response mapping. Consider the following tools definition in the request:

[
    {
        "type": "function",
        "function": {
            "name": "final_result",
            "description": "The final response which ends this conversation",
            "parameters": {
                "$defs": {
                    "Chapter": {
                        "properties": {
                            "chapter_name": {
                                "description": "Name the chapter",
                                "title": "Chapter Name",
                                "type": "string"
                            },
                            "content": {
                                "description": "Content of the chapter",
                                "title": "Content",
                                "type": "string"
                            }
                        },
                        "required": [
                            "chapter_name",
                            "content"
                        ],
                        "title": "Chapter",
                        "type": "object"
                    }
                },
                "properties": {
                    "title": {
                        "description": "Title of the story",
                        "title": "Title",
                        "type": "string"
                    },
                    "summary": {
                        "description": "Short summary of the story",
                        "title": "Summary",
                        "type": "string"
                    },
                    "chapters": {
                        "description": "List of chapters",
                        "items": {
                            "$ref": "#/$defs/Chapter"
                        },
                        "title": "Chapters",
                        "type": "array"
                    }
                },
                "required": [
                    "title",
                    "summary",
                    "chapters"
                ],
                "title": "Story",
                "type": "object"
            }
        }
    }
]

Here, parameters contains the reusable subschema Chapter under "$defs". This is valid JSON Schema as rendered by a Pydantic BaseModel; however, it results in an HTTP 400 error in vLLM.

Alternatives

For PydanticAI clients there are a few options available:

  • Don't use response schemas with nested BaseModels
  • Update PydanticAI so that subschemas are de-normalized (inlined) before calling the vLLM completions endpoint
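The second option amounts to inlining every local "#/$defs/..." reference before the schema reaches vLLM. A minimal sketch of that de-normalization (a hypothetical helper, not PydanticAI's or vLLM's actual code; it does not handle recursive references):

```python
# Inline local "#/$defs/..." references so the schema no longer relies on
# "$defs". Assumes non-recursive, same-document references only.
def inline_defs(schema: dict) -> dict:
    defs = schema.get("$defs", {})

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref", "")
            if ref.startswith("#/$defs/"):
                # Replace the reference with a resolved copy of the definition.
                return resolve(dict(defs[ref[len("#/$defs/"):]]))
            # Rebuild the dict, dropping the now-unneeded "$defs" section.
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(v) for v in node]
        return node

    return resolve(schema)
```

Running this over the Story schema above would replace the {"$ref": "#/$defs/Chapter"} under "chapters.items" with the full Chapter object schema.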

Additional context

Code to Reproduce

The PydanticAI agent example below reproduces the issue. Note that it needs to be run against PR13483 to work properly with PydanticAI:

import logging
from typing import List

from pydantic import BaseModel, Field
from pydantic_ai import Agent

LOG = logging.getLogger(__name__)

class Chapter(BaseModel):
    chapter_name: str = Field(..., description="Name the chapter")
    content: str = Field(..., description="Content of the chapter")

class Story(BaseModel):
    title: str = Field(..., description="Title of the story")
    summary: str = Field(..., description="Short summary of the story")
    chapters: List[Chapter] = Field(..., description="List of chapters")

# Create a PydanticAI agent ("llm" is an OpenAI-compatible model pointed at
# the vLLM endpoint)
agent = Agent(
    name="test_tools3_agent",
    model=llm,
    system_prompt="You are a creative novelist and helpful assistant.",
)

# Fails with the error described.
result = await agent.run(
    "Generate a short story about cats.",
    result_type=Story,
)

LOG.info("Results: %s", result.data)

If I change chapters: List[Chapter] to chapters: List[str], the example runs perfectly, since no subschema is passed.
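The "$defs" section comes straight from Pydantic's schema generation: any nested BaseModel is emitted as a reusable subschema referenced via "$ref". A quick way to confirm this, assuming Pydantic v2:

```python
from typing import List

from pydantic import BaseModel, Field

class Chapter(BaseModel):
    chapter_name: str = Field(..., description="Name the chapter")
    content: str = Field(..., description="Content of the chapter")

class Story(BaseModel):
    title: str = Field(..., description="Title of the story")
    summary: str = Field(..., description="Short summary of the story")
    chapters: List[Chapter] = Field(..., description="List of chapters")

schema = Story.model_json_schema()
print("$defs" in schema)                          # True: nested model becomes a subschema
print(schema["properties"]["chapters"]["items"])  # {'$ref': '#/$defs/Chapter'}
```

This is exactly the parameters object shown in the request above, so any Pydantic model with a nested model will trigger the 400.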

vLLM Startup Command

Note that I am running the version of vLLM from PR13483, which adds support for tool_choice=required (needed by PydanticAI). This is my vLLM run command:

docker run -it -d \
    --name=eleanor-vLLM \
    --restart=unless-stopped \
    --shm-size=15g \
    --ulimit memlock=-1 \
    --ipc=host \
    --entrypoint=python3 \
    --gpus="device=0,1,2,3" \
    --publish=7800:8000 \
    --volume=/models:/models:ro \
    --health-cmd="timeout 5 bash -c 'cat < /dev/null > /dev/tcp/localhost/8000'" \
    --health-start-period=240s \
    --health-interval=15s \
    --health-timeout=8s \
    --health-retries=3 \
    --env=OMP_NUM_THREADS=1 \
    harbor.k8s.wm.k8slab/eleanor-ai/vllm-openai:tool-req-patch \
        -m vllm.entrypoints.openai.api_server \
        --model /models/Llama-3.3-70B-Instruct \
        --served-model-name Llama-3.3-70B-Instruct \
        --response-role auto \
        --load-format safetensors \
        --tokenizer-mode auto \
        --enable-chunked-prefill=True \
        --max-num-batched-tokens=4096 \
        --dtype bfloat16 \
        --kv-cache-dtype auto \
        --gpu-memory-utilization 0.90 \
        --enable-auto-tool-choice \
        --tool-call-parser llama3_json \
        --enable-prefix-caching \
        --device=cuda \
        --task=generate \
        --scheduler-delay-factor=0.25 \
        --uvicorn-log-level=debug \
        --distributed-executor-backend=mp \
        --max-logprobs=100 \
        --enable-prompt-tokens-details \
        --generation-config=auto \
        --override-generation-config='{"logprobs": 1}' \
        --guided-decoding-backend=outlines \
        --disable_custom_all_reduce \
        --max-model-len 65535 \
        --tensor-parallel-size 4 \
        --port 8000 \
        --host 0.0.0.0

vLLM Logs

Request:

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nEnvironment: ipython\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nYou are a creative novelist and helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nGiven the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.\n\nRespond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.\n\n{\n    "type": "function",\n    "function": {\n        "name": "final_result",\n        "description": "The final response which ends this conversation",\n        "parameters": {\n            "$defs": {\n                "Chapter": {\n                    "properties": {\n                        "chapter_name": {\n                            "description": "Name the chapter",\n                            "title": "Chapter Name",\n                            "type": "string"\n                        },\n                        "content": {\n                            "description": "Content of the chapter",\n                            "title": "Content",\n                            "type": "string"\n                        }\n                    },\n                    "required": [\n                        "chapter_name",\n                        "content"\n                    ],\n                    "title": "Chapter",\n                    "type": "object"\n                }\n            },\n            "properties": {\n                "title": {\n                    "description": "Title of the story",\n                    "title": "Title",\n                    "type": "string"\n                },\n                "summary": {\n                    "description": "Short summary of the story",\n                    "title": "Summary",\n                    "type": "string"\n                },\n                "chapters": {\n                    
"description": "List of chapters",\n                    "items": {\n                        "$ref": "#/$defs/Chapter"\n                    },\n                    "title": "Chapters",\n                    "type": "array"\n                }\n            },\n            "required": [\n                "title",\n                "summary",\n                "chapters"\n            ],\n            "title": "Story",\n            "type": "object"\n        }\n    }\n}\n\nGenerate a short story about cats.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=12000, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['final_result']}, 'parameters': {'$defs': {'Chapter': {'properties': {'chapter_name': {'description': 'Name the chapter', 'title': 'Chapter Name', 'type': 'string'}, 'content': {'description': 'Content of the chapter', 'title': 'Content', 'type': 'string'}}, 'required': ['chapter_name', 'content'], 'title': 'Chapter', 'type': 'object'}}, 'properties': {'title': {'description': 'Title of the story', 'title': 'Title', 'type': 'string'}, 'summary': {'description': 'Short summary of the story', 'title': 'Summary', 'type': 'string'}, 'chapters': {'description': 'List of chapters', 'items': {'$ref': '#/$defs/Chapter'}, 'title': 'Chapters', 'type': 'array'}}, 'required': ['title', 'summary', 'chapters'], 'title': 'Story', 'type': 'object'}}, 'required': ['name', 'parameters']}]}}, regex=None, choice=None, grammar=None, json_object=None, backend=None, 
whitespace_pattern=None), extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 03-18 12:20:23 [async_llm_engine.py:549] Building guided decoding logits processor. guided_decoding: GuidedDecodingParams(json={'type': 'array', 'minItems': 1, 'items': {'type': 'object', 'anyOf': [{'properties': {'name': {'type': 'string', 'enum': ['final_result']}, 'parameters': {'$defs': {'Chapter': {'properties': {'chapter_name': {'description': 'Name the chapter', 'title': 'Chapter Name', 'type': 'string'}, 'content': {'description': 'Content of the chapter', 'title': 'Content', 'type': 'string'}}, 'required': ['chapter_name', 'content'], 'title': 'Chapter', 'type': 'object'}}, 'properties': {'title': {'description': 'Title of the story', 'title': 'Title', 'type': 'string'}, 'summary': {'description': 'Short summary of the story', 'title': 'Summary', 'type': 'string'}, 'chapters': {'description': 'List of chapters', 'items': {'$ref': '#/$defs/Chapter'}, 'title': 'Chapters', 'type': 'array'}}, 'required': ['title', 'summary', 'chapters'], 'title': 'Story', 'type': 'object'}}, 'required': ['name', 'parameters']}]}}, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)
INFO:     172.17.0.1:21179 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
