fix: Get builtin tool calling working in remote-vllm #1236
Conversation
# What does this PR do?

This PR makes a couple of changes required to get the test `tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search` passing on the remote-vllm provider.

First, we adjust agent_instance to also pass in the description and parameters of builtin tools. We need these parameters so we can pass the tool's expected parameters into vLLM. The meta-reference implementations may not have needed these for builtin tools, as they are able to take advantage of the Llama-model-specific support for certain builtin tools. However, with vLLM, our server-side chat templates for tool calling treat all tools the same and don't separate out Llama builtin vs custom tools. So, we need to pass the full set of parameter definitions and the list of required parameters for builtin tools as well.

Next, we adjust the streaming chat completion code to fix up some edge cases where it was returning an extra ChatCompletionResponseEvent containing an empty ToolCall with empty-string call_id, tool_name, and arguments properties. This is a bug discovered after the above fix, where after a successful tool invocation we were sending extra chunks back to the client with these empty ToolCalls. (A rough sketch of this filtering appears after the test plan below.)

## Test Plan

With these changes, the following test that previously failed now passes:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

Additionally, I ran the remote-vllm client-sdk and provider inference tests as below to ensure they all still passed with this change:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

```
VLLM_URL="http://localhost:8000/v1" \
python -m pytest -s -v \
llama_stack/providers/tests/inference/test_text_inference.py \
--providers "inference=vllm_remote"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
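As a rough illustration of the streaming fix described above (not the exact code from this PR), the idea is to drop tool calls whose call_id, tool_name, and arguments are all empty strings before emitting further stream events; the ToolCall stand-in and helper names below are hypothetical:

```python
# Hypothetical sketch of the empty-ToolCall filtering described above; the real
# change lives in the remote-vllm streaming chat completion path, and this
# ToolCall stand-in mirrors only the three properties named in the PR text.
from dataclasses import dataclass


@dataclass
class ToolCall:
    call_id: str
    tool_name: str
    arguments: str


def is_empty_tool_call(tool_call: ToolCall) -> bool:
    """True when vLLM handed back a placeholder tool call with no content."""
    return not (tool_call.call_id or tool_call.tool_name or tool_call.arguments)


def filter_tool_calls(tool_calls: list[ToolCall]) -> list[ToolCall]:
    """Drop empty tool calls so we don't emit extra, meaningless stream chunks."""
    return [tc for tc in tool_calls if not is_empty_tool_call(tc)]
```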
The review comments below refer to this part of the diff in agent_instance:

```diff
     raise ValueError(f"Tool {built_in_type} already exists")

-    tool_def_map[built_in_type] = ToolDefinition(tool_name=built_in_type)
+    tool_def_map[built_in_type] = ToolDefinition(
```
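To make the truncated `+` line above concrete, here is a hedged sketch of a builtin-tool definition that carries a description and parameter schema; the import path, the brave_search example, and the parameter wording are assumptions rather than the code merged in this PR:

```python
# Illustrative only: a builtin tool defined with the same description and
# parameter information we already pass for custom tools, so vLLM's generic
# tool-calling chat template has everything it needs.
# The import path and exact field names are assumptions about llama-stack at
# the time of this PR and may differ in your checkout.
from llama_models.llama3.api.datatypes import (  # assumed import path
    BuiltinTool,
    ToolDefinition,
    ToolParamDefinition,
)

tool_def_map = {}
built_in_type = BuiltinTool.brave_search  # hypothetical builtin tool for this example

tool_def_map[built_in_type] = ToolDefinition(
    tool_name=built_in_type,
    description="Search the web for the given query.",  # assumed wording
    parameters={
        "query": ToolParamDefinition(
            param_type="string",
            description="The query to search for.",
            required=True,
        ),
    },
)
```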
@ashwinb @yanxi0830 Could you help double check on this change?
I'm not sure if having additional fields here would break things (seems not?), but if not, it looks like we can just set the key value here for tool_def_map, and then L926 below should take care of this.
Yeah, we can unify this with the code below as @ehhuang suggests.
Absolutely - I made the simplest possible change here to minimize lines changed, but will make a follow-up chore PR to clean up the duplication here.
terrytangyuan left a comment
Merging this since we've verified the fix. If there are any potential improvements, we can have follow-up PRs.
# What does this PR do?
This gracefully handles the case where the vLLM server responded to a
completion request with no choices, which can happen in certain vLLM
error situations. Previously, we'd error out with a stack trace about a
list index out of range. Now, we just log a warning to the user and move
past any chunks with an empty choices list.
A specific example of the type of stack trace this fixes:
```
File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response
choice = chunk.choices[0]
~~~~~~~~~~~~~^^^
IndexError: list index out of range
```
Now, instead of erroring out with that stack trace, we log a warning
that vLLM failed to generate any completions and alert the user to check
the vLLM server logs for details.
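A minimal sketch of the guard described here, assuming the stream handler iterates over OpenAI `ChatCompletionChunk` objects; the function name, logger, and yielded value are illustrative rather than the exact code in this PR:

```python
# Illustrative sketch only: skip stream chunks whose choices list is empty,
# logging a warning instead of indexing into chunk.choices[0] and raising
# IndexError. Names other than ChatCompletionChunk are assumptions.
import logging
from collections.abc import AsyncIterator

from openai.types.chat import ChatCompletionChunk

log = logging.getLogger(__name__)


async def process_stream(stream: AsyncIterator[ChatCompletionChunk]):
    async for chunk in stream:
        if not chunk.choices:
            log.warning(
                "vLLM failed to generate any completions - "
                "check the vLLM server logs for an error."
            )
            continue
        choice = chunk.choices[0]
        # ... translate `choice` into llama-stack stream events here ...
        yield choice
```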
This is related to #1277 and addresses the stack trace shown in that
issue, although it does not in and of itself change the functional behavior
of vLLM tool calling.
## Test Plan
As part of this fix, I added new unit tests to trigger this same error
and verify it no longer happens. That is
`test_process_vllm_chat_completion_stream_response_no_choices` in the
new `tests/unit/providers/inference/test_remote_vllm.py`. I also added a
couple more tests to trigger and verify the last couple of remote
vllm provider bug fixes - specifically a test for #1236 (builtin tool
calling) and one for #1325 (vLLM <= v0.6.3).
This required fixing the signature of
`_process_vllm_chat_completion_stream_response` to accept the actual
type of chunks it was getting passed - specifically changing from our
openai_compat `OpenAICompatCompletionResponse` to
`openai.types.chat.chat_completion_chunk.ChatCompletionChunk`. It was
not actually getting passed `OpenAICompatCompletionResponse` objects
before, and was using attributes that didn't exist on those objects. So,
the signature now matches the type of object it's actually passed.
Run these new unit tests like this:
```
pytest tests/unit/providers/inference/test_remote_vllm.py
```
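For reference, here is a hedged sketch of what a test along the lines of `test_process_vllm_chat_completion_stream_response_no_choices` could look like; the import path, call signature, and use of pytest-asyncio are assumptions about the llama-stack internals, not a copy of the actual test:

```python
# Illustrative sketch only: feed the stream handler a single ChatCompletionChunk
# with an empty choices list and make sure consuming the stream no longer raises
# IndexError. The import path and call signature of
# _process_vllm_chat_completion_stream_response are assumptions here.
import pytest
from openai.types.chat import ChatCompletionChunk

from llama_stack.providers.remote.inference.vllm.vllm import (
    _process_vllm_chat_completion_stream_response,
)


@pytest.mark.asyncio
async def test_stream_with_no_choices_does_not_raise():
    async def stream():
        yield ChatCompletionChunk(
            id="chunk-1",
            created=1,
            model="meta-llama/Llama-3.2-3B-Instruct",
            object="chat.completion.chunk",
            choices=[],  # the error case: vLLM returned no choices
        )

    # Consuming the stream should not raise IndexError; exactly which events
    # (if any) are emitted for the empty chunk is an implementation detail.
    events = [
        event
        async for event in _process_vllm_chat_completion_stream_response(stream())
    ]
    assert isinstance(events, list)
```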
Additionally, I ensured the existing `test_text_inference.py` tests
passed via:
```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v tests/integration/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct" \
--vision-inference-model ""
```
Signed-off-by: Ben Browning <bbrownin@redhat.com>