fix: multiple tool calls in remote-vllm chat_completion #2161
Conversation
lgtm
This fixes an issue in how we used the tool_call_buf from streaming tool calls in the remote-vllm provider, where it would end up concatenating parameters from multiple different tool calls instead of aggregating the results from each tool call separately.

It also fixes an issue found while digging into that, where we were accidentally mixing the json string form of tool call parameters with the string representation of the python form, which meant we'd end up with single quotes in what should be double-quoted json strings.

The following tests are now passing 100% for the remote-vllm provider, where some of the test_text_inference tests were failing before this change:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_text_inference.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_vision_inference.py --vision-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Many of the agent tests are passing, although some are failing due to bugs in vLLM's pythonic tool parser for Llama models. See the PR at vllm-project/vllm#17917 and a gist at https://gist.github.com/bbrowning/b5007709015cb2aabd85e0bd08e6d60f for the changes needed there, which will have to be made upstream in vLLM.

Agent tests:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
This updates test_agents.py a bit after testing with Llama 4 Scout and the remote-vllm provider.

The main difference here is a bit more verbose prompting to encourage tool calls, because Llama 4 Scout likes to reply that polyjuice is fictional and has no boiling point instead of calling our custom tool unless it's prodded a bit.

Also, the remote-vllm distribution doesn't use input/output shields by default, so test_multi_tool_calls was adjusted to only expect the shield results if shields are in use and otherwise not check for shield usage.

Note that it requires changes to the vLLM pythonic tool parser to pass these tests - those are listed at https://gist.github.com/bbrowning/4734240ce96b4264340caa9584e47c9e

With this change, all of the agent tests pass with Llama 4 Scout and remote-vllm except one of the RAG tests, which looks to be an unrelated (and pre-existing) failure.

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
This code was unnecessarily verbose when calculating the has_input_shield and has_output_shield values; it was simplified to take advantage of Python treating empty lists and None values as falsy.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
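A minimal sketch of the kind of simplification described, assuming the test derives these flags from the agent's configured shield lists (the attribute and variable names here are illustrative, not the test's actual code):

```python
# Verbose pattern being replaced (illustrative):
# has_input_shield = len(input_shields) > 0 if input_shields is not None else False

# Simplified: None and [] are both falsy in Python, so a bool() cast is enough.
input_shields = agent_config.get("input_shields")    # may be None or []
output_shields = agent_config.get("output_shields")

has_input_shield = bool(input_shields)
has_output_shield = bool(output_shields)

# Shield-related steps are then only asserted when shields are actually configured.
```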
Force-pushed from 44799a7 to 18c2494
Rebased on top of some conflicting vllm changes, and I re-ran all the remote-vllm unit tests as well as test_text_inference, test_vision_inference, and test_agents with the commands outlined in the PR description, ensuring they all still pass (except the one agent test that was failing for remote-vllm previously and needs a separate fix unrelated to this).
Thanks Ben!
What does this PR do?
This fixes an issue in how we used the tool_call_buf from streaming tool calls in the remote-vllm provider where it would end up concatenating parameters from multiple different tool call results instead of aggregating the results from each tool call separately.
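As a rough illustration of the approach (not the provider's exact code), tool-call deltas in an OpenAI-compatible stream carry an index, and each index needs its own buffer rather than one shared buffer; names like `ToolCallBuf` and `aggregate_tool_call_deltas` below are assumptions for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ToolCallBuf:
    call_id: str = ""
    name: str = ""
    arguments: str = ""  # JSON argument fragments accumulate here

def aggregate_tool_call_deltas(chunks) -> dict[int, ToolCallBuf]:
    """Collect streamed tool-call deltas into one buffer per tool-call index."""
    bufs: dict[int, ToolCallBuf] = {}
    for chunk in chunks:
        for delta in chunk.choices[0].delta.tool_calls or []:
            # Key the buffer on the tool call's index so arguments from
            # different tool calls never get concatenated together.
            buf = bufs.setdefault(delta.index, ToolCallBuf())
            if delta.id:
                buf.call_id = delta.id
            if delta.function and delta.function.name:
                buf.name = delta.function.name
            if delta.function and delta.function.arguments:
                buf.arguments += delta.function.arguments
    return bufs
```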
It also fixes an issue found while digging into that, where we were accidentally mixing the json string form of tool call parameters with the string representation of the python form, which meant we'd end up with single quotes in what should be double-quoted json strings.
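For illustration, the two string forms differ like this (a generic example, not the provider's exact values):

```python
import json

params = {"liquid_name": "polyjuice", "celsius": True}

print(str(params))         # {'liquid_name': 'polyjuice', 'celsius': True}  <- Python repr, single quotes
print(json.dumps(params))  # {"liquid_name": "polyjuice", "celsius": true}  <- valid JSON, double quotes

# Only the json.dumps form round-trips through a JSON parser:
json.loads(json.dumps(params))
# json.loads(str(params)) would raise json.JSONDecodeError
```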
Closes #1120
Test Plan
The following tests are now passing 100% for the remote-vllm provider, where some of the test_text_inference tests were failing before this change:
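```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_text_inference.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_vision_inference.py --vision-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```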
All but one of the agent tests are passing (including the multi-tool one). Passing them requires changes to vLLM's pythonic tool parser for Llama models; see the PR at vllm-project/vllm#17917 and a gist at https://gist.github.com/bbrowning/4734240ce96b4264340caa9584e47c9e for the changes needed there, which will have to be made upstream in vLLM.
Agent tests:
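```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```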