
Conversation

bbrowning
Collaborator

What does this PR do?

This fixes an issue in how the remote-vllm provider used its tool_call_buf for streaming tool calls: parameters from multiple different tool calls were concatenated into a single buffer instead of being aggregated separately for each tool call.

It also fixes an issue found while digging into that, where we were accidentally mixing the JSON string form of tool call parameters with the string representation of the Python form, which meant we'd end up with single quotes in what should be double-quoted JSON strings.
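
For illustration, here's a minimal sketch of the kind of per-call aggregation described above, assuming OpenAI-style streaming deltas where each tool call fragment carries an index; the names used here are illustrative, not the provider's actual identifiers.

```python
# Illustrative sketch only -- not the actual remote-vllm provider code.
# Assumes OpenAI-style streaming deltas where each tool call fragment
# carries an `index` identifying which tool call it belongs to.
from collections import defaultdict


def aggregate_tool_call_deltas(chunks):
    """Accumulate streaming tool call fragments per tool call index.

    Buffering everything into a single string concatenates the arguments
    of different tool calls together; keying the buffer by the delta's
    index keeps each call's id, name, and arguments separate.
    """
    bufs = defaultdict(lambda: {"id": "", "name": "", "arguments": ""})
    for chunk in chunks:
        for tc in chunk.choices[0].delta.tool_calls or []:
            buf = bufs[tc.index]
            if tc.id:
                buf["id"] += tc.id
            if tc.function and tc.function.name:
                buf["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                buf["arguments"] += tc.function.arguments
    return [bufs[i] for i in sorted(bufs)]
```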

Closes #1120

Test Plan

The following tests now pass 100% for the remote-vllm provider, where some of the test_text_inference tests were failing before this change:

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_text_inference.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_vision_inference.py --vision-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

All but one of the agent tests pass (including the multi-tool one). See the PR at vllm-project/vllm#17917 and a gist at https://gist.github.com/bbrowning/4734240ce96b4264340caa9584e47c9e for the changes needed there, which will need to be made upstream in vLLM.

Agent tests:

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on May 14, 2025
Contributor

@ashwinb ashwinb left a comment


lgtm

bbrowning added 3 commits May 14, 2025 20:58
This fixes an issue in how we used the tool_call_buf from streaming
tool calls in the remote-vllm provider: it would end up concatenating
parameters from multiple different tool calls instead of aggregating
the results from each tool call separately.

It also fixes an issue found while digging into that, where we were
accidentally mixing the JSON string form of tool call parameters with
the string representation of the Python form, which meant we'd end up
with single quotes in what should be double-quoted JSON strings.
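
To illustrate that quoting problem, here's a small standalone example
(the argument names are made up for the example): calling `str()` on the
parsed Python dict produces single-quoted repr output, while
re-serializing with `json.dumps()` keeps valid, double-quoted JSON.

```python
import json

# Tool call arguments as they arrive from the model: a JSON string.
raw_arguments = '{"liquid_name": "polyjuice", "fictional": true}'

parsed = json.loads(raw_arguments)

# Python's string representation of the dict is not valid JSON:
print(str(parsed))         # {'liquid_name': 'polyjuice', 'fictional': True}

# Re-serializing with json.dumps keeps double-quoted, valid JSON:
print(json.dumps(parsed))  # {"liquid_name": "polyjuice", "fictional": true}
```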

The following tests now pass 100% for the remote-vllm provider, where
some of the test_text_inference tests were failing before this change:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_text_inference.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_vision_inference.py --vision-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

```

Many of the agent tests are passing, although some are failing due to
bugs in vLLM's pythonic tool parser for Llama models. See the PR at
vllm-project/vllm#17917 and a gist at
https://gist.github.com/bbrowning/b5007709015cb2aabd85e0bd08e6d60f for
the changes needed there, which will need to be made upstream in vLLM.

Agent tests:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
This updates test_agents.py a bit after testing with Llama 4 Scout and
the remote-vllm provider. The main difference is slightly more verbose
prompting to encourage tool calls, because Llama 4 Scout likes to reply
that polyjuice is fictional and has no boiling point instead of calling
our custom tool unless it's prodded a bit.

Also, the remote-vllm distribution doesn't use input/output shields by
default, so test_multi_tool_calls was adjusted to only expect shield
results when shields are in use and otherwise not check for shield
usage.
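
As a rough sketch of what that conditional check can look like (the
field and step-type names here are assumptions, not the exact test
code):

```python
# Illustrative pytest-style sketch; names are assumptions, not the
# actual test_agents.py code.
def check_shield_steps(agent_config, steps):
    step_types = [step["step_type"] for step in steps]

    if agent_config.get("input_shields"):
        # Only distributions that actually configure shields should
        # emit shield_call steps; remote-vllm configures none by
        # default, so no shield assertion is made in that case.
        assert "shield_call" in step_types
```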

Note that it requires changes to the vLLM pythonic tool parser to pass
these tests - those are listed at
https://gist.github.com/bbrowning/4734240ce96b4264340caa9584e47c9e

With this change, all of the agent tests pass with Llama 4 Scout and
remote-vllm except one of the RAG tests, which looks to be an
unrelated (and pre-existing) failure.

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
This code was unnecessarily verbose when calculating the
has_input_shield and has_output_shield values; it was simplified to
take advantage of Python treating empty lists and None values as
falsy.
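
A hypothetical before/after of that kind of simplification (the
variable names mirror the commit message, not the exact code that was
simplified):

```python
# Hypothetical before/after; names mirror the commit message, not the
# exact code that was simplified.
input_shields = []  # could also be None or a populated list

# Verbose form:
if input_shields is not None and len(input_shields) > 0:
    has_input_shield = True
else:
    has_input_shield = False

# Simplified: None and empty lists are both falsy, so bool() covers
# both cases in one expression.
has_input_shield = bool(input_shields)
```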

Signed-off-by: Ben Browning <bbrownin@redhat.com>
@bbrowning force-pushed the remote-vllm-multi-tools branch from 44799a7 to 18c2494 on May 15, 2025 01:04
@bbrowning
Collaborator Author

Rebased on top of some conflicting vllm changes, and I re-ran all the remote-vllm unit tests as well as all of test_text_inference, test_vision_inference, and test_agents with the commands outlined in the PR description, ensuring they all still pass (except the one agent test that was already failing for remote-vllm and needs a separate fix unrelated to this change).

@raghotham raghotham merged commit 10b1056 into llamastack:main May 15, 2025
23 checks passed
@bbrowning bbrowning deleted the remote-vllm-multi-tools branch May 15, 2025 18:26
@terrytangyuan
Collaborator

Thanks Ben!
