Conversation

@zuxin666
Contributor

@zuxin666 zuxin666 commented Apr 25, 2025

Description

This PR adds support for xLAM-2 models in vLLM's tool calling feature. The xLAM tool parser is designed to support models that generate tool calls in various JSON formats, including Salesforce's Llama-xLAM and Qwen-xLAM models.

Key highlights:

  1. Implemented xLAMToolParser class that can detect function calls in multiple output styles:

    • Direct JSON arrays
    • JSON within <think>...</think> tags
    • JSON within code blocks
    • JSON within [TOOL_CALLS] tags
    • JSON within <tool_call>...</tool_call> tags
  2. Added support for both streaming and non-streaming modes for tool calls

  3. Implemented robust JSON parsing with fallback mechanisms to handle various output formats

  4. Added support for parallel function calls with effective separation of text content from tool calls
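The wrapper formats listed above can be unified before JSON parsing. The sketch below is illustrative only (not the PR's actual xLAMToolParser implementation, which also handles streaming state and more edge cases): strip any reasoning block, unwrap the known delimiters, then attempt a strict JSON parse.

```python
import json
import re


def extract_tool_call_json(text: str):
    """Unwrap known xLAM output formats and parse a list of tool calls.

    Illustrative sketch only; returns None if no valid JSON is found.
    """
    # Drop a <think>...</think> reasoning block if present.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    # Unwrap the other formats the model may emit.
    for pattern in (
            r"`{3}(?:json)?\s*(.*?)\s*`{3}",        # fenced code block
            r"\[TOOL_CALLS\]\s*(\[.*\])",           # [TOOL_CALLS] prefix
            r"<tool_call>\s*(.*?)\s*</tool_call>",  # tag-wrapped
    ):
        m = re.search(pattern, text, re.DOTALL)
        if m:
            text = m.group(1).strip()
            break
    try:
        calls = json.loads(text)
        # Normalize a single call object into a one-element list.
        return calls if isinstance(calls, list) else [calls]
    except json.JSONDecodeError:
        return None
```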

Supported Models

  • Salesforce Llama-xLAM models: Salesforce/Llama-xLAM-2-8B-fc-r, Salesforce/Llama-xLAM-2-70B-fc-r
  • Qwen-xLAM models: Salesforce/xLAM-1B-fc-r, Salesforce/xLAM-3B-fc-r, Salesforce/xLAM-32B-fc-r

Fix

Enhances vLLM's tool calling capability by adding support for the xLAM-2 model family.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added documentation Improvements or additions to documentation frontend tool-calling labels Apr 25, 2025
@mgoin mgoin requested review from mgoin and russellb April 25, 2025 21:00
Member

@mgoin mgoin left a comment


Looks reasonable to me, thanks for the clear code and testing. Just a few comments. It would be nice to add a dedicated format example to examples/offline_inference or online_serving

@zuxin666
Contributor Author

Hi @mgoin , would you mind taking a look again for this PR? Thank you!

@dhaneshsabane

@zuxin666

Tried this parser on my hosted Salesforce/xLAM-32B-fc-r as a custom parser plugin for vLLM and faced the following error:

llm-1  | INFO:     172.18.0.1:37142 - "POST /v1/chat/completions HTTP/1.1" 200 OK
llm-1  | INFO 05-05 09:22:04 [async_llm.py:228] Added request chatcmpl-e543138e0c3647f197935cbc69e5234d.
llm-1  | Error in streaming tool calls
llm-1  | Traceback (most recent call last):
llm-1  |   File "/xlam_tool_parser.py", line 230, in extract_tool_calls_streaming
llm-1  |     function_name = current_tool_call.get("name")
llm-1  |                     ^^^^^^^^^^^^^^^^^^^^^
llm-1  | AttributeError: 'list' object has no attribute 'get'

A potential bug?
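The traceback suggests the streaming parser sometimes held the whole parsed array where a single call object was expected, so .get("name") blew up on a list. A hedged sketch of the kind of guard that avoids this (the function name and structure are illustrative, not the fix that actually landed):

```python
def get_function_name(parsed):
    """Return the name of the current tool call, tolerating either a
    single call object or a full array of calls (illustrative sketch)."""
    # The model may emit [{"name": ..., "arguments": ...}, ...]; if the
    # streaming parser hands us the whole list, look at the last entry.
    if isinstance(parsed, list):
        if not parsed:
            return None
        parsed = parsed[-1]
    if isinstance(parsed, dict):
        return parsed.get("name")
    return None
```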

@zuxin666
Contributor Author

zuxin666 commented May 8, 2025

Hi @dhaneshsabane , thanks for catching this. I have fixed the streaming issue. I used the following test script to test our xLAM models, and it works well:

Serving:

vllm serve Salesforce/Llama-xLAM-2-8b-fc-r --enable-auto-tool-choice --tool-call-parser xlam

Testing scripts:

import json
import time

from openai import OpenAI

# Connect to vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")


# Define tool functions
def get_weather(location: str, unit: str):
    return f"Weather in {location} is 22 degrees {unit}."


def calculate_expression(expression: str):
    try:
        # eval() is fine for a local demo but should not be used on
        # untrusted input in production.
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception:
        return f"Could not calculate {expression}"


def search_info(query: str):
    return f"Search results for '{query}': Found multiple relevant documents."


def translate_text(text: str, target_language: str):
    return f"Translation of '{text}' to {target_language}: [translated content]"


# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., 'San Francisco, CA'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location", "unit"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "calculate_expression",
        "description": "Calculate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "search_info",
        "description": "Search for information on a topic",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "translate_text",
        "description": "Translate text to another language",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text to translate"
                },
                "target_language": {
                    "type": "string",
                    "description": "Target language for translation"
                }
            },
            "required": ["text", "target_language"]
        }
    }
}]

# Map of function names to implementations
tool_functions = {
    "get_weather": get_weather,
    "calculate_expression": calculate_expression,
    "search_info": search_info,
    "translate_text": translate_text
}


def process_stream(response, tool_functions):
    """Process a streaming response with possible tool calls"""
    function_name = None
    function_args = ""
    function_id = None

    print("\n--- Stream Output ---")
    for chunk in response:
        # Handle tool calls in the stream
        if chunk.choices[0].delta.tool_calls:
            tool_call = chunk.choices[0].delta.tool_calls[0]

            # Extract function information as it comes in chunks
            if hasattr(tool_call, 'function'):
                if hasattr(tool_call.function,
                           'name') and tool_call.function.name:
                    function_name = tool_call.function.name
                    print(f"Function called: {function_name}")

                if hasattr(tool_call.function,
                           'arguments') and tool_call.function.arguments:
                    function_args += tool_call.function.arguments
                    print(f"Arguments chunk: {tool_call.function.arguments}")

            if hasattr(tool_call, 'id') and tool_call.id:
                function_id = tool_call.id

        # Handle regular content in the stream
        elif chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

    print("\n--- End Stream ---\n")

    # Execute the function if we received a complete function call
    if function_name and function_args:
        try:
            # Parse the JSON arguments
            args = json.loads(function_args)

            # Call the function with the arguments
            function_result = tool_functions[function_name](**args)

            # Create a follow-up message with the function result
            follow_up_response = client.chat.completions.create(
                model=client.models.list().data[0].id,
                messages=[{
                    "role":
                    "user",
                    "content":
                    "What's the weather like in San Francisco?"
                }, {
                    "role":
                    "assistant",
                    "tool_calls": [{
                        "id": function_id or "call_123",
                        "type": "function",
                        "function": {
                            "name": function_name,
                            "arguments": function_args
                        }
                    }]
                }, {
                    "role": "tool",
                    "tool_call_id": function_id or "call_123",
                    "content": function_result
                }],
                stream=True)

            print(f"\n--- Function Result ---\n{function_result}\n")
            print("\n--- Follow-up Response ---")
            for chunk in follow_up_response:
                if chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="")
            print("\n--- End Follow-up ---\n")

        except Exception as e:
            print(f"Error executing function: {e}")


def run_test_case(query, test_name):
    """Run a single test case with the given query"""
    print(f"\n{'='*50}\nTEST CASE: {test_name}\n{'='*50}")
    print(f"Query: '{query}'")

    start_time = time.time()

    # Create streaming chat completion request
    response = client.chat.completions.create(
        model=client.models.list().data[0].id,
        messages=[{
            "role": "user",
            "content": query
        }],
        tools=tools,
        tool_choice="auto",
        stream=True)

    # Process the streaming response
    process_stream(response, tool_functions)

    end_time = time.time()
    print(f"Test completed in {end_time - start_time:.2f} seconds")


# Run test cases
test_cases = [
    ("What's the weather like in San Francisco?", "Weather Information"),
    ("Calculate 25 * 17 + 31", "Math Calculation"),
    ("Search for information about quantum computing", "Information Search"),
    ("Translate 'Hello world' to Spanish", "Text Translation"),
    ("What is the weather in Tokyo in celsius and then calculate 15% of 230",
     "Multiple Tool Usage")
]

# Execute all test cases
for query, test_name in test_cases:
    run_test_case(query, test_name)
    time.sleep(1)  # Small delay between tests

print("\nAll tests completed.")

Please let me know if you find any other issues.

@dhaneshsabane

@zuxin666

The error has disappeared but the tool call in itself is still incorrect. Here's the output of your test script:

==================================================
TEST CASE: Weather Information
==================================================
Query: 'What's the weather like in San Francisco?'

--- Stream Output ---
[{"name": "get_weather", "arguments": {"location": "San Francisco, CA", "unit": "fahrenheit"}}
--- End Stream ---

Test completed in 1.29 seconds

==================================================
TEST CASE: Math Calculation
==================================================
Query: 'Calculate 25 * 17 + 31'

--- Stream Output ---
[{"name": "calculate_expression", "arguments": {"expression": "25 * 17 + 31"}}
--- End Stream ---

Test completed in 1.00 seconds

==================================================
TEST CASE: Information Search
==================================================
Query: 'Search for information about quantum computing'

--- Stream Output ---
[{"name": "search_info", "arguments": {"query": "quantum computing"}}
--- End Stream ---

Test completed in 0.79 seconds

==================================================
TEST CASE: Text Translation
==================================================
Query: 'Translate 'Hello world' to Spanish'

--- Stream Output ---
[{"name": "translate_text", "arguments": {"text": "Hello world", "target_language": "Spanish"}}
--- End Stream ---

Test completed in 0.95 seconds

==================================================
TEST CASE: Multiple Tool Usage
==================================================
Query: 'What is the weather in Tokyo in celsius and then calculate 15% of 230'

--- Stream Output ---
[{"name": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}}, {"name": "calculate_expression", "arguments": {"expression": "0.15 * 230"}}
--- End Stream ---

Test completed in 1.63 seconds

All tests completed.

Notice the missing ] at the end of the tool call. That causes frameworks and libraries to ignore it as a tool call and forward the stream as-is in the output.
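One way to see why consumers drop the truncated payload: clients typically validate with a strict JSON parse before treating the text as tool calls, and that parse fails until the closing bracket arrives (illustrative snippet):

```python
import json

# The array from the stream above, missing its closing bracket.
truncated = ('[{"name": "get_weather", "arguments": '
             '{"location": "Tokyo", "unit": "celsius"}}')

# A strict parse rejects the truncated stream, so clients fall back to
# treating it as plain text rather than a tool call.
try:
    json.loads(truncated)
    valid = True
except json.JSONDecodeError:
    valid = False

assert valid is False
# With the closing bracket restored, the same payload parses cleanly.
assert json.loads(truncated + "]")[0]["name"] == "get_weather"
```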

@vxtra1973

I tried it with a simple langflow agent, it returns:
[{"name": "evaluate_expression", "arguments": {"expression": "14 * 20"}}

rather than calling the tool

@zuxin666
Contributor Author

Hi @dhaneshsabane and @vxtra1973 , sorry about the previous mistakes; I didn't understand the streaming function-calling mode very well. Parallel function calls in streaming mode are more complex and difficult to implement than I expected.
I investigated the streaming behavior of other models, such as Llama, and figured out the issue. I have now revised the streaming parser and uploaded two example test scripts:

  • one in streaming fc mode: examples/online_serving/openai_chat_completion_client_with_tools_xlam_streaming.py
  • one in standard non-streaming fc mode: examples/online_serving/openai_chat_completion_client_with_tools_xlam.py

After serving the model with
vllm serve Salesforce/Llama-xLAM-2-8b-fc-r --enable-auto-tool-choice --tool-call-parser xlam
and running:
python examples/online_serving/openai_chat_completion_client_with_tools_xlam_streaming.py

The outcome is:

==================================================
TEST CASE: Weather Information
==================================================
Query: 'I want to know the weather in San Francisco'

--- Stream Output ---
[{"Function called: get_weather
Arguments chunk: {
Arguments chunk: "location": "San Francisco", "unit": "celsius"}

--- End Stream ---


--- Function Result (get_weather) ---
Weather in San Francisco is 22 degrees celsius.


--- Follow-up Response ---
The weather in San Francisco is 22 degrees celsius.
--- End Follow-up ---

Test completed in 0.27 seconds

==================================================
TEST CASE: Math Calculation
==================================================
Query: 'Calculate 25 * 17 + 31'

--- Stream Output ---
[{"Function called: calculate_expression
Arguments chunk: {
Arguments chunk: "expression": "25 * 17 + 31"}

--- End Stream ---


--- Function Result (calculate_expression) ---
The result of 25 * 17 + 31 is 456


--- Follow-up Response ---
The result of 25 * 17 + 31 is 456.
--- End Follow-up ---

Test completed in 0.26 seconds

==================================================
TEST CASE: Text Translation
==================================================
Query: 'Translate 'Hello world' to Spanish'

--- Stream Output ---
[{"Function called: translate_text
Arguments chunk: {
Arguments chunk: "text": "Hello world", "target_language": "Spanish"}

--- End Stream ---


--- Function Result (translate_text) ---
Translation of 'Hello world' to Spanish: [translated content]


--- Follow-up Response ---
The translation of 'Hello world' to Spanish is 'Hola mundo'.
--- End Follow-up ---

Test completed in 0.27 seconds

==================================================
TEST CASE: Multiple Tool Usage
==================================================
Query: 'What is the weather in Tokyo and New York in celsius'

--- Stream Output ---
[{"Function called: get_weather
Arguments chunk: {
Arguments chunk: "location": "Tokyo", "unit": "celsius"}
Function called: get_weather
Arguments chunk: {
Arguments chunk: "location": "New York", "unit": "celsius"}

--- End Stream ---


--- Function Result (get_weather) ---
Weather in Tokyo is 22 degrees celsius.


--- Function Result (get_weather) ---
Weather in New York is 22 degrees celsius.


--- Follow-up Response ---
The weather in Tokyo and New York is the same, which is 22 degrees celsius.
--- End Follow-up ---

Test completed in 0.45 seconds

All tests completed.

This should be the expected behavior, right? Let me know your thoughts. Thanks.
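For the parallel case above, clients typically key the incoming chunks on each delta's index field so the two get_weather calls accumulate separately. A minimal sketch of that accumulation, using plain dicts as a simplified stand-in for the OpenAI SDK's delta objects:

```python
def accumulate_tool_calls(deltas):
    """Merge OpenAI-style streaming tool-call deltas into complete calls.

    Each delta is a simplified dict stand-in for the SDK objects:
    {"index": int, "id": str | None,
     "function": {"name": str | None, "arguments": str | None}}
    """
    calls: dict[int, dict] = {}
    for d in deltas:
        call = calls.setdefault(
            d["index"], {"id": None, "name": None, "arguments": ""})
        if d.get("id"):
            call["id"] = d["id"]
        fn = d.get("function") or {}
        if fn.get("name"):
            call["name"] = fn["name"]
        if fn.get("arguments"):
            # Argument fragments arrive over many chunks; concatenate them.
            call["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```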

@mergify

mergify bot commented May 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zuxin666.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 12, 2025
@loki369loki

@zuxin666

I am using the latest code and running inference as follows:

vllm serve /data/models/Salesforce/Llama-xLAM-2-8b-fc-r \
  --enable-auto-tool-choice \
  --tool-parser-plugin /data/models/Salesforce/xlam_tool_call_parser_unmerged.py \
  --tool-call-parser xlam \
  --tensor-parallel-size 1 \
  --max-model-len 16000 \
  --host 127.0.0.1 \
  --port 8000 \
  --gpu-memory-utilization 0.80

It seems that every tool call prints "[{"", which is not necessary.

(screenshot of the output omitted)

Could you please check this issue? Thank you!

@zuxin666
Contributor Author

Hi @loki369loki , this has been solved; it was caused by the function-call prefix detection logic here.
In streaming mode, if the prefix is not recognized as a function call, the parser simply returns the content as-is.
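In other words, while the buffered text could still grow into the tool-call opener the parser withholds it, and once it can no longer match, the buffer is flushed as ordinary content. A rough sketch of that check (the prefix constant and function name are illustrative, not the actual parser code):

```python
# Illustrative opener for an xLAM-style JSON tool-call array.
TOOL_CALL_PREFIX = '[{"'


def classify_buffer(buffer: str) -> str:
    """Decide what to do with buffered streamed text (illustrative).

    Returns "tool_call" once the opener is confirmed, "hold" while the
    buffer could still grow into the opener, and "content" when it can
    no longer be a tool call and should be emitted as plain text.
    """
    if buffer.startswith(TOOL_CALL_PREFIX):
        return "tool_call"
    if TOOL_CALL_PREFIX.startswith(buffer):
        return "hold"  # e.g. buffer == "[" or "[{"
    return "content"
```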

@zuxin666
Contributor Author

Hi @mgoin , can you also please check this PR when you are available? Thx.

@mgoin mgoin requested a review from aarnphm May 22, 2025 18:14
Member

@mgoin mgoin left a comment


This looks good to me, thanks for the tests and examples!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 22, 2025
@zuxin666
Contributor Author

@mgoin Thanks! It seems the above CI failures are not related to this PR? Any other blockers to merging it?

@mergify

mergify bot commented May 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zuxin666.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 22, 2025
zuxin666 added 2 commits June 13, 2025 21:25
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
zuxin666 added 8 commits June 13, 2025 21:25
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
This reverts commit 337486885aa0c28bcca123c1ac646afc14435ab7.

Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
Signed-off-by: Zuxin Liu <zuxin.liu@salesforce.com>
@mergify mergify bot added the qwen Related to Qwen models label Jun 18, 2025
@zuxin666
Contributor Author

Hi @aarnphm @mgoin , I think the above check failure is not related to this PR. Is there anything blocking the merge? Thanks.

@houseroad
Collaborator

Re-triggered the CI.

@zuxin666
Contributor Author

Thanks! Seems that it is good to be merged? @houseroad @aarnphm @mgoin

@houseroad houseroad merged commit 1d0ae26 into vllm-project:main Jun 19, 2025
66 checks passed

Labels

documentation Improvements or additions to documentation frontend qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done

7 participants