-
-
Couldn't load subscription status.
- Fork 10.8k
[Frontend] Add a new xml-based tool parser for qwen3-coder #25028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] Add a new xml-based tool parser for qwen3-coder #25028
Conversation
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new XML-based tool parser, Qwen3CoderXMLToolParser, designed for streaming and handling various corner cases in tool call parsing. The existing tests have been refactored to use a parameterized fixture, which is a great approach to ensure both the original and the new parser are tested against the same comprehensive suite of tests.
My review focuses on the new parser implementation. I've identified a critical security vulnerability related to the use of ast.literal_eval which could lead to a Denial of Service. I've also found a couple of high-severity issues concerning silent exception handling and overly greedy regex patterns that could affect the parser's robustness and maintainability. Addressing these points will significantly improve the quality and security of the new parser.
| raw_for_parse = raw_text + '\n' | ||
| else: | ||
| raw_for_parse = raw_text | ||
| parsed_value = ast.literal_eval(raw_for_parse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ast.literal_eval on input from an LLM without any size restrictions can expose the system to a Denial of Service (DoS) attack. A malicious or malformed model output could provide a very large or deeply nested structure that consumes excessive CPU or memory, or causes a stack overflow during parsing. This can crash the server process. It's critical to add a size limit check before evaluating the raw text.
# A reasonable limit to prevent DoS attacks.
# This can be made configurable if needed.
MAX_LITERAL_SIZE = 1_000_000
if len(raw_for_parse) > MAX_LITERAL_SIZE:
raise ValueError(
f"Parameter value size ({len(raw_for_parse)}) "
f"exceeds the limit for literal_eval "
f"({MAX_LITERAL_SIZE}).")
parsed_value = ast.literal_eval(raw_for_parse)| except Exception: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silently ignoring all exceptions during XML parsing can hide critical bugs and make debugging extremely difficult. If self.parser.Parse fails due to malformed input from _find_next_complete_element or other issues, the error is swallowed. This can lead to incorrect or incomplete tool call generation without any warning. It's crucial to log these exceptions to aid in debugging and to improve the parser's robustness.
| except Exception: | |
| pass | |
| except Exception: | |
| logger.warning("Failed to parse XML chunk: '%s'", | |
| preprocessed_element, | |
| exc_info=True) | |
| pass |
| processed = re.sub(r'<function=([^>]+)>', r'<function name="\1">', | ||
| chunk) | ||
| # Handle <parameter=name> format -> <parameter name="name"> | ||
| processed = re.sub(r'<parameter=([^>]+)>', r'<parameter name="\1">', | ||
| processed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex ([^>]+) used to capture function and parameter names is too greedy and can lead to invalid XML if the model generates a malformed name. For instance, an output like <function=my_func(arg="val")> would result in a broken XML tag <function name="my_func(arg="val")">. To improve robustness, the regex should be more restrictive, allowing only a specific set of characters that are valid for identifiers.
| processed = re.sub(r'<function=([^>]+)>', r'<function name="\1">', | |
| chunk) | |
| # Handle <parameter=name> format -> <parameter name="name"> | |
| processed = re.sub(r'<parameter=([^>]+)>', r'<parameter name="\1">', | |
| processed) | |
| processed = re.sub(r'<function=([a-zA-Z0-9_.-]+)>', r'<function name="\1">', | |
| chunk) | |
| # Handle <parameter=name> format -> <parameter name="name"> | |
| processed = re.sub(r'<parameter=([a-zA-Z0-9_.-]+)>', r'<parameter name="\1">', | |
| processed) |
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Hi, @Zhikaiiii Thanks~, Are you from the Qwen team? |
|
Yes this is from the Qwen team |
vllm/entrypoints/openai/tool_parsers/qwen3coder_xml_tool_parser.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com>
| self.deferred_param_raw_value = "" | ||
|
|
||
|
|
||
| @ToolParserManager.register_module("qwen3_coder_xml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Zhikaiiii, just a small question: the existing qwen3_coder should also be contributed by the Qwen team’s @ranpox.
From the unit tests, it looks like qwen3_coder_xml can completely replace qwen3_coder.
So should we deprecate the existing qwen3_coder and adopt the newer, more stable qwen3_coder_xml instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don’t deprecate it(qwen3_coder), how can end users determine whether they should use qwen3_coder_xml or qwen3_coder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @simon-mo @DarkLight1337 @aarnphm WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we intent to use qwen3_coder_xml replace qwen3_coder, but for a clear review and some accuracy problems, we did not directly replace it. After we fix the accuracy problem, and all review comments been resolved, we will rename it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accuracy problem is eliminated, so we can move on to review and merge this PR @Zhikaiiii @chaunceyjiang
Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com>
| self.deferred_param_raw_value = "" | ||
|
|
||
|
|
||
| @ToolParserManager.register_module("qwen3_xml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest renaming it to qwen3_coder, and deprecating the original qwen3_coder.
Since both are intended for use with the code3-coder model, maintaining two different tool_parser implementations would incur high maintenance costs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chaunceyjiang
This is because in future qwen3-series model, we might also use this parser, not just in the coder model. Therefore, we have named it to qwen3_xml instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see.
docs/features/tool_calling.md
Could you please update the documentation to recommend which models should use this parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Zhikaiiii <1658973216@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com>
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Zhikaiiii <1658973216@qq.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
Hi, I tried this tool call parser in v0.11.0 with cmd: It will report error: (APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] Error in chat completion stream generator.
(APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] Traceback (most recent call last):
(APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] File "/home/v-kenanli/venvs/serve/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 1033, in chat_completion_stream_generator
(APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] tool_parser.prev_tool_call_arr[index].get(
(APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
(APIServer pid=1904320) ERROR 10-04 10:52:12 [serving_chat.py:1145] IndexError: list index out of rangeDo you have any suggestion on resolving this issue? |
sry , I will check this bug asap. You can try use old parser |
Running into the exact same issue docker run --runtime nvidia --shm-size=2g --gpus all -p 8000:8000 -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=REMOVED" --env "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True" --env "CUDA_DEVICE_ORDER=PCI_BUS_ID" --env "VLLM_LOGGING_LEVEL=DEBUG" --ipc=host vllm/vllm-openai:v0.11.0 --model QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ --quantization awq --tensor-parallel-size 2 --max-model-len 65536 --gpu-memory-utilization 0.85 --dtype float16 --enable-auto-tool-choice --tool-call-parser qwen3_xml --host 0.0.0.0 --port 8000 --enable-log-requests (APIServer pid=1) ERROR 10-06 15:16:11 [serving_chat.py:1145] Error in chat completion stream generator. Basically trying to tool call an MCP server using qwen code qwen -v |
|
@geraldthewes @zxgx hi, I have fixed this problem in #26345. |
@Zhikaiiii Sorry about the delay, I had to learnhow to rebuild the vllm docker image and it took time to build it. Sort answer is it seems to have fixed the issue I reported as I no longer see it and qwen code CLI seems to work fine at this point for common tasks. But I seem to have run into a different issue - not sure what your advice is but some tool call are failing, looking at the output from my model seems like a problem with the following model: model QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ As I see it outputs <|im_start|>user\nUse run_shell_command to run "jobforge submit-job deploy/build.yaml"<|im_end|>\n<|im_start|>assistant\nI'll run the jobforge submit-job command to deploy the build.yaml file.\n\n<function=run_shell_command>\n<parameter=command>\njobforge submit-job deploy/build.yaml\n\n<parameter=is_background>\nFalse\n\n<parameter=description>\nRunning jobforge submit-job command to deploy build.yaml\n\n\n</tool_call><|im_end|> So somehow seems like the model failed to include the starting <tool_call> token for some reason. |
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com> Signed-off-by: gaojc <1055866782@qq.com>
Thank you for the verification. The latter issue appears to be caused by the parser's fallback logic failing in this particular case. How frequently does this case occur? I'll try to fix this problem asap. |
@geraldthewes I'll run the jobforge submit-job command to deploy the build.yaml file.\n\n<function=run_shell_command>\n<parameter=command>\njobforge submit-job deploy/build.yaml\n\n<parameter=is_background>\nFalse\n\n<parameter=description>\nRunning jobforge submit-job command to deploy build.yaml\n\n\n</tool_call>it seems that the parser return correct result, tool_calls=[ToolCall(id='chatcmpl-tool-67620c18dccc425abafda6fb9172a493', type='function', function=FunctionCall(name='run_shell_command', arguments='{"command": "jobforge submit-job deploy/build.yaml\\n", "is_background": "False\\n", "description": "Running jobforge submit-job command to deploy build.yaml\\n\\n'))] content="I'll run the jobforge submit-job command to deploy the build.yaml file.\n\n"What is the parsed output in your sevrer look like? |
|
Hi, I want to know why Qwen no longer uses JSON and instead uses XML? |
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
I'm not sure what you are asking. I run qwen coder as follows docker run --runtime nvidia --shm-size=2g --gpus all -p 8000:8000 -v /mnt/data1/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=REMOVED" --env "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True" --env "CUDA_DEVICE_ORDER=PCI_BUS_ID" --ipc=host qwen3_xml_parser_fix:latest --model QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ --quantization awq --tensor-parallel-size 2 --max-model-len 65536 --gpu-memory-utilization 0.85 --dtype float16 --enable-auto-tool-choice --tool-call-parser qwen3_xml --host 0.0.0.0 --port 8000 --enable-log-requests And look at the output logs - I had assumed the output is after your parsing tool, but I guess that is incorrect - since it's my understanding the whole point of your parser is to emit JSON tool calling format that is more widely understood by code agents. Let me know if there is an easy way to capture that output - otherwise I will setup a proxy to see what is really returned to the coding agent.
|
I'll run the jobforge submit-job command to deploy the build.yaml file.\n\n<function=run_shell_command>\n<parameter=command>\njobforge submit-job deploy/build.yaml\n\n<parameter=is_background>\nFalse\n\n<parameter=description>\nRunning jobforge submit-job command to deploy build.yaml\n\n\n</tool_call>Normally, as you understand, the code shown above is the raw output from the model. The parser will convert this output into JSON tool format for downstream use like qwen-code or other scenarios. I directly used the model output from your example as the input to the parser. Although this output appears to be missing
|
Fixes an issue where Qwen3-Coder models sometimes generate tool calls starting directly with <function=...> instead of the expected <tool_call><function=...> structure. Changes: - Enhanced _find_next_complete_element to detect <function= tags as potential tool call starts when not currently parsing a tool call - Updated _should_skip_element to properly handle function tags that appear without tool_call wrappers - Added logic to wait for complete <function= tags before processing to avoid treating partial tags as text The parser now gracefully handles: 1. Missing opening <tool_call> tag (starts with <function=) 2. Missing both <tool_call> tags (only function wrapper) 3. Streaming mode with missing tags Leverages existing fallback logic (line 628-630) that auto-creates a tool_call when a function element is encountered without a parent tool_call context. Tests: - test_extract_tool_calls_missing_opening_tool_call_tag: Tests the exact scenario from the bug report with run_shell_command - test_extract_tool_calls_missing_both_tool_call_tags: Tests when both opening and closing tool_call tags are missing - test_extract_tool_calls_streaming_missing_opening_tag: Validates streaming behavior with missing tags Fixes: vllm-project#25028 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
ok, Yesterday I asked claude code to make a fix for this. Unfortunately creating the docker images for me to test takes forever. I was able to test it today and now the example I have now works. I have pushed my patch here https://github.com/geraldthewes/vllm/tree/fix/qwen3_xml_missing_tool_call_tag That said multiple caveats. I know nothing on the parser setup, I have not looked at the code changes made by Claude code. While it did add a test, not sure it knew how to run the test. But it is good news that at least fixed my symptoms. Let me know what makes more sense to do at this point. I will test it more today to see if this finally gives me a working combination of qwen code with 4-bit QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ running locally using vllm. But nice to see qwen code finally tool call my example after asking my permission: "I'll run the jobforge submit-job command to deploy the build.yaml file. ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
|
|
Thank you very much for your reply. Based on the code you provided, I have fixed this issue and merged it into the #26345
|
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
Thank you so much, this version works too. Very appreciated, hope it gets merged into the main vllm branch. -- gerald
|
|
@geraldthewes your huggingface token was included in multiple comments on this PR. I suggest revoking it. |
Yes, already done. Thanks. |
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com>
…ect#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Contribute the internal tool parser used at Qwen API Service, which use a standard xml parser to parse text streamingly, and handles a lot of corner cases:
Test Plan
We test both origin parser and new parser in test_qwen3coder_tool_parser.py
Test Result