Conversation

CSWYF3634076 (Contributor) commented on Oct 15, 2025

Purpose

Fixes the following issue:

Because SharedFusedMoE's forward now returns a tuple (#26145), the returned value has no flatten() method:

(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 486, in forward
(EngineCore_DP0 pid=54051)     hidden_states = self.mlp(hidden_states, visual_token_mask, **kwargs)
(EngineCore_DP0 pid=54051)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=54051)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=54051)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 358, in forward
(EngineCore_DP0 pid=54051)     ).flatten()
(EngineCore_DP0 pid=54051)       ^^^^^^^
(EngineCore_DP0 pid=54051) AttributeError: 'tuple' object has no attribute 'flatten'
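
The fix amounts to unpacking the tuple that SharedFusedMoE now returns, rather than calling tensor methods on it directly. Below is a minimal sketch of the pattern, assuming the layer returns a (shared_output, experts_output) pair as introduced in #26145; the variable names are illustrative, not the exact model code:

# Before (crashes): forward() now returns a tuple, so .flatten() raises AttributeError
# hidden_states = self.experts(hidden_states=x, router_logits=logits).flatten()

# After (sketch): unpack first, then combine the shared and routed expert outputs
shared_output, experts_output = self.experts(hidden_states=x, router_logits=logits)
hidden_states = experts_output
if shared_output is not None:
    hidden_states = hidden_states + shared_output
hidden_states = hidden_states.flatten()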

Test Plan

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT --served-model-name ERNIE-45-VL-28B --port 8503 --gpu-memory-utilization 0.95 --trust-remote-code

Client script:
import base64
import os

import requests
from openai import OpenAI
from urllib.parse import urlparse

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8503/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

def encode_base64_content_from_url(content_url: str) -> str:
    """Encode a content retrieved from a remote url to base64 format."""

    with requests.get(content_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode("utf-8")

    return result

def to_base64(content_path: str) -> str:
    """Encode content from a remote URL or local file to base64 format."""
    parsed = urlparse(content_path)
    if parsed.scheme in ("http", "https", "ftp"):
        print(content_path)
        with requests.get(content_path) as response:
            response.raise_for_status()
            data = response.content
    else:
        print(content_path)
        if not os.path.exists(content_path):
            raise FileNotFoundError(f"File not found: {content_path}")
        with open(content_path, "rb") as f:
            data = f.read()

    return base64.b64encode(data).decode("utf-8")

# Single-image input inference
def run_image() -> None:
    image_url_1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    image_base64_1 = to_base64(image_url_1)
    chat_stream = client.chat.completions.create(
        model="ERNIE-45-VL-28B",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Is the dog on the left or right"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_base64_1}"},
                    },
                ],
            }
        ],
        max_completion_tokens=1024,
        temperature=0,
        top_p=1,
        stream=True,
        extra_body={
            "skip_special_tokens": False,
            "chat_template_kwargs":{"enable_thinking": False}
        }
    )


    reasoning_content_list = []
    content_list = []
    for chunk in chat_stream:
        # print(chunk)
        reasoning_content = getattr(chunk.choices[0].delta, "reasoning_content", None)
        content = chunk.choices[0].delta.content
        if reasoning_content:
            print(reasoning_content, end="", flush=True)
            reasoning_content_list.append(reasoning_content)
        if content:
            print(content, end="", flush=True)
            content_list.append(content)


def main() -> None:
    run_image()


if __name__ == "__main__":
    main()

Test Result

图中展示了一只坐在沙滩上的狗和一位坐在狗旁边的女性。狗位于图片的左侧,它穿着带有彩色图案的背带,前爪抬起与女性相握,似乎在互动玩耍。女性穿着格子衬衫和深色裤子,坐在狗的右侧,面带微笑,看向狗的方向。背景是广阔的海滩和海洋,海浪轻轻拍打着岸边,天空呈现出柔和的光线,可能是日出或日落时分,整个画面给人一种温馨、宁静的感觉。

English translation:

The picture shows a dog sitting on a beach and a woman sitting next to it. The dog is on the left side of the picture, wearing a harness with a colorful pattern; its front paws are raised and held by the woman, as if the two are playing together. The woman, in a checkered shirt and dark pants, sits to the dog's right, smiling and looking at the dog. The background is a vast beach and ocean, with waves gently lapping at the shore. The sky has a soft light, perhaps at sunrise or sunset, giving the whole picture a warm, peaceful feeling.

… optimization pr

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a crash in the ernie45_vl_moe model that occurred when processing mixed-modality inputs. The original code incorrectly assumed the MoE layer returns a tensor, while it returns a tuple, leading to an AttributeError. The fix correctly unpacks the tuple and handles the outputs from shared and regular experts.

However, I've identified a critical issue with the current implementation. For layers that are MoE for one modality (e.g., vision) but a standard MLP for another (e.g., text), the code will crash. This is because it unconditionally tries to unpack a tuple, but the MLP returns a single tensor. I've provided a suggestion to make the code robust by checking the type of the expert module before processing its output, which will resolve this issue.
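
For reference, a minimal sketch of that guard, branching on the expert module's type as the review suggests (the import path and attribute names here are assumptions, not the exact suggested patch):

from vllm.model_executor.layers.fused_moe import SharedFusedMoE  # import path assumed

if isinstance(self.experts, SharedFusedMoE):
    # SharedFusedMoE returns a (shared_output, experts_output) tuple
    shared_output, experts_output = self.experts(hidden_states=x, router_logits=logits)
    out = experts_output if shared_output is None else experts_output + shared_output
else:
    # A plain MLP returns a single tensor, so no unpacking is needed
    out = self.experts(x)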

CSWYF3634076 (Author) commented:

cc @bnellnm

A review thread followed on this hunk:

text_token_mask = ~visual_token_mask
final_hidden_states = torch.zeros_like(hidden_states)
final_experts_hidden_states = torch.zeros_like(hidden_states)
final_shard_ouput = (
Reviewer (Contributor):

nit: shard -> shared

CSWYF3634076 (Author):

done

bnellnm (Contributor) left a comment:

Thanks for fixing this!

… optimization pr v2

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
CSWYF3634076 (Author) commented:

> Thanks for fixing this!

@bnellnm Can you trigger the CI?

@DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on Oct 16, 2025
DarkLight1337 (Member) left a comment:

Sorry for the delay

@DarkLight1337 enabled auto-merge (squash) on October 16, 2025 at 07:06
@vllm-bot merged commit e519287 into vllm-project:main on Oct 16, 2025
53 of 55 checks passed
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Oct 17, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…ation (vllm-project#26885)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>