@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Oct 13, 2025

What this PR does / why we need it?

Fix the Qwen3NextGatedDeltaNet breakage caused by vllm-project/vllm#26437

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "窗前明月光,",
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",
        "家书抵万金啥意思?",
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM from the local ModelScope cache (4-way tensor parallel, eager mode).
    llm = LLM(
        model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wxsIcey
Collaborator Author

wxsIcey commented Oct 13, 2025

needs #3719

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug with the Qwen3-next model on vLLM. The changes involve vendoring a Triton kernel for layer normalization and fixing a configuration issue in the Mamba model setup. The fixes appear to be correct and address the immediate problem. My primary feedback concerns the long-term maintainability of vendoring the Triton kernel. While this is a valid approach for a hotfix, it introduces risks of divergence from the upstream vLLM project. I've provided a comment with a suggestion to mitigate this risk.
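One generic way to notice when a vendored file has drifted from its upstream source is to pin a fingerprint of the upstream code at vendoring time and compare it later. The sketch below only illustrates that idea; it is not the suggestion made in the review comment, and `fingerprint`, `has_diverged`, and the sample sources are hypothetical names and data.

```python
import hashlib


def fingerprint(source_text: str) -> str:
    """Return a SHA-256 hex digest of source text, ignoring trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in source_text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_diverged(upstream_source: str, pinned_fingerprint: str) -> bool:
    """Return True if the upstream source no longer matches the pinned fingerprint."""
    return fingerprint(upstream_source) != pinned_fingerprint


# Hypothetical stand-ins for the upstream file at vendoring time and today.
upstream_at_vendoring = "def kernel(x):\n    return x * 2\n"
upstream_today = "def kernel(x):\n    return x * 2 + 1\n"

# Recorded once, when the file is vendored.
pinned = fingerprint(upstream_at_vendoring)

print(has_diverged(upstream_at_vendoring, pinned))  # upstream unchanged
print(has_diverged(upstream_today, pinned))         # upstream changed since vendoring
```

A check like this could run in CI so that an upstream change prompts a manual re-sync of the vendored copy, assuming the upstream file is available to the job.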

@wxsIcey
Collaborator Author

wxsIcey commented Oct 13, 2025

Incompatible with upstream changes because of this error: 'torch_npu._C._NPUDeviceProperties' object has no attribute 'multi_processor_count'
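An error like this arises when code reads a device property that one backend exposes and another does not. A common defensive pattern is to read the attribute with a fallback, as in the minimal sketch below. It is illustrative only: `get_core_count`, the `SimpleNamespace` stand-ins, and the default value are hypothetical, and this is not the torch_npu API or the fix that was eventually applied.

```python
from types import SimpleNamespace


def get_core_count(props, default=1):
    """Return props.multi_processor_count if present, otherwise `default`.

    `props` is any object describing a device; `default` is a hypothetical
    fallback value chosen by the caller.
    """
    return getattr(props, "multi_processor_count", default)


# Hypothetical stand-ins for device-properties objects (not real torch_npu types).
gpu_like_props = SimpleNamespace(name="gpu-like", multi_processor_count=108)
npu_like_props = SimpleNamespace(name="npu-like")  # attribute is absent

print(get_core_count(gpu_like_props))             # attribute exists, value is used
print(get_core_count(npu_like_props, default=1))  # falls back instead of raising AttributeError
```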

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wxsIcey wxsIcey changed the title from "[BugFix][main] Fix Qwen3-next because of vllm #26207" to "[BugFix][0.11.1] Fix Qwen3-next because of vllm #26207" Oct 16, 2025
@wxsIcey wxsIcey changed the title from "[BugFix][0.11.1] Fix Qwen3-next because of vllm #26207" to "[BugFix][0.11.1] Fix Qwen3-next" Oct 17, 2025
@wxsIcey wxsIcey changed the title from "[BugFix][0.11.1] Fix Qwen3-next" to "[BugFix][main] Fix Qwen3-next break" Oct 17, 2025
@wxsIcey
Collaborator Author

wxsIcey commented Oct 17, 2025

Test Results:

Prompt: '窗前明月光,', Generated text: '疑是地上霜。\nA. 举头望明月\nB. 疑是地上霜\nC. 举头望明月\nD. 低头思故乡\n答案:\n C\n\n在1999年3月14日,以____为单位的欧元在11个欧洲国家开始使用\nA. 1989\nB. 1990\nC. 1998\nD. 199'
Prompt: 'The president of the United States is Mr.', Generated text: " Joe Biden. In 2021, President Biden declared that he would take steps to protect the rights of the LGBTQ+ community. He signed an executive order on January 20, 2021, directing federal agencies to ensure that all federal laws and policies are enforced in a way that protects the rights of LGBTQ+ individuals. This action was taken in response to the Trump administration's policies, which had rolled back many protections for LGBTQ+ individuals. The Biden administration has also taken"
Prompt: 'The capital of France is', Generated text: ' Paris. What is the capital city of France?\n\nThe capital of France is Paris.'
Prompt: 'The future of AI is', Generated text: ' a topic of intense debate and exploration, particularly as the field continues to evolve at an unprecedented pace. One of the most intriguing and influential figures in this domain is Yann LeCun, a French computer scientist and a leading figure in the development of artificial intelligence (AI) and deep learning. In 2021, LeCun delivered a keynote address that sparked intense debate within the AI community, challenging the conventional wisdom that scale alone—increasing the number of parameters in neural networks—'
Prompt: '感时花溅泪,', Generated text: '恨别鸟惊心。是写实还是写意?请结合你对本诗的理解,谈谈对这一问题的看法。\n\n这个问题涉及对杜甫《春望》一诗的解读,尤其是对“感时花溅泪,恨别鸟惊心”这一联的解读。我们先明确题干:“**感时花溅泪,恨别鸟惊心**”出自杜甫《春望》,而非“感”与“不”——题干中'
Prompt: '家书抵万金啥意思?', Generated text: '?????????????\n\n“**海阔凭鱼跃,天高任鸟飞**”——这句话你可能听多了,但你问的这个“**书信**”其实是**杜甫的诗**,不是“书信”哦 😊\n\n我们来认真、温柔地聊聊~\n\n---\n\n🌟 **你问的其实是:**  \n> “**烽火连三月,家书抵万金**” ——'
Prompt: 'plz tell me a story: ', Generated text: ' a dog named Max, 1000 years old, who is a magical dog and can turn into a doggo\n\n**Title: Max, the Dog Who Lived a Thousand Years (And Still Barked Like a Goof)**\n\nLong, long ago—before the pyramids stood tall, before kings wore crowns made of gold instead of just *thinking* they looked cool in hats—there lived a dog.\n\nNot just *any* dog.\n\nA **great, goofy,

There seems to be an accuracy problem.
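A keyword spot check is one way to make an observation like this repeatable. The sketch below is only a smoke test under stated assumptions, not a real accuracy evaluation: `EXPECTED_SUBSTRINGS` and `spot_check` are hypothetical, and the expected substrings are simply the well-known continuations of three of the prompts used in the test script.

```python
# Hypothetical prompt -> expected-substring pairs for some of the test prompts.
EXPECTED_SUBSTRINGS = {
    "The capital of France is": "Paris",
    "窗前明月光,": "疑是地上霜",
    "感时花溅泪,": "恨别鸟惊心",
}


def spot_check(results):
    """Return the prompts whose generated text lacks the expected substring.

    `results` maps each prompt to its generated text. Prompts without an
    entry in EXPECTED_SUBSTRINGS are skipped.
    """
    failures = []
    for prompt, text in results.items():
        expected = EXPECTED_SUBSTRINGS.get(prompt)
        if expected is not None and expected not in text:
            failures.append(prompt)
    return failures


# Made-up sample: one plausible output and one degenerate output.
sample = {
    "The capital of France is": " Paris. What is the capital city of France?",
    "感时花溅泪,": "?????????????",
}
print(spot_check(sample))  # lists the prompts that failed the check
```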

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wxsIcey
Collaborator Author

wxsIcey commented Oct 20, 2025

The accuracy problem reported with the test results above has been solved by 6c65dd8.

Signed-off-by: Icey <1790571317@qq.com>
@wxsIcey wxsIcey changed the title from "[BugFix][main] Fix Qwen3-next break" to "[BugFix][0.11.1] Fix Qwen3-next break" Oct 23, 2025
@wxsIcey
Collaborator Author

wxsIcey commented Oct 23, 2025

The incompatibility with upstream changes ('torch_npu._C._NPUDeviceProperties' object has no attribute 'multi_processor_count') has been solved by #3549.

Signed-off-by: Icey <1790571317@qq.com>
@wxsIcey wxsIcey added the ready and ready-for-test labels Oct 25, 2025
@wxsIcey wxsIcey changed the title from "[BugFix][0.11.1] Fix Qwen3-next break" to "[BugFix] Fix Qwen3-next break" Oct 25, 2025
@wxsIcey wxsIcey closed this Oct 25, 2025
@wxsIcey wxsIcey reopened this Oct 25, 2025
@MengqingCao MengqingCao merged commit bb5f16d into vllm-project:main Oct 25, 2025
63 checks passed