-
-
Couldn't load subscription status.
- Fork 10.8k
[Bugfix][Model] Fix inference for Hunyuan dense models #25354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: anion <1005128408@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a regression that prevented dense Hunyuan v1 models from running. The fix involves refactoring the model classes to better separate the logic for dense and Mixture-of-Experts (MoE) variants, introducing a common base class HunyuanV1ModelBase. While the fix is correct, there is significant code duplication in the __init__ methods of HunYuanDenseV1Base and HunYuanMoEV1Base. I've provided suggestions to move the common initialization logic to the base class to improve maintainability and reduce redundancy.
Signed-off-by: anion <1005128408@qq.com>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses a regression that prevented inference on Hunyuan v1 dense models. The refactoring of the class hierarchy, by introducing HunyuanV1ModelBase for common logic and creating separate HunYuanMoEV1Base and HunYuanDenseV1Base for MoE and dense variants respectively, is a clean and correct approach. This properly isolates the MoE-specific logic that was causing runtime errors for dense models. The changes are well-structured and directly fix the reported issue. I have no further recommendations.
|
Hi @abmfy, I noticed that you have reviewed related modifications and familiar with this area. Would you mind helping to review this? Let me know if you have any further suggestions for this PR.Thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for fixing this issue! @simon-mo
|
Hi @abmfy |
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
|
@jeejeelee @Isotr0py It seems the other reviewers are quite busy. Could you please help merge this PR? Thank you! |
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
…25354) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
This PR fixes a regression that broke inference support for Hunyuan v1 dense models, such as Hunyuan-1.8B-Instruct.
After recent updates to the model support files, the logic no longer correctly handles the architecture of the dense variant, causing the inference process to fail. This functionality was working correctly in previous versions.
This pr restores the necessary logic to ensure that Hunyuan v1 dense models can be loaded and used for inference successfully.
Test Plan
export VLLM_USE_MODELSCOPE=True python3 -m vllm.entrypoints.openai.api_server --model Tencent-Hunyuan/Hunyuan-1.8B-Instruct --trust_remote_codeTest Result
{"id":"chatcmpl-4d13a0bc43544e2fa6aca38e0d9733c2","object":"chat.completion","created":1758472457,"model":"Tencent-Hunyuan/Hunyuan-1.8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\nOkay, the user asked me to introduce myself. Let me start by thinking about what they might need. They could be a friend looking for an introduction, or maybe a new person they're trying to meet. Since I don't have personal experiences, I should keep it friendly and general.\n\nFirst, I'll mention my name and where I'm from to give context. Then, highlight key aspects like personality traits—curious, approachable, maybe a bit adventurous. Mentioning hobbies","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":6,"total_tokens":106,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.