[Model] NemotronH Support #22349
Conversation
Code Review
This pull request adds support for heterogeneous FFN and explicit head_dim for Nemotron-H models. The changes are well-aligned with the description and seem correct. However, I've identified one critical issue where a parameter with a None default value is used in an operation that would cause a TypeError, which could lead to a runtime crash. I've provided a suggestion to make this parameter required, which should resolve the issue.
Since layer_idx can be None, layer_idx + 1 will raise a TypeError. Adding a guard for layer_idx and ensuring hybrid_override_pattern exists before using them would help.
Thanks!
Changed it so that layer_idx can't be None now.
For config.hybrid_override_pattern, we assume that all NemotronH models will have this configuration.
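A minimal sketch of the approach described above, assuming hypothetical names (the class name and the "M"/"*"/"-" pattern characters are illustrative, not quoted from the vLLM code): layer_idx is a required int, and config.hybrid_override_pattern is assumed to be present on every NemotronH config.

```python
import torch.nn as nn


class NemotronHDecoderLayer(nn.Module):
    """Hypothetical sketch; class name and pattern characters are assumptions."""

    def __init__(self, config, layer_idx: int) -> None:
        super().__init__()
        # layer_idx is a required int, so expressions like `layer_idx + 1`
        # can no longer raise TypeError on a None default.
        if not isinstance(layer_idx, int):
            raise TypeError(f"layer_idx must be an int, got {type(layer_idx)}")

        # Every NemotronH config is assumed to carry hybrid_override_pattern;
        # the character at layer_idx selects this layer's mixer type
        # (e.g. "M" for Mamba2, "*" for attention, "-" for MLP).
        pattern = config.hybrid_override_pattern
        assert len(pattern) > layer_idx, "pattern shorter than the layer index"
        self.block_type = pattern[layer_idx]
```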
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Force-pushed from fa734a4 to 18cdba1.
CI fails on a test listed in the CI Failures Dashboard; it is not caused by my changes.
Can you merge from main to fix the Hybrid Models test?
Commits: "layer_idx can't be None", "fix head_dim in config". Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Force-pushed from 18cdba1 to 7d73434.
DarkLight1337 left a comment:
Thanks for supporting this model in vLLM!
use vllm-ascend and DeepSeek-V3 w8a8
Looks like you commented on the wrong PR.
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
- Change `config.expand * config.hidden_size` to `config.mamba_num_heads * config.mamba_head_dim`.
- Support an explicit `head_dim` configuration parameter. Falls back to the computed `hidden_size // total_num_heads` when `head_dim` is not specified (the model does not force `head_dim = hidden_size // total_num_heads`).
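A minimal sketch of the config handling described above, assuming attribute names such as num_attention_heads, mamba_num_heads, and mamba_head_dim; this is not quoted from the vLLM implementation.

```python
def get_head_dim(config) -> int:
    # Prefer an explicit head_dim from the checkpoint config; fall back to
    # the computed hidden_size // total_num_heads when it is absent.
    head_dim = getattr(config, "head_dim", None)
    if head_dim is not None:
        return head_dim
    return config.hidden_size // config.num_attention_heads


def get_mamba_inner_size(config) -> int:
    # The Mamba mixer width comes from mamba_num_heads * mamba_head_dim,
    # so it no longer has to equal expand * hidden_size.
    return config.mamba_num_heads * config.mamba_head_dim
```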