[Model] New model support for Motif-1-Tiny #23414
Conversation
Code Review
This pull request introduces support for the Motif model, including a new PolyNorm custom kernel and a DifferentialFlashAttention backend. The changes are extensive and well-structured. However, I've identified a critical issue in the MotifDecoderLayer implementation where the forward pass logic deviates from the reference model's pre-normalization architecture, which will likely result in incorrect outputs. Additionally, there is a minor typo in the new benchmark script for PolyNorm.
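For reference, the standard pre-normalization ordering looks roughly like the following minimal PyTorch sketch. The module names and the choice of LayerNorm are illustrative; this is not the PR's actual MotifDecoderLayer, only the residual/norm ordering the review is comparing against.

```python
import torch
from torch import nn


class PreNormDecoderLayer(nn.Module):
    """Illustrative pre-norm ordering only; not the PR's MotifDecoderLayer."""

    def __init__(self, hidden_size: int, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        # LayerNorm stands in for whatever norm the reference model uses.
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.attn = attn
        self.mlp = mlp

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention sub-block: normalize the input, keep the residual untouched.
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        hidden_states = residual + self.attn(hidden_states)

        # MLP sub-block: same pre-norm pattern.
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = residual + self.mlp(hidden_states)
        return hidden_states
```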
Signed-off-by: WyldeCat <skan1543@gmail.com> Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Sync with https://huggingface.co/Motif-Technologies/Motif-2.6B/blob/main/modeling_motif.py#L366 Signed-off-by: WyldeCat <skan1543@gmail.com> Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
| "QuantMixtralForCausalLM": _HfExamplesInfo("mistral-community/Mixtral-8x22B-v0.1-AWQ"), # noqa: E501 | ||
| "MotifForCausalLM": _HfExamplesInfo("Motif-Technologies/Motif-2.6B", | ||
| trust_remote_code=True, | ||
| v0_only=True), |
QQ: Why does this model only support V0?
@DarkLight1337 If this model only supports V0, how should we handle this now?
You can have the model inherit from SupportsV0Only explicitly
And state the reason why via code comment for future reference
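A minimal sketch of that suggestion, assuming the SupportsV0Only marker lives in vllm.model_executor.models.interfaces; the comment text below is illustrative and should be replaced with the actual reason:

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsV0Only


class MotifForCausalLM(nn.Module, SupportsV0Only):
    # Hypothetical comment text: the real one should state the actual
    # constraint, e.g. that the differential-attention backend is only
    # implemented for the V0 engine.
    ...
```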
@WoosukKwon is it possible to turn off chunked prefill for generative models in V1?
Hi @jeejeelee, just a gentle ping on this PR 🙂.
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
@jeejeelee I have addressed your PR review comments. Please feel free to take a look whenever you have some time.
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Please resolve the branch conflicts. Considering that the changes in this PR won't significantly affect other features, we can then go ahead and merge.
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Thanks for the work! Overall looks good to me, just a few comments
}
};

using BlockReduce = cub::BlockReduce<float3, 1024>;
Here BlockReduce is set up with a block size of 1024, but blockDim.x is actually decided by:
const int max_block_size = (num_tokens < 256) ? 1024 : 256;
dim3 block(std::min(hidden_size, max_block_size));
I'm worried this inconsistency could cause trouble in that case. Could you test it?
Thanks for the comment. In test_layernorm.py, we already have a test case for num_tokens < 256, and it works fine in that case.
NUM_TOKENS = [7, 83, 4096]  # Arbitrary values for testing
HIDDEN_SIZES = [8, 768, 769, 770, 771, 5120, 5124, 5125, 5126, 8192, 8199]  # Arbitrary values for testing
ADD_RESIDUAL = [False, True]
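For reference, a small Python mirror of the quoted launch heuristic makes explicit which block size each tested NUM_TOKENS value ends up with (illustrative only; the real selection happens in the CUDA host code):

```python
def launch_block_size(num_tokens: int, hidden_size: int) -> int:
    # Mirrors: max_block_size = (num_tokens < 256) ? 1024 : 256;
    #          block = min(hidden_size, max_block_size)
    max_block_size = 1024 if num_tokens < 256 else 256
    return min(hidden_size, max_block_size)


for n in (7, 83, 4096):
    print(n, launch_block_size(n, hidden_size=5120))
# 7 -> 1024, 83 -> 1024, 4096 -> 256
```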
Looks good, thanks!
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM, thanks for the work!
Please fix the merge conflicts.
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
@DarkLight1337
No need to update as long as there are no merge conflicts.
Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
New model for:
https://huggingface.co/Motif-Technologies/Motif-2.6B
https://huggingface.co/Motif-Technologies/Motif-2.6b-v1.1-LC
Tech report (https://arxiv.org/pdf/2508.09148)
Implemented the PolyNorm custom kernel based on https://arxiv.org/abs/2411.03884 (a reference sketch of the op is included below).
Co-author: @WyldeCat
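For context, here is a minimal PyTorch sketch of the PolyNorm operation as I read it from the paper: each element-wise power of the input is RMS-normalized and the results are combined with three learned coefficients and a bias. This is only a reference sketch; the exact parameterization in the PR's layer and CUDA kernel may differ.

```python
import torch


def poly_norm_ref(x: torch.Tensor,
                  weight: torch.Tensor,  # three scalar coefficients
                  bias: torch.Tensor,    # scalar bias
                  eps: float = 1e-6) -> torch.Tensor:
    """Reference PolyNorm: a weighted sum of RMS-normalized powers of x."""

    def rms_norm(t: torch.Tensor) -> torch.Tensor:
        return t * torch.rsqrt(t.pow(2).mean(dim=-1, keepdim=True) + eps)

    x = x.float()
    return (weight[0] * rms_norm(x ** 3)
            + weight[1] * rms_norm(x ** 2)
            + weight[2] * rms_norm(x)
            + bias)
```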
Test Plan
Benchmark for the PolyNorm kernel (benchmarks/kernels/benchmark_polynorm.py)
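As an illustration of the timing methodology only (this is not the actual benchmarks/kernels/benchmark_polynorm.py), a simple CUDA-event loop over the poly_norm_ref sketch above could look like this; the custom op would be timed the same way:

```python
import torch

# poly_norm_ref: the reference sketch shown under Purpose above.


def benchmark_us(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Mean latency of fn(*args) in microseconds, measured with CUDA events."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) * 1e3 / iters  # elapsed_time is in ms


if __name__ == "__main__":
    x = torch.randn(4096, 2048, device="cuda")
    weight = torch.randn(3, device="cuda")
    bias = torch.randn((), device="cuda")
    print(f"poly_norm_ref: {benchmark_us(poly_norm_ref, x, weight, bias):.1f} us")
```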
Test Result
Benchmark Results
Results for dim = 2048
Results for dim = 4096
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.