llama: support Qwen3 #12501

Draft: wants to merge 3 commits into base: master

Conversation

CISC
Collaborator

@CISC CISC commented Mar 21, 2025

Initial draft based on huggingface/transformers#36878

Waiting for model files to become available...

CISC added 3 commits March 21, 2025 17:20

@github-actions github-actions bot added the python python script changes label Mar 21, 2025
@x0wllaar

Are you planning to add MoE support?

@CISC
Collaborator Author

CISC commented Mar 21, 2025

> Are you planning to add MoE support?

I'm focusing on non-MoE for now, so if someone wants to work on Qwen3MoE in the meantime they are more than welcome to. :)

@x0wllaar

Thank you! I'm not sure I'm up to the task though lol

@ngxson
Collaborator

ngxson commented Mar 21, 2025

I had a look at the Qwen3 MoE Python code; it's not much different from Qwen2 MoE. The differences are:

  • Shared experts are removed
  • Added k_norm and q_norm (similar to qwen3 dense)
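To make the two differences concrete, here is a minimal pure-Python sketch, not the actual transformers or llama.cpp code. It assumes that `q_norm` and `k_norm` are RMSNorm layers applied per attention head over the head dimension, and it simplifies the Qwen2 MoE shared expert to a plain addition (any gating on it is omitted). The function names `rms_norm`, `norm_heads`, and `moe_output`, and the `eps` value, are hypothetical and chosen only for illustration.

```python
import math


def rms_norm(vec, weight, eps=1e-6):
    """Divide vec by its root-mean-square, then scale by a learned weight.

    eps is an assumed value, not taken from any model config.
    """
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(vec, weight)]


def norm_heads(heads, weight):
    """Apply one shared RMSNorm weight to every head (one vector per head).

    Assumption: this is how q_norm / k_norm act on the query and key heads.
    """
    return [rms_norm(head, weight) for head in heads]


def moe_output(routed_outputs, gate_weights, shared_output=None):
    """Weighted sum of the routed experts' outputs.

    shared_output models a Qwen2 MoE-style shared expert (simplified to a
    plain addition); passing None corresponds to the Qwen3 MoE case
    described above, where shared experts are removed.
    """
    dim = len(routed_outputs[0])
    out = [
        sum(g * expert[i] for g, expert in zip(gate_weights, routed_outputs))
        for i in range(dim)
    ]
    if shared_output is not None:
        out = [a + b for a, b in zip(out, shared_output)]
    return out


# Hypothetical toy values: two heads with head dimension 2.
q_heads = norm_heads([[3.0, 4.0], [1.0, 1.0]], weight=[1.0, 1.0])
k_heads = norm_heads([[0.5, -0.5], [2.0, 0.0]], weight=[1.0, 1.0])

# Two routed experts, no shared expert (the Qwen3 MoE case).
mixed = moe_output([[1.0, 2.0], [3.0, 4.0]], gate_weights=[0.25, 0.75])
```

With unit weights, each normalized head ends up with a root-mean-square of roughly 1, which is the whole point of the extra norms: query and key magnitudes are brought to a common scale before the attention scores are computed.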

@CISC
Collaborator Author

CISC commented Mar 21, 2025

> I had a look at the Qwen3 MoE Python code; it's not much different from Qwen2 MoE.

That was my initial impression too. I can have a stab at it if no one else volunteers; I just didn't want to bite off too much at once (esp. given the flustercuck 57B-A14B was). :)

Labels
python python script changes

3 participants