[megatron] feat: support qwen2 megatron backend #261
Conversation
```diff
@@ -282,6 +282,7 @@ def mistral_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -
     'LlamaForCausalLM': llama_megatron_weight_loader,  # use te backend for open-source megatron
     'LLaMAForCausalLM': llama_megatron_weight_loader,
     'MistralForCausalLM': mistral_megatron_weight_loader,
+    'Qwen2ForCausalLM': llama_megatron_weight_loader,
```
Did you test them all? Maybe only including v0.6.3 is sufficient.
Yes, I have tested the loaders on v0.4.2, v0.5.3, and v0.6.3, but the score was only tested on v0.6.3. Maybe I should remove it for v0.4.2 and v0.5.3?
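For context, here is a minimal sketch of how this registry-style dispatch works — illustrative only, with a stub loader and a hypothetical `load_megatron_weights` entry point rather than verl's actual internals. The dict maps the Hugging Face architecture name to a loader function, which is why `Qwen2ForCausalLM` can simply reuse the llama loader:

```python
from typing import Callable, Dict

import torch.nn as nn


def llama_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn.Module:
    """Stub: copy Megatron-format weights into the vLLM model in place."""
    # The real loader walks actor_weights and assigns each tensor to the
    # matching parameter of vllm_model.
    return vllm_model


# Architecture-name -> loader registry, as extended in the diff above.
# Qwen2 shares llama's parameter layout, so it reuses the same loader.
_LOADER_REGISTRY: Dict[str, Callable] = {
    'LlamaForCausalLM': llama_megatron_weight_loader,
    'Qwen2ForCausalLM': llama_megatron_weight_loader,
}


def load_megatron_weights(arch: str, actor_weights: Dict, vllm_model: nn.Module) -> nn.Module:
    """Hypothetical dispatch helper: look up the loader for an architecture."""
    if arch not in _LOADER_REGISTRY:
        raise ValueError(f'Megatron weight loading is not supported for {arch}')
    return _LOADER_REGISTRY[arch](actor_weights, vllm_model)
```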
```diff
@@ -0,0 +1,42 @@
+set -x
```
Could you later also add a section to https://github.com/volcengine/verl/blob/main/docs/experiment/ppo.rst, add a new table for MATH, and include the logs, command, and test score in the next PR?
Could you add Qwen2.5 0.5B to the CI?
Hi, Dude~ Thank you for your work. Is there any plan to support a saver for Qwen2?
For models that enable tie_word_embedding, we cannot get "lm_head".
Yes, I have noticed that. I am working on it.
It can now handle tie_word_embedding=True. The result will be updated in the next PR.
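A plausible way to handle that case — a minimal sketch, assuming Hugging Face-style parameter names; `get_lm_head_weight` is a hypothetical helper, not verl's actual code. When `tie_word_embeddings=True` the checkpoint stores no separate `lm_head.weight`, so the loader must fall back to the shared token-embedding matrix:

```python
from typing import Dict

import torch


def get_lm_head_weight(state_dict: Dict[str, torch.Tensor],
                       tie_word_embeddings: bool) -> torch.Tensor:
    """Hypothetical helper: resolve the output-projection weight.

    With tie_word_embeddings=True, the output projection shares storage with
    the input embedding, so no standalone 'lm_head.weight' exists.
    """
    if not tie_word_embeddings and 'lm_head.weight' in state_dict:
        return state_dict['lm_head.weight']
    # Tied case: reuse the token-embedding matrix.
    return state_dict['model.embed_tokens.weight']
```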
```python
    return layer_map


def merge_megatron_ckpt_llama(wrapped_models, config, is_value_model=False, dtype='bf16'):
```
The function name here seems to be a typo? (`llama` -> `qwen2`)
Yes, that is a typo. This PR does not support the saver yet, but it may support it in the future.
Support Qwen2 Megatron backend

The code is primarily adapted from the llama folder, with modifications to use QKV bias and to remove the rope_scaling of RoPE in `verl/models/qwen2/megatron/layers/parallel_attention.py`.
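To illustrate the two modifications — a minimal sketch, using plain `nn.Linear` stand-ins for Megatron's parallel projections; the class and parameter names are illustrative, not verl's actual layer. The substantive difference from llama is `bias=True` on the Q/K/V projections (and no `rope_scaling` applied to the rotary embedding):

```python
import torch.nn as nn


class Qwen2AttentionSketch(nn.Module):
    """Illustrative stand-in for verl's parallel attention layer.

    Relative to llama: the Q/K/V projections carry a bias, and the rotary
    embedding is built without llama's rope_scaling option. The forward pass
    is omitted; only the projection wiring is shown.
    """

    def __init__(self, hidden_size: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        head_dim = hidden_size // num_heads
        qkv_out = (num_heads + 2 * num_kv_heads) * head_dim
        # Qwen2-specific: bias=True on the fused QKV projection.
        self.qkv_proj = nn.Linear(hidden_size, qkv_out, bias=True)
        # Output projection remains bias-free, as in llama.
        self.o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)
```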