
fix fused_qkv model accuracy issue #5217

Merged: 2 commits into microsoft:master on Mar 6, 2024

Conversation

Yejing-Lai (Contributor)

A fused_qkv model cannot correctly choose its fused_qkv type; the module_name_matches logic needs to be updated.

@Yejing-Lai (Contributor, Author)

Hi @mrwyattii @delock. Please kindly review, thanks!

@delock (Collaborator) commented Mar 1, 2024

@Yejing-Lai, what specific values of module_str and k caused the issue?

@Yejing-Lai (Contributor, Author)

> @Yejing-Lai, what specific values of module_str and k caused the issue?

It is a logic error: k should be found inside module_str, but the current module_name_matches check tests the containment in the wrong direction, so it never picks up the correct fused_type and every model ends up with the bloom type.
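A minimal sketch of the check in question (the dict entries and helper name here are illustrative, not the actual DeepSpeed source):

```python
# Simplified illustration of the module_name_matches logic; the dict entries
# and the helper name are examples only, not the real table in DeepSpeed.
fused_type_dict = {
    "BloomBlock": "bloomtype",
    "CodeGenBlock": "codegentype",
}

def find_fused_type_keys(module_str):
    # Broken direction: the full printed model string can never be a substring
    # of a short key, so the match list stays empty and the code falls through
    # to a default (per the comment above, the bloom type).
    # matches = [k for k in fused_type_dict if module_str in k]

    # Fixed direction: a key such as "CodeGenBlock" is a substring of the
    # printed model string, so the correct fused_qkv type can be selected.
    matches = [k for k in fused_type_dict if k in module_str]
    return matches
```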

@delock (Collaborator) commented Mar 1, 2024

For "if k in module_str", does that mean k is a substring of module_str, or that k is an element of module_str treated as a list?

@Yejing-Lai (Contributor, Author)

> For "if k in module_str", does that mean k is a substring of module_str, or that k is an element of module_str treated as a list?

Yes, k is a substring of module_str.

@Yejing-Lai (Contributor, Author)

For example:

```
k = "CodeGenBlock"
module_str = "
(module): CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 2560)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x CodeGenBlock(
        (ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): LinearLayer()
          (out_proj): LinearAllreduce()
        )
        (mlp): CodeGenMLP(
          (fc_in): LinearLayer()
          (fc_out): LinearAllreduce()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): LmHeadLinearAllreduce()
)
"
```
We need to test whether k is in module_str, not whether module_str is in k (a short illustration follows below). Thanks.
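To make the direction concrete, here is the same check applied to an abbreviated version of the printed model above (illustration only):

```python
# module_str stands in for the printed CodeGenForCausalLM tree above,
# abbreviated here so the snippet runs on its own.
module_str = "(module): CodeGenForCausalLM( ... (0-31): 32 x CodeGenBlock( ... )"
k = "CodeGenBlock"

print(k in module_str)   # True  -> correct direction, the key is found
print(module_str in k)   # False -> wrong direction, no key ever matches
```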

@delock (Collaborator) commented Mar 5, 2024

Hi @mrwyattii, can you help review this PR? It fixes an accuracy issue for various models with fused qkv, e.g. Baichuan, CodeGen, BLOOM, and MPT.

@loadams added this pull request to the merge queue Mar 5, 2024
Merged via the queue into microsoft:master with commit bc0d246 Mar 6, 2024
12 checks passed
ShellyNR pushed a commit to ShellyNR/DeepSpeed that referenced this pull request Mar 11, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024