Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 #35007

VladOS95-cyber · 2024-11-28T17:23:43Z

What does this PR do?

This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Qwen2, Qwen2Moe, and Starcoder2.
It also fixes the qkv states dims for Mistral.
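
To make the two changes above concrete, here is a minimal sketch; it is not the code from this PR. It assumes the usual tensor-parallel layout, in which a column-wise sharded projection is followed by a row-wise sharded one and the partial outputs are summed (the role an all-reduce plays across ranks). It also assumes that the qkv fix lets the head axis be inferred from the local projection width instead of being hard-coded, as the commit "add infer dim for qkv states" suggests. Plain Python lists stand in for tensors, the ranks are simulated in one process, and every function name is hypothetical.

```python
# Illustrative sketch only: lists of rows stand in for tensors and the
# "ranks" are simulated in a single process. All names are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(m, parts):
    """Column-wise sharding: cut a matrix into `parts` equal column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]


def split_rows(m, parts):
    """Row-wise sharding: cut a matrix into `parts` equal row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tensor_parallel_two_layer(x, w1, w2, tp_size):
    """Compute (x @ w1) @ w2 with w1 sharded by columns and w2 by rows."""
    partials = [
        matmul(matmul(x, w1_shard), w2_shard)
        for w1_shard, w2_shard in zip(split_columns(w1, tp_size), split_rows(w2, tp_size))
    ]
    # Summing the per-rank partial outputs stands in for the all-reduce.
    total = partials[0]
    for partial in partials[1:]:
        total = add(total, partial)
    return total


def local_num_heads(local_proj_width, head_dim):
    """Head count that a reshape with an inferred (-1) head axis would yield on one rank."""
    if local_proj_width % head_dim != 0:
        raise ValueError("local projection width must be a multiple of head_dim")
    return local_proj_width // head_dim


x = [[1, 2]]
w1 = [[1, 2, 3, 4], [5, 6, 7, 8]]
w2 = [[1, 0], [0, 1], [1, 1], [2, 0]]
sharded = tensor_parallel_two_layer(x, w1, w2, tp_size=2)
unsharded = matmul(matmul(x, w1), w2)
```

With a column-wise sharded q/k/v projection, each rank sees only its slice of the projection output, so a reshape that hard-codes the global number of heads no longer matches the local tensor. Inferring the head axis, which is what local_num_heads mimics, keeps the same code valid with and without sharding.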

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Link: Add Tensor Parallel support for ALL models #34789
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@ArthurZucker

VladOS95-cyber · 2024-11-28T17:25:45Z

Hey @ArthurZucker! This PR is ready for review. Please take a look.

VladOS95-cyber · 2024-11-29T12:24:54Z

I think I should fix Starcoder2 as well.

ArthurZucker · 2024-12-02T10:37:30Z

Thanks! Sorry, I'm a bit slow this week. Reviewing ASAP!

ArthurZucker

Lovely! Thanks for fixing and bringing more support!

src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

ArthurZucker · 2024-12-04T13:43:45Z

Thanks @VladOS95-cyber 🤗

…uggingface#35007)
* add base tp plan for qwen2 and qwen2moe
* add parallel tp for starcoder2
* fix modular conversion
* add infer dim for qkv states
* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

VladOS95-cyber force-pushed the add-tensor-parallel-support-for-qwen2 branch 2 times, most recently from 2601f22 to 73d5df4 Compare November 29, 2024 12:05

VladOS95-cyber force-pushed the add-tensor-parallel-support-for-qwen2 branch from 73d5df4 to 6d6014e Compare November 29, 2024 17:25

VladOS95-cyber changed the title ~~Add Pytorch Tensor Parallel support for Qwen2 and Qwen2Moe~~ Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 Nov 29, 2024

VladOS95-cyber mentioned this pull request Nov 29, 2024

Add Pytorch Tensor Parallel support for Mistral #34927

Merged

4 tasks

VladOS95-cyber force-pushed the add-tensor-parallel-support-for-qwen2 branch from 8228ce7 to 7292f79 Compare December 2, 2024 11:12

VladOS95-cyber added 4 commits December 4, 2024 08:42

add base tp plan for qwen2 and qwen2moe

76033bb

add parallel tp for starcoder2

e9609e0

fix modular conversion

2b8ee51

add infer dim for qkv states

8da9c10

VladOS95-cyber force-pushed the add-tensor-parallel-support-for-qwen2 branch from 7292f79 to 8da9c10 Compare December 4, 2024 07:42

ArthurZucker approved these changes Dec 4, 2024

src/transformers/models/qwen2_moe/configuration_qwen2_moe.py (outdated)

Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

a6e47ed

ArthurZucker merged commit accb720 into huggingface:main Dec 4, 2024
2 of 4 checks passed
