
Add Tensor Parallel to torch_native_llama #1876

Open · wants to merge 12 commits into base: main
Conversation

@kwen2501 commented Nov 2, 2024

Motivation

The torch_native_llama model currently lacks Tensor Parallel support. This PR adds it using torch.distributed APIs.

Modifications

  • Added a .tensor_parallel() utility;
  • Added ColwiseParallel and RowwiseParallel annotations to the relevant sub-modules (a minimal sketch of this annotation pattern follows below).
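
For readers unfamiliar with these APIs, here is a minimal sketch (not the PR's actual code) of how ColwiseParallel and RowwiseParallel annotations are applied via torch.distributed; the MLP module and its dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


# Illustrative stand-in; the PR annotates the real Llama sub-modules.
class MLP(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down_proj(torch.relu(self.up_proj(x)))


# Assumes torch.distributed is already initialized (e.g. via torchrun).
world_size = torch.distributed.get_world_size()
mesh = init_device_mesh("cuda", (world_size,))

# Shard the up projection column-wise and the down projection row-wise,
# so a single all-reduce per block recombines the partial results.
plan = {
    "up_proj": ColwiseParallel(),
    "down_proj": RowwiseParallel(),
}
tp_mlp = parallelize_module(MLP(4096, 11008), mesh, plan)
```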

Tests

pytest test/srt/test_torch_tp.py

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

cc: @jerryzh168 @merrymercy @wz337

@jerryzh168 (Contributor) left a comment:

LGTM. I think this is the best we can do for now, until we stop using fused QKV and rely on torch.compile for the speedup.
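
For context on the fused-QKV point: with separate q/k/v projections, each linear can be annotated independently, which is what makes a plain ColwiseParallel/RowwiseParallel plan workable; a fused qkv_proj would need a custom sharding style, since a contiguous column shard would cut across the concatenated q/k/v boundaries. A hypothetical per-projection plan (module names assumed to follow the usual Llama naming, not quoted from the PR) looks like:

```python
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

# Hypothetical plan assuming Llama-style attention module names; each
# unfused projection gets its own sharding style.
attn_plan = {
    "self_attn.q_proj": ColwiseParallel(),  # shards heads across ranks
    "self_attn.k_proj": ColwiseParallel(),
    "self_attn.v_proj": ColwiseParallel(),
    "self_attn.o_proj": RowwiseParallel(),  # all-reduce recombines outputs
}
```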

@kwen2501 marked this pull request as ready for review on November 9, 2024.
@merrymercy (Contributor) left a comment.

@merrymercy self-assigned this on Nov 9, 2024.
@kwen2501 changed the title from "[Draft] Add Tensor Parallel to torch_native_llama" to "Add Tensor Parallel to torch_native_llama" on Nov 11, 2024.
@@ -24,7 +24,10 @@
from torch import nn
A Contributor commented on this hunk:

Add a unit test, or at least some docstrings here on how to run this model?

@kwen2501 (Author) replied:

Thanks for the suggestion. I added a section describing how to run the model with TP.
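
As a rough illustration (the exact flags and model path below are assumptions, not quoted from the added section), running this model with TP through sglang looks something like:

python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-hf --tp-size 2 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}'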

@kwen2501 (Author) commented:

@merrymercy Thanks very much for your review. I was tied up with roadmapping, so the code change was delayed.
I added a unit test: test/srt/test_torch_tp.py.
It can be triggered by:
pytest test/srt/test_torch_tp.py
and uses two GPUs by default.
It is similar to test/srt/test_bench_latency.py.

Thanks for the rebase; I would appreciate your review!
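
For anyone adapting this pattern, a two-GPU, pytest-driven TP test is typically structured along these lines (a hypothetical skeleton, not the contents of test/srt/test_torch_tp.py; it assumes NCCL and two visible CUDA devices):

```python
import os

import pytest
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _worker(rank: int, world_size: int) -> None:
    # Each spawned process joins the same process group.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... build the TP-annotated model, run a forward pass, and
    # assert on the output here ...
    dist.destroy_process_group()


@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="requires 2 GPUs")
def test_torch_tp_two_gpus():
    mp.spawn(_worker, args=(2,), nprocs=2, join=True)
```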
