
[Model] Pipeline Parallel Support for DeepSeek v2 #6519

Merged
merged 4 commits into vllm-project:main from deepseek-v2-pp
Jul 23, 2024

Conversation

@tjohnson31415 (Contributor)

Adds pipeline parallel support for DeepSeek v2.

Tested with https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct using --tensor-parallel-size 1 --pipeline-parallel-size 2
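For anyone reproducing this, the launch amounts to something like the following (a sketch only; the OpenAI-compatible server is just one entrypoint, and depending on the vLLM version additional distributed-executor flags may be needed):

$ python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct \
    --trust-remote-code \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 2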


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of a small and essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals between PR commits, you can trigger a full CI run as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao (Member)

Can you test the correctness locally, using https://github.com/vllm-project/vllm/blob/main/tests/distributed/test_pipeline_parallel.py ?

@tjohnson31415 (Contributor, Author)

Sure. I edited the file to set the model to "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", added --trust-remote-code to the launch args, and the test_pipeline_parallel tests passed:

$ pytest -s tests/distributed/test_pipeline_parallel.py
...
.INFO:     Shutting down


=============================== warnings summary ===============================
my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127
  /workspace/my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 5 passed, 1 warning in 393.58s (0:06:33) ===================
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [75971]
INFO 07-18 15:40:48 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
(VllmWorkerProcess pid=76043) INFO 07-18 15:40:48 multiproc_worker_utils.py:237] Worker exiting
INFO 07-18 15:40:48 async_llm_engine.py:51] Engine is gracefully shutting down.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
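For reference, the local edit described above amounts to something like the following (a hypothetical sketch with illustrative names; the actual test file structures its model list and launch args differently and has changed since):

# tests/distributed/test_pipeline_parallel.py -- local, illustrative edit
MODEL_NAME = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
EXTRA_SERVER_ARGS = [
    "--trust-remote-code",  # DeepSeek v2 ships custom modeling code
    "--tensor-parallel-size", "1",
    "--pipeline-parallel-size", "2",
]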

@tjohnson31415 force-pushed the deepseek-v2-pp branch 2 times, most recently from 568c1d9 to 2522798, on July 18, 2024 at 16:21
@tjohnson31415 (Contributor, Author)

Rebased to resolve a conflict with main. Reran the tests and they still pass.

Comment on lines 409 to 436
self.start_layer, self.end_layer, self.layers = make_layers(
    config.num_hidden_layers,
    # layer_idx is still an argument
    functools.partial(DeepseekV2DecoderLayer,
                      config,
                      cache_config=cache_config,
                      quant_config=quant_config),
)
This lambda function will have a prefix= argument shortly, once #6515 lands.
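For context, make_layers is the helper that carries the pipeline-parallel split: each PP rank materializes only its own contiguous slice of the decoder layers. A minimal sketch of the idea (illustrative only; vLLM's real helper reads the PP rank and world size from its parallel state and uses placeholder modules rather than None):

def make_layers(num_layers, layer_fn, pp_rank, pp_size):
    # Evenly partition the decoder layers across pipeline stages;
    # the last stage absorbs any remainder.
    per_rank = num_layers // pp_size
    start = pp_rank * per_rank
    end = num_layers if pp_rank == pp_size - 1 else start + per_rank
    # Instantiate only this rank's slice (the layer index is passed to the
    # factory); other positions stay empty so global indexing still works.
    layers = [layer_fn(i) if start <= i < end else None
              for i in range(num_layers)]
    return start, end, layers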

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> (all 4 commits)
@youkaichao (Member) left a comment

Thanks for addressing my comments! Please test the correctness locally.

@tjohnson31415 (Contributor, Author)

> please test the correctness locally.

I ran the updated test_pipeline_parallel.py tests locally with the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct model. It took a few tries to get through the whole suite: a couple of times a single test failed with RuntimeError: Server exited unexpectedly., but it was a different test each time, and rerunning a failed test by itself would pass. On the third attempt, all 10 tests passed:

====================================================== 10 passed, 1 warning in 764.67s (0:12:44) ======================================================

@youkaichao (Member)

Thanks, that might be caused by the flakiness of the PP tests. I'll merge, as this PR looks good to me now.

Thanks for your contribution!

@youkaichao merged commit 507ef78 into vllm-project:main on Jul 23, 2024
27 checks passed
@tjohnson31415 deleted the deepseek-v2-pp branch on July 23, 2024 at 20:15
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Jul 24, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Jul 24, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request on Jul 24, 2024
cduk pushed a commit to cduk/vllm-pascal that referenced this pull request on Aug 6, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request on Aug 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request on Oct 26, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request on Nov 20, 2024