
[Model] Pipeline Parallel Support for DeepSeek v2 #6519

Merged
merged 4 commits into vllm-project:main from deepseek-v2-pp
Jul 23, 2024

Conversation

@tjohnson31415 (Contributor)

Adds pipeline parallel support for DeepSeek v2.

Tested with https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct using --tensor-parallel-size 1 --pipeline-parallel-size 2
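For anyone reproducing this, the launch amounts to something like the following (a sketch only; the OpenAI-compatible server is just one entrypoint, and depending on the vLLM version additional distributed-executor flags may be needed):

$ python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct \
    --trust-remote-code \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 2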


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of a small and essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals between PR commits, you can trigger a full CI run as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao (Member)

Can you test the correctness locally, using https://github.com/vllm-project/vllm/blob/main/tests/distributed/test_pipeline_parallel.py ?

@tjohnson31415 (Contributor, Author)

Sure. I edited the file to set the model to "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", added --trust-remote-code to the launch args, and the test_pipeline_parallel tests passed:

$ pytest -s tests/distributed/test_pipeline_parallel.py
...
.INFO:     Shutting down


=============================== warnings summary ===============================
my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127
  /workspace/my-vllm/lib64/python3.11/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 5 passed, 1 warning in 393.58s (0:06:33) ===================
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [75971]
INFO 07-18 15:40:48 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
(VllmWorkerProcess pid=76043) INFO 07-18 15:40:48 multiproc_worker_utils.py:237] Worker exiting
INFO 07-18 15:40:48 async_llm_engine.py:51] Engine is gracefully shutting down.
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
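For reference, the local edit described above amounts to something like the following (a hypothetical sketch with illustrative names; the actual test file structures its model list and launch args differently and has changed since):

# tests/distributed/test_pipeline_parallel.py -- local, illustrative edit
MODEL_NAME = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
EXTRA_SERVER_ARGS = [
    "--trust-remote-code",  # DeepSeek v2 ships custom modeling code
    "--tensor-parallel-size", "1",
    "--pipeline-parallel-size", "2",
]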

@tjohnson31415 force-pushed the deepseek-v2-pp branch 2 times, most recently from 568c1d9 to 2522798, on July 18, 2024 at 16:21
@tjohnson31415 (Contributor, Author)

Rebased to resolve a conflict with main. Reran the tests and they still pass.

Comment on lines 409 to 436
self.start_layer, self.end_layer, self.layers = make_layers(
    config.num_hidden_layers,
    # layer_idx is still an argument
    functools.partial(DeepseekV2DecoderLayer,
                      config,
                      cache_config=cache_config,
                      quant_config=quant_config),
)
This lambda function will have a prefix= argument shortly, once #6515 lands.
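For context, make_layers is the helper that carries the pipeline-parallel split: each PP rank materializes only its own contiguous slice of the decoder layers. A minimal sketch of the idea (illustrative only; vLLM's real helper reads the PP rank and world size from its parallel state and uses placeholder modules rather than None):

def make_layers(num_layers, layer_fn, pp_rank, pp_size):
    # Evenly partition the decoder layers across pipeline stages;
    # the last stage absorbs any remainder.
    per_rank = num_layers // pp_size
    start = pp_rank * per_rank
    end = num_layers if pp_rank == pp_size - 1 else start + per_rank
    # Instantiate only this rank's slice (the layer index is passed to the
    # factory); other positions stay empty so global indexing still works.
    layers = [layer_fn(i) if start <= i < end else None
              for i in range(num_layers)]
    return start, end, layers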

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> (all 4 commits)
@youkaichao (Member) left a comment

Thanks for addressing my comments! Please test the correctness locally.

@tjohnson31415 (Contributor, Author)

> please test the correctness locally.

I ran the updated test_pipeline_parallel.py tests locally with the deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct model. It took a few tries to get through the whole suite: a couple of times a single test failed with RuntimeError: Server exited unexpectedly., but it was a different test each time, and rerunning a failed test by itself would pass. On the third attempt, all 10 tests passed:

====================================================== 10 passed, 1 warning in 764.67s (0:12:44) ======================================================

@youkaichao (Member)

Thanks, that might be caused by the flakiness of the PP tests. I'll merge, as this PR looks good to me now.

Thanks for your contribution!

@youkaichao merged commit 507ef78 into vllm-project:main on Jul 23, 2024
27 checks passed
@tjohnson31415 deleted the deepseek-v2-pp branch on July 23, 2024 at 20:15
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Jul 24, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Jul 24, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request on Jul 24, 2024
cduk pushed a commit to cduk/vllm-pascal that referenced this pull request on Aug 6, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request on Aug 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request on Oct 26, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request on Nov 20, 2024