
[Model] Pipeline parallel support for Mixtral #6516

Merged

4 commits merged into vllm-project:main from mixtral-pp on Jul 18, 2024

Conversation

comaniac
Collaborator

@comaniac commented Jul 17, 2024

Taken from #6403. Co-authored by @binxuan.

cc @youkaichao


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only trigger the fastcheck CI, which consists of a small and essential subset of tests to quickly catch errors, with the flexibility to run extra individual tests on top (you can do this by unblocking test steps in the Buildkite run).

A full CI run is still required to merge this PR, so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger a full CI run as well.

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

@comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 17, 2024
@comaniac force-pushed the mixtral-pp branch 2 times, most recently from f83603e to d74f2e6, on July 17, 2024 at 21:50
@comaniac
Collaborator Author

Tested locally with PP=8 and it worked.
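
For context, a minimal sketch of the engine arguments such a run uses (illustrative only, not part of this PR; pipeline parallelism is typically exercised through vLLM's async engine / OpenAI-compatible server):

    # Illustrative sketch: engine arguments for serving Mixtral across 8 GPUs
    # with pipeline parallelism. tensor_parallel_size and pipeline_parallel_size
    # are standard vLLM engine arguments; the PP=8 value mirrors the run above.
    from vllm.engine.arg_utils import AsyncEngineArgs

    engine_args = AsyncEngineArgs(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        tensor_parallel_size=1,    # no tensor parallelism within a pipeline stage
        pipeline_parallel_size=8,  # one pipeline stage per GPU
    )

On the command line this corresponds to passing --tensor-parallel-size 1 --pipeline-parallel-size 8 to the server entrypoint.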

@youkaichao
Member

Can you test the correctness locally, using https://github.com/vllm-project/vllm/blob/main/tests/distributed/test_pipeline_parallel.py?

@comaniac
Collaborator Author

Passed with the following configurations. Note that I tested it on 8xL4, so I had to use 8 GPUs to host the model.

    "TP_SIZE, PP_SIZE, EAGER_MODE, CHUNKED_PREFILL, MODEL_NAME",
    [
        (2, 4, 0, 1, "mistralai/Mixtral-8x7B-Instruct-v0.1"),
        (2, 4, 1, 0, "mistralai/Mixtral-8x7B-Instruct-v0.1"),
        (1, 8, 0, 1, "mistralai/Mixtral-8x7B-Instruct-v0.1"),
        (1, 8, 1, 0, "mistralai/Mixtral-8x7B-Instruct-v0.1"),
    ])

Also fixed some issues in the test file (a rough sketch follows the list):

  • Use TP_SIZE x PP_SIZE as the TP size of the reference run. The current max(TP_SIZE, 2) doesn't work for larger models.
  • Do not use 0 as the token ID in the dummy prompts. This may generate random outputs for certain models/tokenizers.
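
A rough sketch of those two fixes, using illustrative names rather than the actual test-file code:

    # Fix 1: size the single-pipeline reference run by the total GPU count so
    # that large models such as Mixtral still fit; max(TP_SIZE, 2) does not.
    TP_SIZE, PP_SIZE = 2, 4
    ref_tp_size = TP_SIZE * PP_SIZE  # 8 GPUs for the reference run

    # Fix 2: avoid token ID 0 in the dummy prompts, since an all-zero prompt
    # can decode to unstable outputs for some models/tokenizers.
    dummy_prompt_token_ids = [[1] * 16]  # any valid non-zero token ID works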

Comment on lines 40 to 42
# Use the same number or at most 8 GPUs to hold the model.
# In this test we assume the model can fit in 8 GPUs.
str(min(TP_SIZE * PP_SIZE, 8)),
@youkaichao
Member


It's not going to work: this will run in multi-node tests with the mp backend, and we can use at most 2 GPUs.

You can revert this change and keep it only for your local testing.

@comaniac
Collaborator Author


Reverted with comments.

Member

@youkaichao left a comment


LGTM if tests pass

@youkaichao merged commit b5af8c2 into vllm-project:main on Jul 18, 2024
69 of 72 checks passed
@comaniac deleted the mixtral-pp branch on July 18, 2024 at 02:27
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 19, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Jul 26, 2024
gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Jul 27, 2024
gnpinkert pushed a commit to gnpinkert/vllm that referenced this pull request Aug 26, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)