add Deepseek-R1 tutorial. #4566
base: main
Conversation
Signed-off-by: Gongdayao <gongdayao@foxmail.com>
Code Review
This pull request adds a new tutorial for deploying the DeepSeek-R1 model. The tutorial is comprehensive, covering environment setup, deployment on A2 and A3 series hardware, functional verification, and performance/accuracy evaluation. However, I've found a few critical issues in the documentation that could prevent users from successfully following the steps. These include a typo in a command-line argument, incomplete installation instructions, incorrect markdown syntax, and inconsistent model naming. Addressing these issues will significantly improve the quality and usability of the tutorial.
docs/source/tutorials/DeepSeek-R1.md
Outdated
```
--host 0.0.0.0 \
--port 8000 \
--data-parallel-size 4 \
--data-parallel-size_local 2 \
```
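For orientation, the data-parallel options above would sit in a full serve invocation roughly like the sketch below. Upstream vLLM spells such flags with hyphens (`--data-parallel-size-local`, not `--data-parallel-size_local`), and the model path here is a placeholder, not a path from the tutorial:

```shell
# Hypothetical full command; flag spellings follow upstream vLLM conventions,
# and /path/to/DeepSeek-R1-W8A8 is a placeholder for the local model directory.
vllm serve /path/to/DeepSeek-R1-W8A8 \
  --host 0.0.0.0 \
  --port 8000 \
  --data-parallel-size 4 \
  --data-parallel-size-local 2
```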
```
## Introduction

DeepSeek-R1 is a high-performance Mixture-of-Experts (MoE) large language model developed by DeepSeek Company. It excels in complex logical reasoning, mathematical problem-solving, and code generation. By dynamically activating its expert networks, it delivers exceptional performance while maintaining computational efficiency. Building upon R1, DeepSeek-R1-W8A8 is a fully quantized version of the model. It employs 8-bit integer (INT8) quantization for both weights and activations, which significantly reduces the model's memory footprint and computational requirements, enabling more efficient deployment and application in resource-constrained environments.

This article takes the deepseek- R1-w8a8 version as an example to introduce the deployment of the R1 series models.
```
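To make the W8A8 idea in the quoted introduction concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization. This is only an illustration of the storage saving; the real DeepSeek-R1-W8A8 pipeline uses calibrated, per-channel scales and fused INT8 kernels:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = float(np.max(np.abs(x))) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floats from the INT8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# INT8 stores 1 byte per value vs 4 for FP32 (4x smaller), and the
# round-trip error is bounded by half a quantization step (scale / 2).
```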
The model name deepseek- R1-w8a8 is used here (with a space and lowercase w). However, the vllm serve commands (e.g., line 88) and the official model download link use DeepSeek-R1-W8A8 with an uppercase W. This inconsistency is present throughout the document and can lead to 'file not found' errors on case-sensitive filesystems. Please use a consistent naming convention, preferably DeepSeek-R1-W8A8.
```
- Install `vllm-ascend` from source, refer to [installation](../installation.md).

- Install extra operator for supporting `DeepSeek-R1-w8a8`, refer to the above tab.
```
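The from-source install step being reviewed typically looks like the sketch below; the linked installation guide is authoritative for the supported procedure, and the repository URL is the project's public repo:

```shell
# Sketch only -- follow docs/source/installation.md for the supported steps
# (CANN toolkit and other Ascend prerequisites are not shown here).
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```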
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?