
Commit 1bfa341

Authored by xuedinge233, sjh, and loadams

add Huawei Ascend NPU setup guide (#6445)

This PR adds the setup instructions for Huawei Ascend NPU. Please refer to the remainder of the guide for instructions on other devices.

Co-authored-by: sjh <sjh1270@163.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>

1 parent 8ac42ed commit 1bfa341

File tree

3 files changed: +115 −1 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -59,7 +59,7 @@ repos:
           # Do not check files that are automatically generated
           '--skip=docs/Gemfile.lock,tests/unit/gpt2-merges.txt,tests/unit/gpt2-vocab.json',
           '--ignore-regex=\\n', # Do not count the 'n' in an escaped newline as part of a word
-          '--ignore-words-list=youn,unsupport,noe', # Word used in error messages that need rewording
+          '--ignore-words-list=youn,unsupport,noe,cann', # Word used in error messages that need rewording
           --check-filenames,
           --check-hidden
          ]
```
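As a quick local sanity check on the updated ignore list, the hook can be run on its own; this is a sketch assuming `pre-commit` is installed and the hook id is `codespell`, as in the standard codespell pre-commit integration:

```shell
# Run only the codespell hook over the whole repository; with 'cann' added to
# --ignore-words-list, occurrences of "CANN" in the new docs should no longer
# be flagged as misspellings.
pre-commit run codespell --all-files
```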

docs/_tutorials/accelerator-abstraction-interface.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -81,6 +81,7 @@ torch.distributed.init_process_group(get_accelerator().communication_backend_nam
 [Accelerator Setup Guide](accelerator-setup-guide.md) provides a guide on how to setup different accelerators for DeepSpeed. It also comes with simple example how to run deepspeed for different accelerators. The following guides are provided:
 1. Run DeepSpeed model on CPU
 2. Run DeepSpeed model on XPU
+3. Run DeepSpeed model on Huawei Ascend NPU

 # Implement new accelerator extension
 It is possible to implement a new DeepSpeed accelerator extension to support new accelerator in DeepSpeed. An example to follow is _[Intel Extension For DeepSpeed](https://github.com/intel/intel-extension-for-deepspeed/)_. An accelerator extension contains the following components:
```
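The hunk context above relies on DeepSpeed's device-agnostic pattern (`get_accelerator().communication_backend_name()` instead of hard-coded CUDA calls). As a minimal sketch of the detection idea behind that abstraction, assuming only that `torch_npu` registers `torch.npu` when installed (`detect_accelerator` is a hypothetical helper, not DeepSpeed's API):

```python
# Hedged sketch, not DeepSpeed's actual implementation: probe for an Ascend
# NPU first, then CUDA, then fall back to CPU. torch and torch_npu are
# optional; the fallback keeps the sketch runnable anywhere.
def detect_accelerator() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"
    try:
        import torch_npu  # noqa: F401  # importing it registers torch.npu
        if torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print("accelerator:", detect_accelerator())
```

In real code, `deepspeed.accelerator.get_accelerator()` performs this probing internally and should be used instead of hand-rolled checks.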

docs/_tutorials/accelerator-setup-guide.md

Lines changed: 113 additions & 0 deletions

```diff
@@ -8,6 +8,7 @@ tags: getting-started
 - [Introduction](#introduction)
 - [Intel Architecture (IA) CPU](#intel-architecture-ia-cpu)
 - [Intel XPU](#intel-xpu)
+- [Huawei Ascend NPU](#huawei-ascend-npu)

 # Introduction
 DeepSpeed supports different accelerators from different companies. Setup steps to run DeepSpeed on certain accelerators might be different. This guide allows user to lookup setup instructions for the accelerator family and hardware they are using.
@@ -132,3 +133,115 @@ accelerator: xpu

 ## More example for using DeepSpeed on Intel XPU
 Refer to https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.1.40/examples/gpu/inference/python/llm for more extensive guide.
```

The remaining 113 added lines are the new Huawei Ascend NPU section:
# Huawei Ascend NPU

DeepSpeed has been verified on the following Huawei Ascend NPU products:
* Atlas 300T A2

## Installation steps for Huawei Ascend NPU

The following steps outline the process for installing DeepSpeed on a Huawei Ascend NPU:
1. Install the Huawei Ascend NPU driver and firmware
<details>
<summary>Click to expand</summary>

Before proceeding with the installation, please download the necessary files from [Huawei Ascend NPU Driver and Firmware](https://www.hiascend.com/en/hardware/firmware-drivers/commercial?product=4&model=11).

The following instructions are sourced from the [Ascend Community](https://www.hiascend.com/document/detail/en/canncommercial/700/quickstart/quickstart/quickstart_18_0002.html) (refer to the [Chinese version](https://www.hiascend.com/document/detail/zh/canncommercial/700/quickstart/quickstart/quickstart_18_0002.html)):

- Execute the following command to install the driver:
```
./Ascend-hdk-<soc_version>-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all
```

- Execute the following command to install the firmware:
```
./Ascend-hdk-<soc_version>-npu-firmware_x.x.x.x.X.run --full
```
</details>
2. Install CANN
<details>
<summary>Click to expand</summary>

Prior to installation, download the [CANN Toolkit](https://www.hiascend.com/en/software/cann/commercial).

- Install third-party dependencies.
  - Ubuntu (The operations are the same for Debian, UOS20, and Linux.)
  ```
  apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev unzip pciutils net-tools libblas-dev gfortran libblas3
  ```
  - openEuler (The operations are the same for EulerOS, CentOS, and BC-Linux.)
  ```
  yum install -y gcc gcc-c++ make cmake unzip zlib-devel libffi-devel openssl-devel pciutils net-tools sqlite-devel lapack-devel gcc-gfortran
  ```
- Install the required Python dependencies:
  ```
  pip3 install attrs numpy decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions
  ```
- Install the CANN Toolkit.
  ```
  ./Ascend-cann-toolkit_x.x.x_linux-{arch}.run --install
  ```
</details>

3. Install PyTorch \
`pip install torch torch_npu`

4. Install DeepSpeed \
`pip install deepspeed`
You can view the installation results using the `ds_report` command. Here is an example:
```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/torch']
torch version .................... 2.2.0
deepspeed install path ........... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.2
torch_npu install path ........... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/torch_npu']
torch_npu version ................ 2.2.0
ascend_cann version .............. 8.0.RC2.alpha002
shared memory (/dev/shm) size .... 20.00 GB
```
## How to launch DeepSpeed on Huawei Ascend NPU

To validate that the Huawei Ascend NPU is available and that the accelerator is correctly chosen, here is an example (Huawei Ascend NPU detection is automatic starting with DeepSpeed v0.12.6):
```
>>> import torch
>>> print('torch:',torch.__version__)
torch: 2.2.0
>>> import torch_npu
>>> print('torch_npu:',torch.npu.is_available(),",version:",torch_npu.__version__)
torch_npu: True ,version: 2.2.0
>>> from deepspeed.accelerator import get_accelerator
>>> print('accelerator:', get_accelerator()._name)
accelerator: npu
```

## Multi-card parallel training using Huawei Ascend NPU

To perform model training across multiple Huawei Ascend NPU cards using DeepSpeed, see the examples provided in [DeepSpeed Examples](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/cifar/cifar10_deepspeed.py).
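Such examples are typically started through the DeepSpeed launcher. The command below is a sketch, assuming single-node training on 8 cards and that the launcher's generic `--num_gpus` flag counts NPU devices once automatic detection (v0.12.6+) is in place:

```shell
# Hypothetical single-node launch across 8 Ascend NPU cards.
# cifar10_deepspeed.py is the script linked above; --deepspeed enables the
# engine arguments that deepspeed.add_config_arguments() registers.
deepspeed --num_gpus=8 cifar10_deepspeed.py --deepspeed
```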
