[docs] Update ascend910b-support.md (#816)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
windsonsea authored Jan 21, 2025
1 parent 1e23508 commit 759be1c
Showing 2 changed files with 64 additions and 43 deletions.
44 changes: 26 additions & 18 deletions docs/ascend910b-support.md
# Introduction to huawei.com/Ascend910 support

**HAMi now supports huawei.com/Ascend910 by implementing most of the device-sharing features available for NVIDIA GPUs**, including:

* **_NPU sharing_**: Each task can allocate a portion of an Ascend NPU instead of a whole card, so one NPU can be shared among multiple tasks.

* **_Device Memory Control_**: An Ascend NPU can be allocated with a specific amount of device memory, and the task is guaranteed not to exceed that limit.

* **_Device Core Control_**: An Ascend NPU can be allocated with a specific number of compute cores, and the task is guaranteed not to exceed that limit.

## Prerequisites

* Ascend docker runtime
* driver version > 24.1.rc1
* Ascend device type: 910B, 910B3, 310P

## Enabling NPU-sharing Support

* Install the chart using Helm; see the 'enabling vGPU support in kubernetes' section [here](https://github.com/Project-HAMi/HAMi#enabling-vgpu-support-in-kubernetes)

* Tag the Ascend 910B node with the following command:

```bash
kubectl label node {ascend-node} accelerator=huawei-Ascend910
```
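
To double-check the label, you can list the nodes that match it; this is just a quick verification sketch using standard kubectl selectors:

```bash
# List the nodes that carry the accelerator label applied above
kubectl get nodes -l accelerator=huawei-Ascend910
```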

* Install [Ascend docker runtime](https://gitee.com/ascend/ascend-docker-runtime)

* Download the YAML for the Ascend vGPU device plugin from the HAMi project [here](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml), and deploy it:

```bash
wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
kubectl apply -f ascendplugin-910-hami.yaml
```
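
A quick way to confirm the device plugin came up is to look for its pod and for the advertised node resources; the exact namespace and pod name depend on the manifest, so treat this as a sketch:

```bash
# Find the device-plugin pod (namespace and pod name depend on the deployed manifest)
kubectl get pods -A | grep -i ascend

# Check that the node now advertises Ascend resources in its Allocatable list
kubectl describe node {ascend-node} | grep -i ascend910
```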

## Custom Ascend share configuration

HAMi currently ships a [built-in share configuration](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml) for Ascend devices.

You can customize the Ascend share configuration by following the steps below:

<details>
<summary>customize ascend config</summary>

### Create a `files` directory in the hami chart

The directory structure is as follows:

```bash
tree -L 1
.
# ... (entries collapsed in the diff view)
└── values.yaml
```

### Create device-config.yaml

The content is as follows:

```yaml
vnpus:
- chipName: 910B
commonWord: Ascend910A
resourceName: huawei.com/Ascend910A
resourceMemoryName: huawei.com/Ascend910A-memory
  # ... (unchanged lines collapsed in the diff view)
- name: vir16
memory: 17476
aiCore: 16
- chipName: 910B3
commonWord: Ascend910B
resourceName: huawei.com/Ascend910B
resourceMemoryName: huawei.com/Ascend910B-memory
  # ... (unchanged lines collapsed in the diff view)
memory: 32768
aiCore: 10
aiCPU: 3
- chipName: 310P3
commonWord: Ascend310P
resourceName: huawei.com/Ascend310P
resourceMemoryName: huawei.com/Ascend310P-memory
  # ... (unchanged lines collapsed in the diff view)
aiCore: 4
aiCPU: 4
```

### Install and update with Helm

Helm install and upgrade operations will use the configuration in this file, overriding the chart's built-in configuration.
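
For example, an install or upgrade run against the local chart directory would then pick up `files/device-config.yaml`; the release name, namespace, and chart path below are assumptions, so adjust them to your deployment:

```bash
# Re-run the install/upgrade so files/device-config.yaml overrides the built-in Ascend config.
# Release name, namespace, and chart path are assumptions; adjust to your environment.
helm upgrade --install hami ./charts/hami -n kube-system
```
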
</details>
## Running Ascend jobs
An Ascend 910B can now be requested by a container
using the `huawei.com/Ascend910` and `huawei.com/Ascend910-memory` resource types:

```yaml
apiVersion: v1
kind: Pod
metadata:
spec:
  # ... (rest of the Pod spec is unchanged and collapsed in the diff view)
```
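
Since the Pod spec above is collapsed in the diff view, a minimal end-to-end sketch is shown below; the pod name, container name, and image are illustrative placeholders, while the resource names follow the doc's example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ascend910-demo              # placeholder name
spec:
  containers:
    - name: npu-worker              # placeholder container name
      image: ubuntu:22.04           # placeholder image; in practice use an image with the Ascend/CANN userland
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          huawei.com/Ascend910: 1            # requesting 1 NPU share
          huawei.com/Ascend910-memory: 2000  # requesting 2000M device memory
```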

## Notes

1. Ascend 910B sharing in init containers is not supported.

2. `huawei.com/Ascend910-memory` only works when `huawei.com/Ascend910=1`.
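
If a request like this stays Pending or is rejected, the scheduler's reasoning usually shows up in the pod events; a generic check (the pod name is a placeholder) is:

```bash
# Inspect scheduling events and the resources actually assigned to the pod
kubectl describe pod ascend910-demo
```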
63 changes: 38 additions & 25 deletions docs/ascend910b-support_cn.md
# Introduction to huawei.com/Ascend910 support

HAMi supports sharing Huawei Ascend 910B devices and provides the following vGPU-like sharing features, including:

* **_NPU sharing_**: Each task can occupy only a portion of a card, so multiple tasks can share one card.

* **_Device memory limit_**: You can now allocate an NPU by device memory size (for example, 3000M), and HAMi ensures the task does not use more device memory than allocated.

* **_Compute limit_**: You can now allocate NPU compute power by percentage, and HAMi ensures the task does not use more compute than allocated.

## Node requirements

* Ascend docker runtime
* driver version > 24.1.rc1
* Ascend device type: 910B, 910B3, 310P

## Enabling NPU sharing

* Deploy HAMi with Helm; see the ['enabling vGPU support in kubernetes' section of the main document](https://github.com/Project-HAMi/HAMi/blob/master/README_cn.md#kubernetes开启vgpu支持)

* Label the node where the Ascend 910B is located with the following command:

```bash
kubectl label node {ascend-node} accelerator=huawei-Ascend910
```

* Deploy the [Ascend docker runtime](https://gitee.com/ascend/ascend-docker-runtime)

* Get the [ascend-device-plugin](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml) YAML from the HAMi project and deploy it:

```bash
wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
kubectl apply -f ascendplugin-910-hami.yaml
```
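
To confirm the plugin registered the NPUs with the kubelet, you can check the node's allocatable resources; the resource names depend on the share configuration (for example `huawei.com/Ascend910B`), so treat this as a sketch:

```bash
# The Ascend resources should appear under the node's Capacity/Allocatable sections
kubectl describe node {ascend-node} | grep -i ascend
```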

## Customizing NPU virtualization parameters

HAMi ships with a built-in NPU [virtualization configuration](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml).

HAMi also supports customizing the virtualization parameters as follows:

<details>
<summary>Customize the configuration</summary>

### Create a files directory in the HAMi chart

The resulting directory structure should look like this:

```bash
tree -L 1
.
# ... (entries collapsed in the diff view)
└── values.yaml
```

### Create device-config.yaml in the files directory

The configuration file looks like the following and can be adjusted as needed:

```yaml
vnpus:
- chipName: 910B
commonWord: Ascend910A
resourceName: huawei.com/Ascend910A
resourceMemoryName: huawei.com/Ascend910A-memory
  # ... (unchanged lines collapsed in the diff view)
- name: vir16
memory: 17476
aiCore: 16
- chipName: 910B3
commonWord: Ascend910B
resourceName: huawei.com/Ascend910B
resourceMemoryName: huawei.com/Ascend910B-memory
  # ... (unchanged lines collapsed in the diff view)
memory: 32768
aiCore: 10
aiCPU: 3
- chipName: 310P3
commonWord: Ascend310P
resourceName: huawei.com/Ascend310P
resourceMemoryName: huawei.com/Ascend310P-memory
  # ... (unchanged lines collapsed in the diff view)
aiCore: 4
aiCPU: 4
```

### Install and update with Helm

Helm install and upgrade operations will use this configuration file, overriding the default built-in configuration.
</details>
## Running NPU jobs
A container can now request an Ascend 910B using the `huawei.com/Ascend910` and
`huawei.com/Ascend910-memory` resource types:

```yaml
apiVersion: v1
kind: Pod
metadata:
spec:
  # ... (container name and image collapsed in the diff view)
command: ["bash", "-c", "sleep 86400"]
resources:
limits:
huawei.com/Ascend910: 1 # requesting 1 vGPU
huawei.com/Ascend910-memory: 2000 # requesting 2000M device memory
```
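
Assuming the container image ships the Ascend driver tools, you can check from inside the pod that the visible device memory matches the request; the pod name below is a placeholder:

```bash
# npu-smi is the Ascend counterpart of nvidia-smi; it reports the memory visible to the container
kubectl exec -it <ascend-pod> -- npu-smi info
```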

## Notes

1. Ascend 910B devices currently support only two slicing granularities, 1/4 card and 1/2 card; the allocated device memory is automatically rounded up to the nearest granularity.

2. NPU sharing cannot be used in init containers.

3. Only tasks that request a single NPU can specify the `Ascend910-memory` device memory value; if more than one NPU is requested, every requested NPU is allocated as a whole card.
