From 759be1c69d083ade38dccd0076ab0745154bec18 Mon Sep 17 00:00:00 2001
From: Michael Yao
Date: Tue, 21 Jan 2025 10:25:16 +0800
Subject: [PATCH] [docs] Update ascend910b-support.md (#816)

Signed-off-by: windsonsea
---
 docs/ascend910b-support.md    | 44 ++++++++++++++----------
 docs/ascend910b-support_cn.md | 63 +++++++++++++++++++++--------------
 2 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/docs/ascend910b-support.md b/docs/ascend910b-support.md
index 6c970ff5d..4b33f9a27 100644
--- a/docs/ascend910b-support.md
+++ b/docs/ascend910b-support.md
@@ -1,13 +1,12 @@
-## Introduction
+# Introduction to huawei.com/Ascend910 support
 
-**We now support huawei.com/Ascend910 by implementing most device-sharing features as nvidia-GPU**, including:
+**HAMi now supports huawei.com/Ascend910 by implementing most of the device-sharing features available for NVIDIA GPUs**, including:
 
-***NPU sharing***: Each task can allocate a portion of Ascend NPU instead of a whole NLU card, thus NPU can be shared among multiple tasks.
+* **_NPU sharing_**: Each task can allocate a portion of an Ascend NPU instead of a whole NPU card, so one NPU can be shared among multiple tasks.
 
-***Device Memory Control***: Ascend NPUs can be allocated with certain device memory size and guarantee it that it does not exceed the boundary.
-
-***Device Core Control***: Ascend NPUs can be allocated with certain compute cores and guarantee it that it does not exceed the boundary.
+* **_Device Memory Control_**: Ascend NPUs can be allocated with a specific device memory size, and HAMi guarantees that the task does not exceed it.
 
+* **_Device Core Control_**: Ascend NPUs can be allocated with a specific number of compute cores, and HAMi guarantees that the task does not exceed them.
 
 ## Prerequisites
 
@@ -20,7 +19,8 @@
 * Install the chart using helm, See 'enabling vGPU support in kubernetes' section [here](https://github.com/Project-HAMi/HAMi#enabling-vgpu-support-in-kubernetes)
 
 * Tag Ascend-910B node with the following command
-```
+
+```bash
 kubectl label node {ascend-node} accelerator=huawei-Ascend910
 ```
 
@@ -28,12 +28,13 @@ kubectl label node {ascend-node} accelerator=huawei-Ascend910
 ```
 * Download yaml for Ascend-vgpu-device-plugin from HAMi Project [here](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml), and deploy
 
-```
+```bash
 wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
 kubectl apply -f ascendplugin-910-hami.yaml
 ```
 
 ## Custom ascend share configuration
+
 HAMi currently has a [built-in share configuration](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml) for ascend.
 
 You can customize the ascend share configuration by following the steps below:
@@ -41,8 +42,10 @@ You can customize the ascend share configuration by following the steps below:
 <details>
   <summary>customize ascend config</summary>
 
-  ### Create a new directory files in hami charts, the directory structure is as follows
-  
+  ### Create a `files` directory in the HAMi chart
+
+  The directory structure is as follows:
+
   ```bash
   tree -L 1
   .
@@ -52,11 +55,13 @@ You can customize the ascend share configuration by following the steps below:
   └── values.yaml
   ```
 
-  ### Create the device-config.yaml file, the content is as follows
+  ### Create device-config.yaml
+
+  The content is as follows:
 
   ```yaml
   vnpus:
-- chipName: 910B
+  - chipName: 910B
     commonWord: Ascend910A
     resourceName: huawei.com/Ascend910A
     resourceMemoryName: huawei.com/Ascend910A-memory
@@ -76,7 +81,7 @@ You can customize the ascend share configuration by following the steps below:
       - name: vir16
         memory: 17476
         aiCore: 16
-- chipName: 910B3
+  - chipName: 910B3
     commonWord: Ascend910B
     resourceName: huawei.com/Ascend910B
     resourceMemoryName: huawei.com/Ascend910B-memory
@@ -93,7 +98,7 @@ You can customize the ascend share configuration by following the steps below:
         memory: 32768
         aiCore: 10
         aiCPU: 3
-- chipName: 310P3
+  - chipName: 310P3
     commonWord: Ascend310P
     resourceName: huawei.com/Ascend310P
     resourceMemoryName: huawei.com/Ascend310P-memory
@@ -115,16 +120,19 @@ You can customize the ascend share configuration by following the steps below:
         aiCore: 4
         aiCPU: 4
   ```
 
-  ### Helm installation and updates will be based on the configuration in this file, overwriting the built-in configuration of Helm
-  </details>
+  ### Install and update with Helm
+
+  Helm installation and updates will use the configuration in this file, overriding the chart's built-in configuration.
+
+</details>
 
 
 ## Running Ascend jobs
 
 Ascend 910Bs can now be requested by a container using the `huawei.com/ascend910` and `huawei.com/ascend910-memory` resource type:
 
-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -146,4 +154,4 @@ spec:
 
 1. Ascend-910B-sharing in init container is not supported.
 
-2. `huawei.com/Ascend910-memory` only work when `huawei.com/Ascend910=1`.
\ No newline at end of file
+1. `huawei.com/Ascend910-memory` only works when `huawei.com/Ascend910=1`.
diff --git a/docs/ascend910b-support_cn.md b/docs/ascend910b-support_cn.md
index 291b36ac7..9544ee17b 100644
--- a/docs/ascend910b-support_cn.md
+++ b/docs/ascend910b-support_cn.md
@@ -1,12 +1,12 @@
-## 简介
+# huawei.com/Ascend910 支持简介
 
-本组件支持复用华为升腾910B设备,并为此提供以下几种与vGPU类似的复用功能,包括:
+HAMi 支持复用华为昇腾 910B 设备,并为此提供以下几种与 vGPU 类似的复用功能,包括:
 
-*** NPU 共享***: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡
+* **_NPU 共享_**: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡
 
-***可限制分配的显存大小***: 你现在可以用显存值(例如3000M)来分配NPU,本组件会确保任务使用的显存不会超过分配数值
+* **_可限制分配的显存大小_**: 你现在可以用显存值(例如 3000M)来分配 NPU,本组件会确保任务使用的显存不会超过分配数值
 
-***可限制分配的算力大小***: 你现在可以用百分比来分配 NPU的算力,本组件会确保任务使用的算力不会超过分配数值
+* **_可限制分配的算力大小_**: 你现在可以用百分比来分配 NPU 的算力,本组件会确保任务使用的算力不会超过分配数值
 
 ## 节点需求
 
@@ -14,33 +14,38 @@
 * driver version > 24.1.rc1
 * Ascend device type: 910B,910B3,310P
 
-## 开启NPU复用
+## 开启 NPU 复用
 
-* 通过helm部署本组件, 参照[主文档中的开启vgpu支持章节](https://github.com/Project-HAMi/HAMi/blob/master/README_cn.md#kubernetes开启vgpu支持)
+* 通过 helm 部署本组件, 参照[主文档中的开启 vGPU 支持章节](https://github.com/Project-HAMi/HAMi/blob/master/README_cn.md#kubernetes开启vgpu支持)
 
-* 使用以下指令,为Ascend 910B所在节点打上label
-```
+* 使用以下指令,为 Ascend 910B 所在节点打上 label
+
+```bash
 kubectl label node {ascend-node} accelerator=huawei-Ascend910
 ```
 
 * 部署[Ascend docker runtime](https://gitee.com/ascend/ascend-docker-runtime)
 
-* 从HAMi项目中获取并安装[ascend-device-plugin](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml),并进行部署
+* 从 HAMi 项目中获取并安装[ascend-device-plugin](https://github.com/Project-HAMi/ascend-device-plugin/blob/master/build/ascendplugin-910-hami.yaml),并进行部署
 
-```
+```bash
 wget https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/master/build/ascendplugin-910-hami.yaml
 kubectl apply -f ascendplugin-910-hami.yaml
 ```
 
 ## 自定义 NPU 虚拟化参数
+
 HAMi 目前有一个 NPU 内置[虚拟化配置文件](https://github.com/Project-HAMi/HAMi/blob/master/charts/hami/templates/scheduler/device-configmap.yaml).
 
 当然 HAMi 也支持通过以下方式自定义虚拟化参数:
+
 <details>
   <summary>自定义配置</summary>
 
-  ### 在 HAMi charts 创建 files 的目录,创建后的目录架构应为如下所示
-  
+  ### 在 HAMi charts 创建 files 的目录
+
+  创建后的目录架构应为如下所示:
+
   ```bash
   tree -L 1
   .
@@ -50,11 +55,13 @@ HAMi 目前有一个 NPU 内置[虚拟化配置文件](https://github.com/Projec
   └── values.yaml
   ```
 
-  ### 在 files 目录下创建 Create the device-config.yaml 文件,配置文件如下所示, 可以按需调整
+  ### 在 files 目录下创建 device-config.yaml
+
+  配置文件如下所示,可以按需调整:
 
   ```yaml
   vnpus:
-- chipName: 910B
+  - chipName: 910B
     commonWord: Ascend910A
     resourceName: huawei.com/Ascend910A
     resourceMemoryName: huawei.com/Ascend910A-memory
@@ -74,7 +81,7 @@ HAMi 目前有一个 NPU 内置[虚拟化配置文件](https://github.com/Projec
       - name: vir16
         memory: 17476
         aiCore: 16
-- chipName: 910B3
+  - chipName: 910B3
     commonWord: Ascend910B
     resourceName: huawei.com/Ascend910B
     resourceMemoryName: huawei.com/Ascend910B-memory
@@ -91,7 +98,7 @@ HAMi 目前有一个 NPU 内置[虚拟化配置文件](https://github.com/Projec
         memory: 32768
         aiCore: 10
         aiCPU: 3
-- chipName: 310P3
+  - chipName: 310P3
     commonWord: Ascend310P
     resourceName: huawei.com/Ascend310P
     resourceMemoryName: huawei.com/Ascend310P-memory
@@ -113,13 +120,19 @@ HAMi 目前有一个 NPU 内置[虚拟化配置文件](https://github.com/Projec
         aiCore: 4
         aiCPU: 4
   ```
 
-  ### Helm 安装、更新将基于该配置文件,覆盖默认的配置文件
+
+  ### Helm 安装和更新
+
+  Helm 安装、更新将基于该配置文件,覆盖默认的配置文件
+</details>
 
+## 运行 NPU 任务
 
-## 运行NPU任务
+现在使用 `huawei.com/ascend910` 和 `huawei.com/ascend910-memory` 资源类型,
+可以通过容器来请求 Ascend 910B:
 
-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -131,14 +144,14 @@ spec:
       command: ["bash", "-c", "sleep 86400"]
       resources:
         limits:
-          huawei.com/Ascend910: 1 # requesting 1 vGPUs
-          huawei.com/Ascend910-memory: 2000 # requesting 2000m device memory
+          huawei.com/Ascend910: 1 # 请求 1 个 vGPU
+          huawei.com/Ascend910-memory: 2000 # 请求 2000m 设备内存
 ```
 
 ## 注意事项
 
-1. 目前Ascend910B设备,只支持2种粒度的切分,分别是1/4卡和1/2卡,分配的显存会自动对齐到在分配额之上最近的粒度上
+1. 目前 Ascend910B 设备,只支持 2 种粒度的切分,分别是 1/4 卡和 1/2 卡,分配的显存会自动对齐到在分配额之上最近的粒度上
 
-2. 在init container中无法使用NPU复用功能
+2. 在 init container 中无法使用 NPU 复用功能
 
-3. 只有申请单MLU的任务可以指定显存`Ascend910-memory`的数值,若申请的NPU数量大于1,则所有申请的NPU都会被整卡分配
+3. 只有申请单 NPU 的任务可以指定显存 `Ascend910-memory` 的数值,若申请的 NPU 数量大于 1,则所有申请的 NPU 都会被整卡分配
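The allocation rules stated in the notes of both files can be illustrated with a short sketch. A memory request is rounded up to the nearest template defined under `templates` in `device-config.yaml`, and `huawei.com/Ascend910-memory` only applies when exactly one NPU is requested; otherwise every NPU is allocated as a whole card. This is not HAMi's scheduler code. `Template`, `pick_template` and `plan_allocation` are hypothetical names, and the template values in the demo are made up. Two behaviours are assumptions rather than documented rules: a single-NPU request with no memory value gets a whole card, and so does a request larger than every template.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Template:
    """Hypothetical model of one `templates` entry in device-config.yaml."""

    name: str
    memory: int   # device memory in MiB, like the `memory:` field
    ai_core: int  # like the `aiCore:` field


def pick_template(templates: Sequence[Template], requested_mib: int) -> Optional[Template]:
    """Round a memory request up to the smallest template that can hold it.

    Returns None when no template is large enough (assumed: whole card).
    """
    fitting = [t for t in templates if t.memory >= requested_mib]
    return min(fitting, key=lambda t: t.memory, default=None)


def plan_allocation(
    templates: Sequence[Template], npu_count: int, memory_mib: Optional[int]
) -> List[Optional[Template]]:
    """Return one entry per requested NPU; None stands for a whole card."""
    if npu_count < 1:
        raise ValueError("npu_count must be at least 1")
    # Per the notes, a memory value only applies to single-NPU requests;
    # with more than one NPU every card is allocated whole.
    # Assumption: a request without a memory value also gets a whole card.
    if npu_count > 1 or memory_mib is None:
        return [None] * npu_count
    return [pick_template(templates, memory_mib)]


if __name__ == "__main__":
    # Made-up quarter-card and half-card templates, for illustration only.
    demo = [Template("half", 16384, 10), Template("quarter", 8192, 5)]
    print(plan_allocation(demo, 1, 2000))  # [Template(name='quarter', memory=8192, ai_core=5)]
    print(plan_allocation(demo, 1, 9000))  # [Template(name='half', memory=16384, ai_core=10)]
    print(plan_allocation(demo, 2, 2000))  # [None, None]
```

Rounding up rather than down means a task never receives less device memory than it asked for. The cost is that some memory inside the chosen slice may go unused.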