Gpt2 kunlunxin (#273)
* Fit gpt2 on kunlunxin

* Add kunlunxin readme

* Refine task kind  kunlunxin readme

* Fix unit of p_whole in README.md

* Refine 1x1 config

---------

Co-authored-by: root <root@szzj-isa-ai-chip0.szzj.baidu.com>
KungYork and root authored Oct 8, 2023
1 parent 274a51b commit f64df50
Showing 8 changed files with 67 additions and 3 deletions.
4 changes: 2 additions & 2 deletions training/benchmarks/gpt2/pytorch/optimizer/clip_grads.py
@@ -62,9 +62,9 @@ def clip_grad_norm_fp32(parameters, grads_for_norm,
 # Multi-tensor applier takes a function and a list of list
 # and performs the operation on that list all in one kernel.
 if grads_for_norm:
-    grad_norm = torch.cuda.FloatTensor([item.norm() for item in grads_for_norm]).norm()
+    grad_norm = torch.FloatTensor([item.norm() for item in grads_for_norm]).norm().cuda()
 else:
-    grad_norm = torch.cuda.FloatTensor([0])
+    grad_norm = torch.FloatTensor([0]).cuda()
 # Since we will be summing across data parallel groups,
 # we need the pow(norm-type).
 total_norm = grad_norm ** norm_type
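
The two-step reduction in this hunk (a norm taken over per-tensor norms, then raised to `norm_type` before summing across groups) can be illustrated without any accelerator. The following is a minimal sketch, not the benchmark's code: plain Python lists stand in for gradient tensors, and the helper names `l2_norm`, `norm_of_norms`, and `combine_group_norms` are hypothetical.

```python
import math

def l2_norm(values):
    """L2 norm of a flat list of floats (stand-in for tensor.norm())."""
    return math.sqrt(sum(v * v for v in values))

def norm_of_norms(grads):
    """L2 norm over per-tensor L2 norms; equals the L2 norm of all elements."""
    if not grads:
        return 0.0  # mirrors the empty-list branch in the hunk
    return l2_norm([l2_norm(g) for g in grads])

def combine_group_norms(local_norms, norm_type=2.0):
    """Sum norm**norm_type over groups, then take the norm_type-th root."""
    total = sum(n ** norm_type for n in local_norms)
    return total ** (1.0 / norm_type)

grads = [[3.0, 4.0], [12.0]]            # per-tensor norms are 5 and 12
flat = [v for g in grads for v in g]    # the same values as one flat vector
print(norm_of_norms(grads), l2_norm(flat))
```

Because the powered norms add, each group only has to contribute one scalar to the cross-group sum.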
46 changes: 46 additions & 0 deletions training/kunlunxin/gpt2-pytorch/README.md
@@ -0,0 +1,46 @@
### Test dataset download
[Test dataset download](../../benchmarks/gpt2/README.md#测试数据集下载)

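### Kunlunxin XPU configuration and run information
#### Environment configuration
- ##### Hardware environment
    - Machine model: Kunlunxin AI accelerator group R480-X8
    - Accelerator card model: Kunlunxin AI accelerator card R300
    - Multi-node network type and bandwidth: InfiniBand, 200 Gb/s

- ##### Software environment
    - OS version: Ubuntu 20.04
    - OS kernel version: 5.4.0-26-generic
    - Accelerator card driver version: 4.0.25
    - Docker image and version: pytorch1.12.1-cpu-ubuntu20.04:v0.01
    - Training framework version: xmlir
    - Training compiler version: xacc
    - Dependency version: pytorch-1.12.1+cpu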

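### Run results

* General metrics

| Metric | Value | Notes |
| -------------- | ----------------------- | ------------------------------------------- |
| Task category | Text2Text Generation | |
| Model | megatron-gpt2-345m | |
| Dataset | lambada | |
| Data precision | precision, see "Performance metrics" | fp32/amp/fp16 available |
| Hyperparameter changes | fix_hp, see "Performance metrics" | Special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware short name | R300 | |
| Device memory usage | mem, see "Performance metrics" | Commonly called "GPU memory"; unit: GiB |
| End-to-end time | e2e_time, see "Performance metrics" | Total time plus Perf initialization and similar overhead |
| Overall throughput | p_whole, see "Performance metrics" | Actual number of training samples divided by total time (performance_whole) |
| Training throughput | p_train, see "Performance metrics" | Excludes the evaluation time at the end of each epoch |
| **Compute throughput** | **p_core, see "Performance metrics"** | Excludes data I/O time (p_core > p_train > p_whole); unit: samples/s (seq_length=1024) |
| Training result | lambada_acc, see "Performance metrics" | Accuracy on the lambada task |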
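
The three throughput metrics in the table differ only in which time is excluded from the denominator. A minimal sketch of that relationship follows, assuming (since the README does not state it explicitly) that p_core excludes both evaluation and data I/O time; the function name `throughputs` and its parameters are hypothetical, not part of the benchmark code.

```python
def throughputs(num_samples, total_s, eval_s, data_io_s):
    """Return (p_whole, p_train, p_core) in samples/s.

    num_samples: training samples actually processed
    total_s:     total wall-clock time in seconds
    eval_s:      time spent in end-of-epoch evaluation
    data_io_s:   time spent in data I/O (assumed separate from eval_s)
    """
    p_whole = num_samples / total_s
    p_train = num_samples / (total_s - eval_s)
    p_core = num_samples / (total_s - eval_s - data_io_s)
    return p_whole, p_train, p_core

# Made-up timings, for illustration only.
p_whole, p_train, p_core = throughputs(1000, 100.0, 20.0, 30.0)
print(p_whole, p_train, p_core)
```

With any positive evaluation and I/O time, the ordering p_core > p_train > p_whole follows directly from the shrinking denominators.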

* Performance metrics

| Configuration | precision | fix_hp | e2e_time | p_whole | p_train | p_core | lambada_acc | mem |
| ------------------- | --------- | ---------------- | -------- | ------- | ------- | ------ | ------- | --------- |
| R300, 1 node x 1 card (1x1) | | | | | | | | |
| R300, 1 node x 8 cards (1x8) | fp32 | bs=32,lr=0.00015 | | | | | 0.60 | 20.7/32.0 |
| R300, 2 nodes x 8 cards (2x8) | | | | | | | | |

5 changes: 5 additions & 0 deletions training/kunlunxin/gpt2-pytorch/config/config_R300x1x1.py
@@ -0,0 +1,5 @@
from config_common import *

train_batch_size = 2
max_steps = 369120
gradient_accumulation_steps = 8
4 changes: 4 additions & 0 deletions training/kunlunxin/gpt2-pytorch/config/config_R300x1x8.py
@@ -0,0 +1,4 @@
from config_common import *

train_batch_size = 2
max_steps = 46140
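
The 1x1 and 1x8 configs differ in `max_steps` by exactly a factor of 8, which matches the 1x1 config's `gradient_accumulation_steps`. The arithmetic can be sketched as follows, assuming the usual definition of samples per optimizer update. Whether `max_steps` counts micro-steps or optimizer updates, and what `gradient_accumulation_steps` defaults to when a config omits it, is not shown in this diff, so the helper `effective_batch` is purely illustrative.

```python
def effective_batch(train_batch_size, num_devices, gradient_accumulation_steps=1):
    """Samples consumed per optimizer update (conventional definition)."""
    return train_batch_size * num_devices * gradient_accumulation_steps

# Values taken from config_R300x1x1.py: batch 2, one card, accumulate 8.
one_card = effective_batch(train_batch_size=2, num_devices=1,
                           gradient_accumulation_steps=8)

# Ratio of the two max_steps values in the 1x1 and 1x8 configs.
steps_ratio = 369120 // 46140

print(one_card, steps_ratio)
```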
3 changes: 3 additions & 0 deletions training/kunlunxin/gpt2-pytorch/config/config_R300x2x8.py
@@ -0,0 +1,3 @@
from config_common import *

train_batch_size = 2
6 changes: 6 additions & 0 deletions training/kunlunxin/gpt2-pytorch/config/config_common.py
@@ -0,0 +1,6 @@
vendor = 'kunlunxin'

# disable fp16
fp16 = False

dist_backend = "xccl"
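
Each per-topology config starts with `from config_common import *` and then assigns its own values, so later assignments shadow the shared defaults. A rough sketch of that layering is below. It uses `exec` on source strings as a stand-in for real module imports, and `load_config` is a hypothetical helper, not how the benchmark actually loads its configs.

```python
COMMON_SRC = """
vendor = 'kunlunxin'
fp16 = False
dist_backend = "xccl"
"""

OVERRIDE_SRC = """
train_batch_size = 2
max_steps = 46140
"""

def load_config(common_src, override_src):
    """Run the shared settings first, then the overrides, in one namespace."""
    namespace = {}
    exec(common_src, namespace)      # plays the role of the star import
    exec(override_src, namespace)    # later assignments win
    # Drop dunder entries such as __builtins__ that exec adds.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}

cfg = load_config(COMMON_SRC, OVERRIDE_SRC)
print(sorted(cfg))
```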
Empty file.
2 changes: 1 addition & 1 deletion training/nvidia/gpt2-pytorch/README.md
@@ -21,7 +21,7 @@

 | Metric | Value | Notes |
 | -------------- | ----------------------- | ------------------------------------------- |
-| Task category | Natural language encoding | |
+| Task category | Text2Text Generation | |
 | Model | megatron-gpt2-345m | |
 | Dataset | lambada | |
 | Data precision | precision, see "Performance metrics" | fp32/amp/fp16 available |
