update iluvatar cpm config (FlagOpen#210)
* 1.update iluvatar cpm config.
2.update iluvatar sdk info.

* update cpm 1x1 2x8 mem info

* update cpm performance info
forestlee95 authored and zhoujiamin01 committed Aug 31, 2023
1 parent 74f0aa2 commit a9295d0
Showing 8 changed files with 48 additions and 87 deletions.
2 changes: 1 addition & 1 deletion training/benchmarks/cpm/README.md
@@ -32,6 +32,6 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
| | Pytorch |Paddle|TensorFlow2|
| ---- | ---- | ---- | ---- |
| Nvidia GPU ||N/A |N/A|

| Iluvatar CoreX || N/A |N/A|


10 changes: 5 additions & 5 deletions training/iluvatar/README.md
@@ -26,7 +26,7 @@
- Accelerator model: Iluvatar BI-V100 32G
- Software
- OS kernel version:
Linux 4.15.0-156-generic x86_64
Linux 5.4.0-148-generic x86_64
- Docker version:
20.10.8

@@ -38,17 +38,17 @@
- Core software information
- AI framework & version

torch: 1.10.2+corex.3.0.0
torch: 1.13.1+corex.3.1.0

- Other software versions

cuda: 10.2

corex: 3.0.0
corex: 3.1.0

torchtext: 0.11.2+corex.3.0.0
torchtext: 0.14.1+corex.3.1.0

apex: 0.1+corex.3.0.0
apex: 0.1+corex.3.1.0
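Each framework package listed above carries a `+corex.X.Y.Z` local version label, and after this commit every label matches the Corex SDK version (3.1.0). The sketch below is one way to check that consistency using plain string parsing. The helper names `corex_label` and `mismatched` are hypothetical and do not belong to any Iluvatar tool. The version strings are copied from the updated list.

```python
from __future__ import annotations


def corex_label(version: str) -> str | None:
    """Return the Corex version in a local label like '1.13.1+corex.3.1.0', else None."""
    _, sep, local = version.partition("+")
    if not sep or not local.startswith("corex."):
        return None
    return local[len("corex."):]


def mismatched(packages: dict[str, str], sdk_version: str) -> list[str]:
    """Names of packages whose Corex label differs from the SDK version.

    Packages without a '+corex.' label (e.g. a plain CUDA version) are skipped.
    """
    return [
        name
        for name, version in packages.items()
        if (label := corex_label(version)) is not None and label != sdk_version
    ]


# Versions after this commit, copied from the README diff.
NEW_VERSIONS = {
    "torch": "1.13.1+corex.3.1.0",
    "torchtext": "0.14.1+corex.3.1.0",
    "apex": "0.1+corex.3.1.0",
}

print(mismatched(NEW_VERSIONS, "3.1.0"))  # []
```

Suppose you kept the pre-commit `torch` string `1.10.2+corex.3.0.0` but declared SDK `3.1.0`. The check would then report `["torch"]`.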



70 changes: 34 additions & 36 deletions training/iluvatar/cpm-pytorch/README.md
@@ -1,25 +1,10 @@
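### Model information
- Model description
>The Chinese Pre-trained Language Model (CPM) is an autoregressive language model based on Transformers, trained on 100 GB of Chinese data. Its largest version has 2.6 billion parameters and supports text classification and text generation.
>See the CPM paper for more details:
>[CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413)
- Model code source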

| repo | commit_id | date |
| ------- | ----------- |----- |
| [CPM-1-Finetune](https://github.com/TsinghuaAI/CPM-1-Finetune) | c0d892185912b28f8efeaeb55905f3f4fb227e46|2021-10-17 21:53:00|

### Model checkpoint download
> [Download page](https://model.baai.ac.cn/model-detail/100017)
pytorch_model.bin, under the "Files and versions" tab.
Parameter count: 2.6B
See [Model checkpoint download](../../benchmarks/cpm/README.md#模型checkpoint)


### Test dataset download
> Dataset: https://drive.google.com/drive/folders/1gL01xbFBcrgP0TmgOhJ_uplkeG-BCwvM
See [Test dataset download](../../benchmarks/cpm/README.md#测试数据集下载地址)

- Preprocessing
> No preprocessing required

### Iluvatar CoreX BI-V100 GPU configuration and run reference
#### Environment configuration
@@ -28,24 +13,37 @@

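- ##### Software environment
- OS version: Ubuntu 20.04
- OS kernel version: 4.15.0-156-generic x86_64
- Accelerator driver version: 3.0.0
- OS kernel version: 5.4.0-148-generic x86_64
- Accelerator driver version: 3.1.0
- Docker version: 20.10.8
- Training framework version: torch-1.10.2+corex.3.0.0
- Training framework version: torch-1.13.1+corex.3.1.0
- Dependency versions: none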


### Run results
| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (samples/s) |
| -------- | ------------------ | ---------- | ------- | ------- | ------- | --------------- |
| 1 node × 1 GPU | config_BI-V100x1x1 | 6200.51 | 0.8 | 0.8047 | 4375 | 23.89 |
| 1 node × 2 GPUs | config_BI-V100x1x2 | 5291.47 | 0.8 | 0.8015 | 3756 | 47.52 |
| 1 node × 4 GPUs | config_BI-V100x1x4 | 5035.74 | 0.8 | 0.8013 | 3454 | 91.27 |
| 1 node × 8 GPUs | config_BI-V100x1x8 | 5154.91 | 0.92 | 0.9203 | 3476 | 178.51 |
| 2 nodes × 8 GPUs | config_BI-V100x2x8 | pending | 0.92 | pending | pending | pending |
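The run-results table above, which this commit removes, lets you estimate how close multi-GPU throughput comes to linear scaling. The sketch below divides each measured samples/s figure by the single-GPU figure multiplied by the GPU count. "Scaling efficiency" is a common derived metric, but the README does not report it. The 8-GPU row was also measured against a different target accuracy (0.92 rather than 0.8), so treat the comparison as indicative only.

```python
from __future__ import annotations

# Performance (samples/s) per GPU count, from the removed run-results table.
THROUGHPUT = {1: 23.89, 2: 47.52, 4: 91.27, 8: 178.51}


def scaling_efficiency(throughput: dict[int, float], base_gpus: int = 1) -> dict[int, float]:
    """Measured throughput divided by ideal linear scaling from the base configuration."""
    per_gpu = throughput[base_gpus] / base_gpus
    return {gpus: value / (per_gpu * gpus) for gpus, value in throughput.items()}


for gpus, eff in scaling_efficiency(THROUGHPUT).items():
    print(f"{gpus} GPU(s): {eff:.1%}")
```

For 1, 2, 4 and 8 GPUs this prints 100.0%, 99.5%, 95.5% and 93.4%.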

### License

This project is licensed under the Apache 2.0 license.

Parts of this project's code are based on the MLCommons implementation at https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA
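#### Run results

* General metrics

| Metric | Value | Notes |
| -------------- | ------------------------------ | ------------------------------------------- |
| Task type | Text classification, text generation | |
| Model | cpm | |
| Dataset | CPM-Finetune-data | |
| Data precision | precision, see "Performance metrics" | fp32/amp/fp16 available |
| Hyperparameter changes | fix_hp, see "Performance metrics" | Special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware short name | BI-V100 | |
| Device memory usage | mem (actual/total), see "Performance metrics" | Commonly called "VRAM"; unit: GiB |
| End-to-end time | e2e_time, see "Performance metrics" | Total time plus Perf initialization and similar overhead |
| Overall throughput | p_whole, see "Performance metrics" | Number of samples actually trained divided by total time (performance_whole) |
| Training throughput | p_train, see "Performance metrics" | Excludes the evaluation time at the end of each epoch |
| **Compute throughput** | **p_core, see "Performance metrics"** | Excludes data I/O time (p3>p2>p1) |
| Training result | acc, see "Performance metrics" | Classification accuracy (mlm_accuracy) |
| Additional modifications | | |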
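The three throughput metrics in this table share a numerator (samples actually trained) and differ only in the time they divide by. The minimal sketch below assumes that a run's total time can be split into evaluation time and data-I/O time, and that the exclusions are cumulative, which the p3>p2>p1 ordering suggests. The `RunTimings` fields and the example numbers are hypothetical and are not measurements from this commit.

```python
from dataclasses import dataclass


@dataclass
class RunTimings:
    """Hypothetical timing breakdown of one training run (seconds)."""
    samples: int         # samples actually trained
    total_time: float    # whole training time
    eval_time: float     # end-of-epoch evaluation time
    data_io_time: float  # time spent loading data during training


def throughputs(t: RunTimings) -> dict:
    """p_whole, p_train and p_core as described in the general-metrics table."""
    train_time = t.total_time - t.eval_time  # p_train excludes evaluation
    core_time = train_time - t.data_io_time  # p_core also excludes data I/O
    return {
        "p_whole": t.samples / t.total_time,
        "p_train": t.samples / train_time,
        "p_core": t.samples / core_time,
    }


example = RunTimings(samples=120_000, total_time=6_000.0, eval_time=600.0, data_io_time=400.0)
print(throughputs(example))
```

These made-up timings give p_whole = 20, p_train ≈ 22.2 and p_core = 24 samples/s. That reproduces the p3 > p2 > p1 ordering the table mentions.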

* Performance metrics

| Configuration | precision | fix_hp | e2e_time | p_whole | p_train | p_core | acc | mem |
| ------------------- | --------- | ---------------- | -------- | ------- | ------- | ------ | ----- | --------- |
| BI-V100 1 node × 8 GPUs (1x8) | fp16 | / | | | | | 0.92 | 10.7/32.0 |
| BI-V100 1 node × 8 GPUs (1x8) | fp16 | bs=128,lr=0.0005 | | | | | 0.92 | 20.3/32.0 |
| BI-V100 1 node × 1 GPU (1x1) | fp16 | bs=192,lr=0.0005 | | | | | | 29.1/32.0 |
| BI-V100 2 nodes × 8 GPUs (2x8) | fp16 | bs=128,lr=0.0005 | | | | | | 20.4/32.0 |
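The mem column uses the form used/total, in GiB. Converting each entry to a utilization fraction shows how close a configuration comes to filling the 32 GiB card. The values below are copied from the table. The short dictionary keys are labels made up for this sketch.

```python
# mem column (actual/total, GiB) from the performance-metrics table.
MEM = {
    "1x8, default hp": "10.7/32.0",
    "1x8, bs=128": "20.3/32.0",
    "1x1, bs=192": "29.1/32.0",
    "2x8, bs=128": "20.4/32.0",
}


def mem_utilization(mem: str) -> float:
    """Parse a 'used/total' string and return used / total."""
    used, total = (float(part) for part in mem.split("/"))
    return used / total


for config, mem in MEM.items():
    print(f"{config}: {mem_utilization(mem):.1%}")
```

The single-GPU run with bs=192 uses about 91% of device memory. That fits the note that fix_hp is chosen to saturate the hardware.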
3 changes: 2 additions & 1 deletion training/iluvatar/cpm-pytorch/config/config_BI-V100x1x1.py
@@ -8,7 +8,7 @@

train_batch_size = 32
eval_batch_size = train_batch_size
max_steps = 4000000
max_steps = 60000
max_samples_termination = 439126000

warmup = 0.2
@@ -19,3 +19,4 @@
eps: float = 1e-08

seed = 23333
training_event = None
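The 1x1 config now caps training at `max_steps = 60000` and keeps `max_samples_termination = 439126000`. Which cap binds first depends on how many samples one step consumes. This sketch rests on several assumptions:

- samples per step = `train_batch_size` × number of processes × `gradient_accumulation_steps`;
- `gradient_accumulation_steps` is 1 for this config, since that line is not visible in the diff;
- training stops at whichever limit it reaches first, unless the target accuracy is reached earlier.

`step_vs_sample_limit` is a hypothetical helper, not a FlagPerf function.

```python
def step_vs_sample_limit(max_steps: int, max_samples_termination: int,
                         train_batch_size: int, world_size: int = 1,
                         gradient_accumulation_steps: int = 1) -> tuple:
    """Return (binding_limit, samples_consumed_at_that_limit)."""
    # Assumed: one optimizer step consumes one batch per process per accumulation step.
    samples_per_step = train_batch_size * world_size * gradient_accumulation_steps
    samples_at_max_steps = max_steps * samples_per_step
    if samples_at_max_steps <= max_samples_termination:
        return "max_steps", samples_at_max_steps
    return "max_samples_termination", max_samples_termination


# config_BI-V100x1x1.py before and after this commit (1 process).
print(step_vs_sample_limit(4_000_000, 439_126_000, train_batch_size=32))
print(step_vs_sample_limit(60_000, 439_126_000, train_batch_size=32))
```

Under both settings `max_steps` is the cap that binds. The new value limits a single-card run to 1,920,000 samples (60000 × 32). That is still far more than the 4375 steps the removed run-results table lists for the 1x1 run.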
21 changes: 0 additions & 21 deletions training/iluvatar/cpm-pytorch/config/config_BI-V100x1x2.py

This file was deleted.

21 changes: 0 additions & 21 deletions training/iluvatar/cpm-pytorch/config/config_BI-V100x1x4.py

This file was deleted.

1 change: 1 addition & 0 deletions training/iluvatar/cpm-pytorch/config/config_BI-V100x1x8.py
@@ -19,3 +19,4 @@
eps: float = 1e-08

seed = 23333
training_event = None
7 changes: 5 additions & 2 deletions training/iluvatar/cpm-pytorch/config/config_BI-V100x2x8.py
@@ -1,14 +1,15 @@
from config_common import *

fp16 = True
ddp_type = "apex"
dist_backend = "nccl"
target_embedding_average = 0.92

gradient_accumulation_steps = 1

train_batch_size = 32
train_batch_size = 128
eval_batch_size = train_batch_size
max_steps = 4000
max_steps = 2000

warmup = 0.2
learning_rate = 0.0005
@@ -18,3 +19,5 @@
eps: float = 1e-08

seed = 23333
max_samples_termination = 43912600
training_event = None
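In the 2x8 config, `train_batch_size` rises from 32 to 128 and `max_steps` drops from 4000 to 2000. This sketch derives the resulting global batch and warmup length under several assumptions:

- `train_batch_size` is per process, and all 16 processes (2 nodes × 8 GPUs) contribute to each step;
- `warmup = 0.2` is a fraction of `max_steps` with a linear ramp-up, the convention in Megatron-LM-style training scripts;
- the post-warmup decay is not shown in this diff, so the sketch leaves it out.

`warmup_lr` is a hypothetical helper.

```python
# Values from config_BI-V100x2x8.py after this commit.
NODES, GPUS_PER_NODE = 2, 8          # from the config name "x2x8"
train_batch_size = 128               # assumed to be per process
gradient_accumulation_steps = 1
max_steps = 2000
max_samples_termination = 43912600
warmup = 0.2                         # assumed: fraction of max_steps
learning_rate = 0.0005

world_size = NODES * GPUS_PER_NODE
global_batch = train_batch_size * world_size * gradient_accumulation_steps
warmup_steps = round(warmup * max_steps)


def warmup_lr(step: int) -> float:
    """Linear warmup from 0 to learning_rate; decay after warmup is omitted."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate


print(global_batch, warmup_steps)                           # 2048 400
print(max_steps * global_batch <= max_samples_termination)  # True
```

Under these assumptions, the 2000-step cap (about 4.1 million samples) is reached well before `max_samples_termination`. The first 400 steps ramp the learning rate up to 0.0005.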
