[kunlunxin] transformer model, fix running error (FlagOpen#337)
* [kunlunxin] transformer model, fix running error

add XACC args
install dllogger
compatibility with newer numpy
add xpu 1x1 and 2x8 configs

* [kunlunxin] update property info of transformer-pytorch with kunlunxin

* [kunlunxin] transformer model, remove save/load checkpoint

* [kunlunxin] transformer model, update README

---------

Co-authored-by: chenrui22 <chenrui22@baidu.com>
Co-authored-by: Zhou Yu <zycosmos@gmail.com>
3 people authored Dec 15, 2023
1 parent ced2419 commit 25d4d36
Showing 8 changed files with 49 additions and 15 deletions.
@@ -30,7 +30,7 @@ def write_longs(f, a):
     3: np.int16,
     4: np.int32,
     5: np.int64,
-    6: np.float,
+    6: np.float32,
     7: np.double,
 }

@@ -173,7 +173,7 @@ class IndexedDatasetBuilder(object):
         np.int16: 2,
         np.int32: 4,
         np.int64: 8,
-        np.float: 4,
+        np.float32: 4,
         np.double: 8
     }

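For context on the two hunks above: `np.float` was deprecated in NumPy 1.20 and removed in NumPy 1.24, so importing a module that still references it fails on newer NumPy; the fix maps dtype code 6 to the concrete `np.float32` instead. A small, hypothetical sanity check (not part of the commit) that the updated tables stay consistent:

```python
import numpy as np

# Dtype-code and element-size tables as they look after the fix; code 6 now
# resolves to the concrete np.float32 rather than the removed np.float alias.
dtypes = {1: np.uint8, 2: np.int8, 3: np.int16, 4: np.int32,
          5: np.int64, 6: np.float32, 7: np.double}
element_sizes = {np.uint8: 1, np.int8: 1, np.int16: 2, np.int32: 4,
                 np.int64: 8, np.float32: 4, np.double: 8}

# Every recorded element size should match the dtype's actual itemsize.
for code, dtype in dtypes.items():
    assert np.dtype(dtype).itemsize == element_sizes[dtype], (code, dtype)
print("dtype tables are consistent")
```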
2 changes: 0 additions & 2 deletions training/benchmarks/transformer/pytorch/train/trainer.py
@@ -38,7 +38,6 @@ def __init__(self, driver: Driver, adapter, evaluator: Evaluator,
         super(Trainer, self).__init__(self.config, self.model)

     def init(self, train_dataloader):
-        load_checkpoint(self.config, self, train_dataloader)
         # Send a dummy batch to warm the caching allocator
         src_dict, tgt_dict = data_utils.load_dictionaries(self.config)
         add_extra_items_to_checkpoint({'src_dict': src_dict, 'tgt_dict': tgt_dict})
@@ -107,7 +106,6 @@ def train_one_epoch(self, train_dataloader, valid_dataloader):
             state.converged_success()

         trainer.lr_step(epoch_itr.epoch, state.valid_loss)
-        save_checkpoint(args, trainer, epoch_itr, state.valid_loss)
         torch.cuda.synchronize()
         driver.event(Event.EPOCH_END, state.epoch)

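The two deletions above drop checkpoint loading and saving from the benchmark path, matching the commit note "remove save/load checkpoint". If checkpointing were ever needed again, one hedged alternative would be to gate the calls behind a config switch rather than delete them; the flag and helper below are purely illustrative and do not exist in the repo:

```python
from typing import Any, Callable

def maybe_save_checkpoint(config: Any, save_fn: Callable[..., None], *args: Any) -> None:
    """Invoke the real checkpoint saver only when an (illustrative)
    'enable_checkpoint' flag is set on the benchmark config."""
    if getattr(config, "enable_checkpoint", False):
        save_fn(*args)
```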
42 changes: 31 additions & 11 deletions training/kunlunxin/transformer-pytorch/README.md
@@ -16,15 +16,35 @@
- OS version: Ubuntu 20.04
- OS kernel version: 5.4.0-26-generic
- Accelerator driver version: 4.0.25
- Docker image and version: xmlir/xmlir_ubuntu_2004_x86_64:v0.24
- Training framework version: xmlir+a28ac56f
- Dependency versions: pytorch-1.12.1+cpu
- Docker image and version: pytorch1.12.1-cpu-ubuntu20.04:v0.01
- Training framework version: XPyTorch 1.12.1

#### Results

* General metrics

| Metric name | Metric value | Notes |
| ----------- | ------------ | ----- |
| Task category | Language Modelling && LLM | |
| Model | Transformer | |
| Dataset | WMT14 | http://statmt.org/wmt14/translation-task.html#Download |
| Data precision | precision, see "Performance metrics" | fp32 available |
| Hyperparameter changes | fix_hp, see "Performance metrics" | special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware device (short name) | R300 | |
| Device memory usage | mem, see "Performance metrics" | commonly called "device memory" (VRAM), in GiB |
| End-to-end time | e2e_time, see "Performance metrics" | total time plus Perf initialization time, etc. |
| Overall throughput | p_whole, see "Performance metrics" | actual number of training samples divided by total time (performance_whole) |
| Training throughput | p_train, see "Performance metrics" | excludes the evaluation time at the end of each epoch |
| Compute throughput | p_core, see "Performance metrics" | excludes data-IO time (p3>p2>p1) |
| Training result | bleu, see "Performance metrics" | BLEU (BiLingual Evaluation Understudy) is an automatic metric for machine-translated text that measures its similarity to a set of high-quality reference translations |
| Extra modifications | | |


* Performance metrics

| Config | precision | fix_hp | e2e_time | p_whole | p_train | p_core | final_bleu | mem |
| ------ | --------- | ------ | -------- | ------- | ------- | ------ | ---------- | --- |
| R300 single node, 1 card (1x1) | fp32 | | | | | | | 30.5/32.0 |
| R300 single node, 8 cards (1x8) | fp32 | | | | | | 27.07 | 26.7/32.0 |
| R300 two nodes, 8 cards each (2x8) | fp32 | | | | | | | 27.4/32.0 |


### Results

| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (tokens/s) |
| ------------------ | ----------- | ------------ | --------------- | ------------------ | ----- | ---------------------- |
| single node, 8 cards | config_R300x1x8 | | 27.0 | 27.27 | 24370 | |

The [official accuracy](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer#training-performance-nvidia-dgx-a100-8x-a100-40gb) is 27.92; following the [official configuration](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer#training-performance-nvidia-dgx-a100-8x-a100-40gb), training yields an accuracy of 27.08.
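`final_bleu` above is scored with BLEU, and `sacrebleu` is already listed in the benchmark requirements; below is a minimal, illustrative example of computing a corpus-level score (the benchmark itself scores the full WMT14 test set inside its evaluator):

```python
import sacrebleu

# Toy hypothesis/reference pair, purely for illustration.
hypotheses = ["the cat sat on the mat"]
references = [["the cat sat on the mat"]]  # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```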
@@ -0,0 +1,6 @@
+from config_common import *
+
+max_tokens = 8192
+max_epoch = 30
+max_update = 6500
+lr = [0.000846]
@@ -0,0 +1,6 @@
+from config_common import *
+
+max_tokens = 8192
+max_epoch = 30
+max_update = 3000
+lr = [0.000846]
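The two new config files above follow the repository's pattern: import the shared defaults from `config_common`, then override a few hyperparameters per scale (note the different `max_update` values). A hedged sketch of how a launcher might pick one by module name; the helper is illustrative, and only `config_common` and `config_R300x1x8` are named elsewhere in this commit:

```python
import importlib
from types import ModuleType

def load_scale_config(name: str) -> ModuleType:
    """Import a per-scale config module such as 'config_R300x1x8'
    (illustrative helper, not part of the repository)."""
    return importlib.import_module(name)

# cfg = load_scale_config("config_R300x1x8")
# print(cfg.max_tokens, cfg.max_epoch, cfg.max_update, cfg.lr)
```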
@@ -0,0 +1,2 @@
+export XACC=1
+export XACC_ARGS="-L O0 -Lamp"
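These exports come from the new environment_variables.sh and, per the commit note about XACC args, presumably enable the XACC-accelerated path with "-L O0 -Lamp" as its options. A hypothetical Python launcher helper (not from the repo) showing how such variables could be injected when starting training:

```python
import os
import subprocess
from typing import List

def run_with_xacc(cmd: List[str]) -> int:
    """Run a training command with the XACC environment applied
    (values copied verbatim from the diff; the helper itself is illustrative)."""
    env = dict(os.environ)
    env["XACC"] = "1"
    env["XACC_ARGS"] = "-L O0 -Lamp"
    return subprocess.call(cmd, env=env)

# Example invocation (entry point is illustrative):
# run_with_xacc(["python", "run_pretraining.py"])
```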
@@ -1 +1,2 @@
 sacrebleu
+dllogger
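`dllogger` is added presumably because the training code (derived from NVIDIA's Transformer reference implementation) logs through it and the run otherwise fails on the missing import, which fits the commit note "install dllogger". A minimal usage sketch; the backend choice, file name, and logged fields are illustrative:

```python
import dllogger
from dllogger import JSONStreamBackend, StdOutBackend, Verbosity

# Initialize once per process; the backends and file name here are illustrative.
dllogger.init(backends=[
    StdOutBackend(Verbosity.DEFAULT),
    JSONStreamBackend(Verbosity.VERBOSE, "train_log.json"),
])

# Log one training step's metrics, then flush before exit.
dllogger.log(step=0, data={"loss": 9.87, "tokens_per_sec": 1234.5})
dllogger.flush()
```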
1 change: 1 addition & 0 deletions training/run_benchmarks/config/test_conf.py
@@ -124,6 +124,7 @@
     # "distilbert:pytorch:R300:1:8:1": "/raid/dataset/distilbert/",
     # "swin_transformer:pytorch:R300:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
     # "tacotron2:pytorch:R300:1:8:1": "/raid/dataset/tacotron2/LJSpeech/",
+    # "transformer:pytorch:R300:1:8:1": "/raid/dataset/transformer/wmt14_en_de_joined_dict",
     # "bigtransfer:pytorch:R300:1:8:1": "/raid/dataset/ImageNet_1k_2012/",

     # mthreads cases
