kunlunxin update glm config #236

Merged on Sep 7, 2023 (44 commits)
Changes from 7 commits
Commits
c158781
glm_config
Aug 31, 2023
9eb7eb9
Merge branch 'main' into GLM-PR
GGuanl Aug 31, 2023
3f6afe7
fix_#1
GGuanl Sep 1, 2023
c48bd1b
glm-config_updated
GGuanl Sep 6, 2023
acbf4b6
glm-config-updated#2
GGuanl Sep 6, 2023
53b0999
glm_config-updated#2
GGuanl Sep 6, 2023
dd3c478
glm_config-#2
GGuanl Sep 6, 2023
7659bea
Update README.md
GGuanl Sep 6, 2023
f953f4e
Update README.md
GGuanl Sep 6, 2023
04d5bd9
Update README.md
GGuanl Sep 6, 2023
c038ab7
Update README.md
GGuanl Sep 6, 2023
046fdf7
Update README.md
GGuanl Sep 6, 2023
6b6fd85
Update README.md
GGuanl Sep 6, 2023
4f998e3
Update README.md
GGuanl Sep 6, 2023
629b37a
Update README.md
GGuanl Sep 6, 2023
be5eb37
Update pytorch_install.sh
GGuanl Sep 6, 2023
22eeefc
Create config_common
GGuanl Sep 6, 2023
0dee798
Update README.md
GGuanl Sep 6, 2023
c2f993f
Rename config_common to config_common.py
GGuanl Sep 7, 2023
ca34bb6
Update config_R300x2x8.py
GGuanl Sep 7, 2023
a9d02b8
Update config_R300x1x1.py
GGuanl Sep 7, 2023
972ed9f
Update config_R300x1x8.py
GGuanl Sep 7, 2023
f1714c5
Update README.md
GGuanl Sep 7, 2023
ace9fea
Update README.md
GGuanl Sep 7, 2023
c474b63
Update README.md
GGuanl Sep 7, 2023
9e30809
Update README.md
GGuanl Sep 7, 2023
015a751
Update README.md
GGuanl Sep 7, 2023
60dee8a
Update requirements.txt
GGuanl Sep 7, 2023
32eb6f1
Update README.md
GGuanl Sep 7, 2023
9d81ff1
Update config_R300x1x1.py
GGuanl Sep 7, 2023
7952993
Update config_R300x1x8.py
GGuanl Sep 7, 2023
27b48b5
Update config_R300x2x8.py
GGuanl Sep 7, 2023
c0e3ab4
Update config_R300x1x1.py
GGuanl Sep 7, 2023
7394352
Update config_R300x1x8.py
GGuanl Sep 7, 2023
186ae5d
Update config_R300x2x8.py
GGuanl Sep 7, 2023
2f204f5
Update config_common.py
GGuanl Sep 7, 2023
26acd75
Update config_R300x1x1.py
GGuanl Sep 7, 2023
13cc7be
Update config_R300x2x8.py
GGuanl Sep 7, 2023
d207087
Update README.md
GGuanl Sep 7, 2023
27033ba
Update README.md
GGuanl Sep 7, 2023
a44db3f
Update README.md
GGuanl Sep 7, 2023
bce5f61
Update config_R300x1x1.py
GGuanl Sep 7, 2023
42f761d
Update config_R300x1x8.py
GGuanl Sep 7, 2023
cda8284
Update config_R300x2x8.py
GGuanl Sep 7, 2023
5 changes: 4 additions & 1 deletion training/kunlunxin/README.md
@@ -26,7 +26,10 @@ R480-X8 is built on high-speed multi-chip interconnect technology; a single machine delivers up to 1 Peta Ops @F
 - OS version: Ubuntu 20.04
 - OS kernel version: 5.4.0-26-generic
 - Accelerator card driver version: 4.0.25
-- Docker image and version: pytorch1.12.1-cpu-ubuntu18.04:v0.04
+- Docker image and version: pytorch1.12.1-cpu-ubuntu20.04:v0.01
+- Training framework version: xmlir+111e7d45 [xmlir download](https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/111e7d45/xacc-0.1.0-cp38-cp38-linux_x86_64.whl)
+- Training compiler version: xacc+111e7d45 [xacc download](https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/111e7d45/xmlir-0.0.1-cp38-cp38-linux_x86_64.whl)
+- Dependency software version: pytorch-1.12.1+cpu

 ## Container image information
 - Container build information
1 change: 1 addition & 0 deletions training/kunlunxin/docker_image/pytorch/pytorch_install.sh
@@ -2,5 +2,6 @@

set -xe


pip install https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/latest/xacc-0.1.0-cp38-cp38-linux_x86_64.whl
pip install https://bd.bcebos.com/klx-pytorch-ipipe-bd/flagperf/latest/xmlir-0.0.1-cp38-cp38-linux_x86_64.whl
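
After the two `pip install` lines run, a quick import check can confirm that both wheels landed in the environment. The following is a minimal sketch, not part of the PR; it assumes the importable module names match the wheel names (`xacc` and `xmlir`), which the diff does not confirm.

```python
import importlib.util

# Assumption: module names equal the wheel names used in pytorch_install.sh.
REQUIRED_MODULES = ("xacc", "xmlir")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("xacc and xmlir are importable")
```

`find_spec` only locates the package without importing it, so the check stays cheap and does not need the accelerator driver to be loaded.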
2 changes: 1 addition & 1 deletion training/kunlunxin/glm-pytorch/config/config_R300x1x1.py
@@ -2,7 +2,7 @@
 fp16 = False

Review comment from yuzhou03 (Contributor), Sep 7, 2023: Redundant code; fp16 already appears in common.

 train_batch_size = 4
-eval_batch_size = 6
+eval_batch_size = 4

 dist_backend = "xccl"
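
The review comment on `fp16` can be illustrated with a small sketch of how a shared config and a per-machine config might be layered. This is an assumption about the mechanism, not FlagPerf's actual loader: the dictionaries below are hypothetical stand-ins for `config_common.py` and `config_R300x1x1.py`, and only the keys visible in the diff are used.

```python
from types import SimpleNamespace

# Hypothetical stand-in for config_common.py (the review says fp16 lives there).
COMMON = {"fp16": False}

# Hypothetical stand-in for config_R300x1x1.py, using the values from the diff.
R300X1X1 = {
    "fp16": False,
    "train_batch_size": 4,
    "eval_batch_size": 4,
    "dist_backend": "xccl",
}


def merge_configs(common, machine):
    """Overlay machine-specific settings on top of the shared defaults."""
    merged = dict(common)
    merged.update(machine)  # machine-specific values win on conflict
    return SimpleNamespace(**merged)


def redundant_keys(common, machine):
    """Keys the machine config repeats with the same value as the common config."""
    return sorted(k for k, v in machine.items() if k in common and common[k] == v)


if __name__ == "__main__":
    print(redundant_keys(COMMON, R300X1X1))
    print(merge_configs(COMMON, R300X1X1).eval_batch_size)
```

Under this layering, dropping `fp16` from the machine config would leave the merged result unchanged, which is the redundancy the reviewer points out.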
4 changes: 2 additions & 2 deletions training/kunlunxin/glm-pytorch/config/config_R300x1x8.py
@@ -1,8 +1,8 @@
 vendor = 'kunlunxin'
 fp16 = False

-train_batch_size = 4
-eval_batch_size = 6
+train_batch_size = 5
+eval_batch_size = 5

 dist_backend = "xccl"
2 changes: 1 addition & 1 deletion training/kunlunxin/glm-pytorch/config/config_R300x2x8.py
@@ -2,7 +2,7 @@
 fp16 = False

 train_batch_size = 4
-eval_batch_size = 6
+eval_batch_size = 4

 dist_backend = "xccl"
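
To compare the three configs after this change, it can help to work out the effective global batch size. The sketch below rests on two assumptions that the diff does not state: that a file name such as `config_R300x2x8.py` encodes machine, node count, and cards per node, and that `train_batch_size` is a per-card value under plain data parallelism. The helper names are hypothetical.

```python
import re


def parse_config_name(name):
    """Split 'config_R300x2x8.py' into (machine, nodes, cards_per_node)."""
    match = re.fullmatch(r"config_([A-Za-z0-9]+?)x(\d+)x(\d+)\.py", name)
    if match is None:
        raise ValueError(f"unexpected config file name: {name}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def global_batch_size(name, per_card_batch):
    """Per-card batch size times the total number of cards (assumed data parallel)."""
    _, nodes, cards = parse_config_name(name)
    return per_card_batch * nodes * cards


if __name__ == "__main__":
    # train_batch_size values as they stand after this PR
    for name, batch in [
        ("config_R300x1x1.py", 4),
        ("config_R300x1x8.py", 5),
        ("config_R300x2x8.py", 4),
    ]:
        print(name, global_batch_size(name, batch))
```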
@@ -7,6 +7,11 @@ export BKCL_TIMEOUT=1800
# when using tree allreduce, the number of nodes must be a multiple of 2
export BKCL_SOCKET_FORCE_TREE=1

export XMLIR_D_XPU_L3_SIZE=32505856

export BKCL_CCIX_RING=1
export BKCL_FORCE_SYNC=1

export ALLREDUCE_ASYNC=false
export ALLREDUCE_FUSION=0
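
Two of the settings above lend themselves to a quick sanity check. The sketch below is illustrative only: it assumes `XMLIR_D_XPU_L3_SIZE` is expressed in bytes (the diff does not say), and it encodes the even-node rule exactly as the script comment words it. Both helper functions are hypothetical and not part of any Kunlunxin tooling.

```python
def l3_size_mib(size_bytes):
    """Convert a byte count to MiB (assumes the variable is given in bytes)."""
    return size_bytes / (1024 * 1024)


def tree_allreduce_ok(env, num_nodes):
    """Check the rule from the script comment: with BKCL_SOCKET_FORCE_TREE=1,
    the number of nodes must be a multiple of 2."""
    if env.get("BKCL_SOCKET_FORCE_TREE") != "1":
        return True  # the rule only applies when tree allreduce is forced
    return num_nodes % 2 == 0


if __name__ == "__main__":
    env = {"BKCL_SOCKET_FORCE_TREE": "1", "XMLIR_D_XPU_L3_SIZE": "32505856"}
    print(l3_size_mib(int(env["XMLIR_D_XPU_L3_SIZE"])))
    print(tree_allreduce_ok(env, num_nodes=2))
```

If the byte interpretation holds, 32505856 is exactly 31 × 1024 × 1024, that is, 31 MiB.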

3 changes: 3 additions & 0 deletions training/kunlunxin/glm-pytorch/config/requirements.txt
@@ -1,3 +1,6 @@
h5sparse
boto3
h5py
numpy>=1.15.4
sentencepiece>=0.1.8
jieba
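
The entries above mix bare package names with minimum-version pins. As a rough sketch of how such lines can be read, the helpers below split a requirement into name, operator, and version and compare dotted numeric versions. This is deliberately simplified and is not a full PEP 440 implementation; the function names are hypothetical.

```python
import re


def parse_requirement(line):
    """Split 'numpy>=1.15.4' into ('numpy', '>=', '1.15.4').

    Bare names such as 'jieba' yield (name, None, None).
    """
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.\-]+)\s*(?:(>=|==|<=)\s*([0-9.]+))?\s*", line
    )
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def version_tuple(version):
    """Turn '1.15.4' into (1, 15, 4) for simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """True when the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for line in ["h5sparse", "numpy>=1.15.4", "sentencepiece>=0.1.8"]:
        print(parse_requirement(line))
```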