support different pytorch image and fix some bug #2

upvenly · 2023-05-18T09:11:02Z

No description provided.

* 1.Fix some letter numbering errors. 2.add iluvatar mobilenetv2 case. * remove some unnecessary code. * update iluvatar mobilenetv2's performance information * remove some redundant code and configuration --------- Co-authored-by: sen.li <sen.li@iluvatar.ai> Co-authored-by: Zhou Yu <zycosmos@gmail.com>

* init * add efficientnet * modify config * modify config * modify config * add efficientnet * modify config * add efficientnet * bug fix * add efficientnet * add efficientnet * fix code style * fix code style * fix code style * Revert "fix code style" This reverts commit ae86109. * fix code style * fix code style * fix code style * fix code style * fix code style --------- Co-authored-by: Feilei Du <dufeilei@foxmail.com>

* add wav2vec2 * update * update according to review * rm wave2vec * update * update * update * rm unnecessary * rm unnecessary * add necessary file * rm necessary file * rm necessary file * rm necessary file * rm necessary file * rm necessary file * rm necessary file & optimize according to review comments * optimize according to review comments * optimize according to review comments * optimize according to review comments

shh2000 · 2023-06-01T09:03:33Z

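The README.md may need updating, for example:
FlagPerf uses the keys of the CASES variable to index the model (model, e.g. bert), framework (framework, e.g. pytorch), hardware type (hardware_model, e.g. A100), number of hosts (nnodes, e.g. 1), number of accelerator cards (nproc, e.g. 8), and number of repeated test runs (repeat, e.g. 1). Each key is stored as a string with a colon (:) as the separator, in the format "model:framework:hardware_model:nnodes:nproc:repeat". The value for a key is the directory that holds the data and model weights for running that case.
For example, suppose a user has placed the dataset and pretrained weights for running the bert model under the pytorch framework in the directory /abc/def/data/, and wants to test this task on 2 hosts with 8 A100 cards each (16 cards in total), repeating 3 times and taking the average. They then need to add the key-value pair "bert:pytorch:A100:2:8:3":"/abc/def/data/" to CASES. In the key, bert is the model, pytorch is the framework, A100 is the hardware type, 2 is the number of hosts, 8 is the number of cards on each host, and 3 is the number of repeats; "/abc/def/data/" is the path where the data and weights are stored.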
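
The key format described in this comment can be illustrated with a minimal sketch. The `Case` tuple and the `parse_case_key` helper are hypothetical names introduced here for illustration; they are not taken from FlagPerf's code, and the sketch assumes only the six colon-separated fields and the example entry given in the comment.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """Fields of a CASES key (hypothetical helper type, not FlagPerf's own)."""
    model: str
    framework: str
    hardware_model: str
    nnodes: int   # number of hosts
    nproc: int    # accelerator cards per host
    repeat: int   # number of repeated runs


def parse_case_key(key: str) -> Case:
    """Split a 'model:framework:hardware_model:nnodes:nproc:repeat' key."""
    parts = key.split(":")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 colon-separated fields, got {len(parts)}: {key!r}")
    model, framework, hardware_model, nnodes, nproc, repeat = parts
    return Case(model, framework, hardware_model,
                int(nnodes), int(nproc), int(repeat))


# Example entry from the comment: the value is the data/weights directory.
CASES = {"bert:pytorch:A100:2:8:3": "/abc/def/data/"}

for key, data_dir in CASES.items():
    case = parse_case_key(key)
    total_cards = case.nnodes * case.nproc  # 2 hosts x 8 cards each
    print(case.model, case.framework, case.hardware_model,
          total_cards, case.repeat, data_dir)
```

Rejecting keys that do not have exactly six fields is a design choice of this sketch: it surfaces a malformed entry immediately rather than letting it fail later during a run.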

shh2000 · 2023-06-01T09:06:42Z

training/nvidia/docker_image/pytorch_1.13/Dockerfile

+RUN /bin/bash -c "uname -a"
+RUN /bin/bash -c alias python3=python
+RUN pip install editdistance==0.6.0 librosa==0.8.0 pyarrow==6.0.1 pip install soundfile==0.10.3.post1 pip install sox==1.4.1 pip install tqdm==4.53.0
+RUN pip install git+https://github.com/NVIDIA/dllogger@v1.0.0#egg=dllogger


Consider moving the dllogger install into nvidia/case/requirements.

shh2000 · 2023-06-01T09:07:43Z

training/utils/container_manager.py

@@ -29,7 +29,7 @@ def run_new(self, container_run_args, docker_image):
        run_new_cmd = "docker run " + container_run_args + \
                      " --name=" + self.name + " \"" + docker_image + "\" " + \
                      "sleep infinity"
-        print(run_new_cmd)
+        print("run_new_cmd:", run_new_cmd)


This part has already been fully covered and fixed in the logger PR, so the print in this file can be left alone here.

* init * add efficientnet * modify config * modify config * modify config * add efficientnet * modify config * add efficientnet * bug fix * add efficientnet * add efficientnet * fix code style * fix code style * fix code style * Revert "fix code style" This reverts commit ae86109. * fix code style * fix code style * fix code style * fix code style * fix code style * add kunlunxin case --------- Co-authored-by: Feilei Du <dufeilei@foxmail.com>

* init * add efficientnet * modify config * modify config * modify config * add efficientnet * modify config * add efficientnet * bug fix * add efficientnet * add efficientnet * fix code style * fix code style * fix code style * Revert "fix code style" This reverts commit ae86109. * fix code style * fix code style * fix code style * fix code style * fix code style * add vit * bug fix * bug fix * bug fix * bug fix * bug fix * fix code style * fix code style --------- Co-authored-by: Feilei Du <dufeilei@foxmail.com>

* ILUVATAR: add resnet50 pytorch * iluvatar:resnet50:pytorch add performence data * iluvatar:resnet50:pytorch add performence data * add vendor='iluvatar' * revert HOSTS in cluster_conf.py --------- Co-authored-by: 颜瑞 <rui.yan@iluvatar.com> Co-authored-by: 颜瑞 <rui.yan@iluvatar.ai>

* glm_config * fix_#1 * glm-config_updated * glm-config-updated#2 * glm_config-updated#2 * glm_config-#2 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update pytorch_install.sh * Create config_common * Update README.md * Rename config_common to config_common.py * Update config_R300x2x8.py * Update config_R300x1x1.py * Update config_R300x1x8.py * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update requirements.txt * Update README.md * Update config_R300x1x1.py * Update config_R300x1x8.py * Update config_R300x2x8.py * Update config_R300x1x1.py * Update config_R300x1x8.py * Update config_R300x2x8.py * Update config_common.py * Update config_R300x1x1.py * Update config_R300x2x8.py * Update README.md * Update README.md * Update README.md * Update config_R300x1x1.py * Update config_R300x1x8.py * Update config_R300x2x8.py --------- Co-authored-by: guanlongjie <guanlongjie@MacBook-Pro.local>

zhouyu and others added 30 commits March 27, 2023 19:19

add ResNet50 pytorch case

0177690

set bs, target_acc1. format code

b17d563

clean code

7de56ab

merge main

791997c

ResNet50: add pytorch case

68b7e09

update requirements.txt

ed9ac47

merge main

6dbba6d

Resnet50: add Pytorch standard case

5b019c6

resolve conflicts

1ab46b5

update test_conf

43f0034

ResNet50: add PyTorch standard case

2589d02

update requirements.txt

df5f4ca

change params to torchvision params

4118bd9

merge upstream

5a3250e

merge main

408863a

resolve conflicts

6716180

ILUVATAR: set seed for iluvatar

7680826

Merge branch 'FlagOpen:main' into main

35db1e6

ILUVATAR: change torch.backends.cudnn.benchmark to true as default

d475158

process resuming from checkpoint for 1x1 training

3f840f7

Merge branch 'resnet50-torch' of github.com:yuzhou03/FlagPerf into re…

c8aaf3b

…snet50-torch

rm unnecessary file (FlagOpen#67)

9a555f7

* update doc * update doc * update doc * update doc * rm unnecessary file * rm unnecessary file

removed requirements.txt

0cc900e

remove fp16, warmup from configs

1bbe738

remove gradient_accumulation_steps from configs

f1c5f2d

remove training_evenv from configs

fdf7521

remove dist_backend from configs

7f2c902

Merge branch 'FlagOpen:main' into main

a339d49

add case README.md

795a2d9

modify config

5abc0f8

upvenly and others added 2 commits May 25, 2023 09:05

Merge pull request FlagOpen#76 from ScoThunder/cpm

d583c5c

support cpm on xpu

fix

c81d025

upvenly force-pushed the wwl/add_standard_wav2vec2 branch from b7924c3 to eb9ce98 on May 25, 2023 05:54

upvenly and others added 8 commits May 25, 2023 17:22

Merge pull request FlagOpen#74 from shh2000/fixport

2b3289a

port and shell

update doc for modifying checkpoint and performance principles (FlagO…

09fe46b

…pen#72) * update doc * update doc * update doc * update * update

Bit (FlagOpen#77)

8695001

* init * fix * fix readme * fix print * fix * fix * fix nvidia * fix cmt * rm req * rm foo code * fix gpu, rm foo amp * upd mutable

add iluvatar bert case. (FlagOpen#56)

4151af8

* add iluvatar bert case. * add iluvatar bert README * remove redundant and commented out code * delete unused configuration * add vendor info in config_common.py * fix format conflicts --------- Co-authored-by: sen.li <sen.li@iluvatar.ai>

refine format for run-pretraining.example (FlagOpen#88)

55001e2

Co-authored-by: zhouyu <zhouyu@baai.ac.cn>

shh2000 reviewed Jun 1, 2023

ScoThunder and others added 11 commits June 1, 2023 17:09

update

885380c

update

f1f7898

update

9e8dc0c

update

f00658d

update

d53ace4

update

e80b2ce

update

d6d43c4

update

b9f10df

upvenly pushed a commit that referenced this pull request Aug 9, 2023

Merge pull request #2 from shh2000/fix0711

e40875a

Fix0711

upvenly deleted the branch wwl/add_standard_wav2vec2 August 9, 2023 02:52

upvenly closed this Aug 9, 2023

upvenly deleted the wwl/support-different-image branch August 9, 2023 02:56
