swin data parallelism: 8-GPU linear speedup is lower than PyTorch's #312
Comments
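@Ldpe2G Could you share the script and container environment used for the official test, and the dataset as well? I need them for my own testing. Also, is there any difference between the swin-transformer repository and the implementation in libai?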
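Just clone the official repository and run:
python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin/swin_tiny_patch4_window7_224.yaml \
--batch-size 96 \
--data-path /dataset/extract/
I computed the throughput by hand, since the original script does not report it: batch_size divided by the average time per batch.
The dataset is available on 类脑, namely …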
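The hand calculation described above (batch size divided by the average time per batch) can be sketched roughly as follows. The function name `throughput`, the `world_size` parameter, and the timing values are hypothetical and only for illustration; whether the batch size is per GPU or global depends on the training script, so the multiplier is left as an explicit assumption.

```python
from statistics import mean


def throughput(batch_size, batch_times_s, world_size=1):
    """Return samples per second from per-batch wall-clock times.

    batch_size:    samples processed per batch on one GPU (assumption:
                   the script's batch size is per GPU)
    batch_times_s: measured time of each batch, in seconds
    world_size:    number of GPUs; keep 1 if batch_size is already global
    """
    if not batch_times_s:
        raise ValueError("need at least one timed batch")
    return batch_size * world_size / mean(batch_times_s)


# Hypothetical timings, not measurements from this issue.
times = [0.47, 0.46, 0.46]
print(f"1 GPU : {throughput(96, times):.1f} samples/s")
print(f"8 GPUs: {throughput(96, times, world_size=8):.1f} samples/s")
```

Skipping the first few warm-up batches before averaging usually gives a steadier number, since early iterations include one-off setup cost.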
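As for our own swin repository: it was used for earlier experiments, back when libai did not yet support swin. That repository contains a plain DDP implementation ported from torch: https://github.com/Oneflow-Inc/swin-transformer/tree/swin_clean_ldp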
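Based on the latest oneflow master:
eager global, fp32, batch 96
1-GPU throughput: ~56 samples/s
8-GPU throughput: ~340 samples/s
Speedup: ~6x
graph, fp32, batch 96
1-GPU throughput: ~67 samples/s
8-GPU throughput: ~355 samples/s
Speedup: ~5.3x
graph, fp16, batch 128
1-GPU throughput: ~69 samples/s
8-GPU throughput: ~401 samples/s
Speedup: ~5.x
This is an improvement over the earlier numbers.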
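This has been tuned many times before, and it is part of the monitored metrics, right? Why is it showing up again now? Oh, so the other models are fine and only swin has the problem?
Most likely nobody paid attention to swin performance before. It was only measured recently, after a flowvision user reported a performance problem.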
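Problem description
Following a user's issue in the flowvision repository about the low 8-GPU speedup of libai swin, I ran some experiments on 类脑. Below is a comparison between libai and the official swin.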
Experiment environment
类脑 vs009
oneflow version:
0.8.0+cu112.git.57869e9e39
libai version:
de2c68f2692760e5de87ebb815541a98d1b8ebe7
pytorch version:
1.10.1+cu102
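libai
graph, fp16, batch 128
1-GPU throughput: ~70 samples/s
8-GPU throughput: ~280 samples/s
Linear speedup: ~4x
graph, fp32, batch 32
1-GPU throughput: ~70 samples/s
8-GPU throughput: ~290 samples/s
Linear speedup: ~4x
eager global, fp32, batch 32
1-GPU throughput: ~50 samples/s
8-GPU throughput: ~200 samples/s
Linear speedup: ~4x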
Plain DDP: https://github.com/Oneflow-Inc/swin-transformer/tree/swin_clean_ldp
eager ddp, fp32, batch 32
1-GPU throughput: ~152.3 samples/s
8-GPU throughput: ~416.2 samples/s
Linear speedup: ~2.7x
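Official PyTorch swin: https://github.com/microsoft/Swin-Transformer
amp, batch 128
1-GPU throughput: ~320 samples/s
8-GPU throughput: ~2048 samples/s
Linear speedup: ~6.4x
fp32, batch 96
1-GPU throughput: ~208 samples/s
8-GPU throughput: ~1536 samples/s
Linear speedup: ~7.3x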
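For reference, the linear speedup figures above are the 8-GPU throughput divided by the 1-GPU throughput. A minimal sketch of that arithmetic, using the approximate numbers reported in this issue, might look like the following. The helper names `speedup` and `scaling_efficiency` are hypothetical, and the efficiency column (speedup divided by GPU count) is an extra derived value, not something reported in the original measurements.

```python
def speedup(single_gpu, multi_gpu):
    """Ratio of multi-GPU throughput to single-GPU throughput."""
    return multi_gpu / single_gpu


def scaling_efficiency(single_gpu, multi_gpu, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup(single_gpu, multi_gpu) / num_gpus


# (1-GPU, 8-GPU) throughput in samples/s, copied from the issue text.
results = {
    "libai graph fp16 batch 128": (70, 280),
    "libai graph fp32 batch 32": (70, 290),
    "libai eager global fp32 batch 32": (50, 200),
    "oneflow eager ddp fp32 batch 32": (152.3, 416.2),
    "pytorch amp batch 128": (320, 2048),
    "pytorch fp32 batch 96": (208, 1536),
}

for name, (one_gpu, eight_gpu) in results.items():
    s = speedup(one_gpu, eight_gpu)
    e = scaling_efficiency(one_gpu, eight_gpu, num_gpus=8)
    print(f"{name:34s} speedup {s:4.2f}x  efficiency {e:6.1%}")
```

Because the inputs are rounded estimates, the printed ratios can differ slightly in the last digit from the hand-rounded values listed above.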