swin pipeline setting #215

Merged
merged 8 commits into from
Mar 30, 2022


Ldpe2G
Collaborator

@Ldpe2G Ldpe2G commented Mar 23, 2022

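TODO List (all experiments below were launched on 8 GPUs)

  • eager global 3D parallelism (naive pipeline)
  • graph 3D parallelism (naive pipeline)
  • graph 3D parallelism (naive pipeline), with checkpointing + amp enabled
  • graph data + pipeline parallelism, with acc_grad + amp enabled
  • graph data parallelism, with checkpointing + amp enabled

Cases that raise errors

  • graph 3D parallelism with acc grad enabled: graph construction fails in the training stage
  • graph data parallelism with amp + zero stage1 enabled: training runs without errors, but graph construction fails in the inference stage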

@Ldpe2G Ldpe2G mentioned this pull request Mar 23, 2022
@Ldpe2G
Collaborator Author

Ldpe2G commented Mar 29, 2022

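Changed the set_activation_checkpoint implementation in graphbase to work the same way as set_pipeline_stage_id: if the user-provided model defines a set_activation_checkpoint function, that function is called; otherwise the default setting is used.

This way users also don't need to hack the GraphBase class.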
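A rough sketch of the dispatch described above, in plain Python so it runs without OneFlow. Everything except the name set_activation_checkpoint is hypothetical (apply_activation_checkpoint, default_activation_checkpoint, the toy model classes and the checkpointed attribute); it only illustrates the "call the model's hook if present, else fall back to the default" idea, not the actual GraphBase code.

```python
# Hypothetical illustration only: these classes and helpers are not from the PR.

class PlainModel:
    """Toy model that does NOT define its own set_activation_checkpoint."""

    def __init__(self):
        self.layers = ["block0", "block1", "block2"]
        self.checkpointed = []


class CustomModel(PlainModel):
    """Toy model that provides its own set_activation_checkpoint hook."""

    def set_activation_checkpoint(self):
        # The user decides which layers get activation checkpointing.
        self.checkpointed = ["block1"]


def default_activation_checkpoint(model):
    # Assumed default policy for this sketch: checkpoint every layer.
    model.checkpointed = list(model.layers)


def apply_activation_checkpoint(model):
    """Call the model's own hook if it has one, otherwise use the default."""
    hook = getattr(model, "set_activation_checkpoint", None)
    if callable(hook):
        hook()
        return "model"
    default_activation_checkpoint(model)
    return "default"


plain = PlainModel()
custom = CustomModel()
plain_source = apply_activation_checkpoint(plain)
custom_source = apply_activation_checkpoint(custom)
```

With this shape the per-model policy lives on the model itself, so supporting a new model means adding a method to it rather than editing the shared graph class.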

@Ldpe2G Ldpe2G requested a review from oneflow-ci-bot March 30, 2022 00:41
@Ldpe2G Ldpe2G requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 30, 2022 00:47
@Ldpe2G Ldpe2G merged commit 7bf3820 into main Mar 30, 2022
@Ldpe2G Ldpe2G deleted the dev_swin_pipeline_parallel branch March 30, 2022 00:57