Run nn.Graph by VM #9884

Merged
merged 18 commits into from
Feb 25, 2023

daquexian
Contributor

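Related issue: https://github.com/Oneflow-Inc/OneTeam/issues/1657

This PR implements an experimental feature: when ONEFLOW_RUN_GRAPH_BY_VM=1 is set, nn.Graph is run by the VM, which lets nn.Graph accept dynamic input shapes (single GPU only for now). At this stage the approach is not fully reliable, because we cannot rule out that some ops or graph optimizations depend strongly on the input shapes seen when the graph was built. That can only be fully solved once complete symbolic shape support is available.

Tested on SD1.5, running the Graph with the VM is about as fast as running it with actors, though it uses slightly more GPU memory: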

| | VM | actor |
| --- | --- | --- |
| SD 1.5 | 17.70 it/s, 6968MB | 17.75 it/s, 6292MB |

Signed-off-by: daquexian <daquexian566@gmail.com>
Signed-off-by: daquexian <daquexian566@gmail.com>
Signed-off-by: daquexian <daquexian566@gmail.com>
Signed-off-by: daquexian <daquexian566@gmail.com>
@daquexian daquexian marked this pull request as ready for review February 21, 2023 09:19
Signed-off-by: daquexian <daquexian566@gmail.com>
print(g)
out = capsys.readouterr().out  # read once: readouterr() resets the capture buffer
assert "broadcast_sub" not in out
assert "cast" not in out
assert "broadcast_mul" not in out
Contributor

This doesn't look like a standard unittest. Can CI actually run this case?

Contributor Author

Yes, it is written in pytest style, which is much nicer to use than Python's built-in unittest. CI already runs the tests with pytest.
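
For readers unfamiliar with the pytest style mentioned here, the following is a minimal sketch of such a test. `FakeGraph` and the op names are hypothetical stand-ins, not OneFlow classes; only the `capsys` fixture and its `readouterr()` method come from pytest itself. pytest collects plain functions whose names start with `test_` and injects fixtures by argument name, so no `unittest.TestCase` subclass is needed.

```python
class FakeGraph:
    """Hypothetical stand-in for a compiled graph; printing it lists its ops."""

    def __init__(self, op_names):
        self.op_names = list(op_names)

    def __repr__(self):
        return "\n".join(f"(OPERATOR: {name})" for name in self.op_names)


def test_fused_ops_are_gone(capsys):
    # pytest passes in the built-in `capsys` fixture because of the argument name.
    g = FakeGraph(["fused_scale_mask", "matmul"])
    print(g)
    # readouterr() returns the captured text and resets the buffer,
    # so read it once and reuse the result for every assertion.
    out = capsys.readouterr().out
    assert "broadcast_sub" not in out
    assert "broadcast_mul" not in out
```

Running `pytest` on a file containing this function is enough for the case to be picked up.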

const auto& job = graph->job();
auto env = *JUST(InitEnv(graph_inputs, graph));

const auto dead_tensors = GetDeadTensorVector(job);
Contributor

What does "dead tensor" mean? Could you add a comment?

Contributor Author

Sure. There is a comment at the definition of GetDeadTensorVector; I'll point it out here as well.

Contributor Author

Added.

daquexian and others added 2 commits February 23, 2023 12:45
Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.


// tensors in dead_tensors[i] will not be accessed any more after i-th op
// so they can be released once i-th op's execution finishes.
std::vector<std::vector<std::string>> GetDeadTensorVector(const Job& job) {
Contributor

So dead tensors are mainly the activation tensors that become outdated?

Contributor Author

Yes. dead_tensors[i] holds the tensors that become dead after the i-th op. If you have a better name, feel free to suggest it.
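
The idea behind dead_tensors[i] can be illustrated with a small last-use analysis. This is a Python sketch under stated assumptions, not the C++ GetDeadTensorVector from this PR: the `(inputs, outputs)` op format, the function name, and the rule that graph outputs are never released are all hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical op format: (input tensor names, output tensor names),
# with ops listed in execution order.
Op = Tuple[Sequence[str], Sequence[str]]


def dead_tensors_after_each_op(
    ops: Sequence[Op], graph_outputs: Sequence[str] = ()
) -> List[List[str]]:
    """result[i] lists tensors whose last access is op i, so they can be freed after it."""
    last_use: Dict[str, int] = {}
    for i, (inputs, outputs) in enumerate(ops):
        for name in list(inputs) + list(outputs):
            last_use[name] = i  # later ops overwrite earlier indices

    keep = set(graph_outputs)  # assumption: returned tensors must stay alive
    dead: List[List[str]] = [[] for _ in ops]
    for name, i in last_use.items():
        if name not in keep:
            dead[i].append(name)
    return dead


ops = [
    (["x"], ["a"]),       # op 0: a = f(x)
    (["a"], ["b"]),       # op 1: b = g(a)
    (["a", "b"], ["y"]),  # op 2: y = h(a, b)
]
print(dead_tensors_after_each_op(ops, graph_outputs=["y"]))
# prints [['x'], [], ['a', 'b']]
```

In this toy graph `a` survives op 1 because op 2 still reads it, which is why nothing is released after op 1.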

Contributor

OutdatedTensorAfterOp?

Contributor Author

@daquexian daquexian Feb 23, 2023

Sounds good :good: Renamed.

Contributor

@strint strint left a comment

LGTM

daquexian and others added 2 commits February 23, 2023 13:18
Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@daquexian daquexian requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 23, 2023 15:00
@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.0ms (= 14098.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 146.5ms (= 14654.0ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.04 (= 146.5ms / 141.0ms)

OneFlow resnet50 time: 80.5ms (= 8047.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.9ms (= 8386.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 83.9ms / 80.5ms)

OneFlow resnet50 time: 48.4ms (= 9687.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.0ms (= 12201.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.26 (= 61.0ms / 48.4ms)

OneFlow resnet50 time: 32.2ms (= 6431.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 48.8ms (= 9750.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.52 (= 48.8ms / 32.2ms)

OneFlow resnet50 time: 24.9ms (= 4986.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.1ms (= 7826.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.57 (= 39.1ms / 24.9ms)

OneFlow swin dataloader time: 0.238s (= 47.591s / 200, num_workers=1)
PyTorch swin dataloader time: 0.156s (= 31.283s / 200, num_workers=1)
Relative speed: 0.657 (= 0.156s / 0.238s)

OneFlow swin dataloader time: 0.066s (= 13.223s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.540s / 200, num_workers=4)
Relative speed: 0.646 (= 0.043s / 0.066s)

OneFlow swin dataloader time: 0.041s (= 8.211s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.416s / 200, num_workers=8)
Relative speed: 0.538 (= 0.022s / 0.041s)

❌ OneFlow resnet50 time: 152.3ms (= 15232.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.9ms (= 15988.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 159.9ms / 152.3ms)

OneFlow resnet50 time: 90.7ms (= 9072.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.1ms (= 10306.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 103.1ms / 90.7ms)

OneFlow resnet50 time: 58.9ms (= 11786.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15539.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 77.7ms / 58.9ms)

OneFlow resnet50 time: 42.1ms (= 8412.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.3ms (= 14467.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 72.3ms / 42.1ms)

OneFlow resnet50 time: 35.5ms (= 7096.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.7ms (= 13944.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.97 (= 69.7ms / 35.5ms)
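
As a reading aid for the report above: each "Relative speed" figure is PyTorch's time per iteration divided by OneFlow's, so values above 1 mean OneFlow was faster. The helper below is a hypothetical sketch of that arithmetic, not the CI script itself. It works from the unrounded totals, which is consistent with the reported dataloader figure (0.657 rather than the 0.655 that the rounded per-iteration times would give).

```python
def relative_speed(
    pytorch_total: float, pytorch_iters: int,
    oneflow_total: float, oneflow_iters: int,
) -> float:
    """PyTorch time per iteration divided by OneFlow time per iteration."""
    pytorch_per_iter = pytorch_total / pytorch_iters
    oneflow_per_iter = oneflow_total / oneflow_iters
    return pytorch_per_iter / oneflow_per_iter


# First resnet50 row of the report: totals in ms over 100 iterations.
print(f"{relative_speed(14654.0, 100, 14098.1, 100):.2f}")  # prints 1.04
# First swin dataloader row: totals in s over 200 iterations.
print(f"{relative_speed(31.283, 200, 47.591, 200):.3f}")  # prints 0.657
```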

@github-actions
Contributor

CI failed when running job: cuda-misc. PR label automerge has been removed

daquexian and others added 2 commits February 24, 2023 13:57
Signed-off-by: daquexian <daquexian566@gmail.com>
@github-actions
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.0ms (= 14100.5ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.1ms (= 14412.8ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 144.1ms / 141.0ms)

OneFlow resnet50 time: 80.7ms (= 8070.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.0ms (= 8503.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.05 (= 85.0ms / 80.7ms)

OneFlow resnet50 time: 50.0ms (= 9998.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.7ms (= 11130.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.11 (= 55.7ms / 50.0ms)

OneFlow resnet50 time: 33.4ms (= 6688.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.0ms (= 8596.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.29 (= 43.0ms / 33.4ms)

OneFlow resnet50 time: 24.9ms (= 4975.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.5ms (= 7496.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.51 (= 37.5ms / 24.9ms)

OneFlow swin dataloader time: 0.237s (= 47.377s / 200, num_workers=1)
PyTorch swin dataloader time: 0.148s (= 29.675s / 200, num_workers=1)
Relative speed: 0.626 (= 0.148s / 0.237s)

OneFlow swin dataloader time: 0.072s (= 14.341s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.124s / 200, num_workers=4)
Relative speed: 0.566 (= 0.041s / 0.072s)

OneFlow swin dataloader time: 0.039s (= 7.778s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.478s / 200, num_workers=8)
Relative speed: 0.576 (= 0.022s / 0.039s)

❌ OneFlow resnet50 time: 152.4ms (= 15242.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.6ms (= 16159.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.06 (= 161.6ms / 152.4ms)

OneFlow resnet50 time: 91.3ms (= 9127.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.3ms (= 10128.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 101.3ms / 91.3ms)

OneFlow resnet50 time: 59.4ms (= 11887.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15583.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 77.9ms / 59.4ms)

OneFlow resnet50 time: 42.5ms (= 8495.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13957.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 69.8ms / 42.5ms)

OneFlow resnet50 time: 35.8ms (= 7158.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.5ms (= 14495.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 2.02 (= 72.5ms / 35.8ms)

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9884/

@daquexian daquexian mentioned this pull request Feb 24, 2023
3 tasks
@github-actions
Contributor

CI failed when running job: cuda-module. PR label automerge has been removed

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 140.9ms (= 14093.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.5ms (= 14454.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.03 (= 144.5ms / 140.9ms)

OneFlow resnet50 time: 80.6ms (= 8060.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.7ms (= 8374.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 83.7ms / 80.6ms)

OneFlow resnet50 time: 49.7ms (= 9942.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.8ms (= 12156.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 60.8ms / 49.7ms)

OneFlow resnet50 time: 33.8ms (= 6762.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 49.1ms (= 9816.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.45 (= 49.1ms / 33.8ms)

OneFlow resnet50 time: 24.4ms (= 4871.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 44.9ms (= 8988.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.84 (= 44.9ms / 24.4ms)

OneFlow swin dataloader time: 0.237s (= 47.396s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.704s / 200, num_workers=1)
Relative speed: 0.627 (= 0.149s / 0.237s)

OneFlow swin dataloader time: 0.068s (= 13.693s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.465s / 200, num_workers=4)
Relative speed: 0.618 (= 0.042s / 0.068s)

OneFlow swin dataloader time: 0.042s (= 8.476s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.403s / 200, num_workers=8)
Relative speed: 0.519 (= 0.022s / 0.042s)

❌ OneFlow resnet50 time: 152.5ms (= 15250.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.3ms (= 16229.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.06 (= 162.3ms / 152.5ms)

OneFlow resnet50 time: 90.9ms (= 9094.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.0ms (= 10196.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 102.0ms / 90.9ms)

OneFlow resnet50 time: 59.1ms (= 11814.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15654.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 78.3ms / 59.1ms)

OneFlow resnet50 time: 42.3ms (= 8457.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.2ms (= 14043.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 70.2ms / 42.3ms)

OneFlow resnet50 time: 36.5ms (= 7292.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.7ms (= 13948.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.91 (= 69.7ms / 36.5ms)

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9884/

@mergify mergify bot merged commit 7df12c3 into master Feb 25, 2023
@mergify mergify bot deleted the job_vm branch February 25, 2023 22:46
@rejoicesyc rejoicesyc mentioned this pull request May 11, 2023
rejoicesyc added a commit that referenced this pull request Jun 5, 2023
Running global nn.Graph by vm, following
#9884

---------

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>
5 participants