Run nn.Graph by VM #9884
Signed-off-by: daquexian <daquexian566@gmail.com>
print(g)
assert "broadcast_sub" not in capsys.readouterr().out
assert "cast" not in capsys.readouterr().out
assert "broadcast_mul" not in capsys.readouterr().out
This doesn't look like a standard unittest. Can CI actually run this case?
Yes, it can. This is written in pytest style, which is considerably nicer to use than Python's built-in unittest, and CI already runs the tests with pytest.
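As a side note on the pytest style discussed above, here is a minimal sketch of a `capsys`-based check. `capsys.readouterr()` returns everything captured so far and resets the buffer, so the sketch reads the output once and checks the same string several times. The helper `print_graph_ops` and the op names it prints are hypothetical stand-ins for `print(g)`, not code from this PR.

```python
def print_graph_ops():
    # Hypothetical stand-in for print(g); the real test prints an nn.Graph.
    print("conv2d")
    print("relu")


def test_folded_ops_absent(capsys):
    print_graph_ops()
    # readouterr() returns the captured text and clears the buffer,
    # so capture once and reuse the string for every assertion.
    out = capsys.readouterr().out
    assert "broadcast_sub" not in out
    assert "cast" not in out
    assert "broadcast_mul" not in out
```

pytest collects any function named `test_*` and injects the built-in `capsys` fixture by argument name, so no `unittest.TestCase` subclass is needed.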
oneflow/core/job/job_interpreter.cpp
Outdated
const auto& job = graph->job();
auto env = *JUST(InitEnv(graph_inputs, graph));

const auto dead_tensors = GetDeadTensorVector(job);
What does "dead tensor" mean here? Please add a comment.
OK. There is already a comment at the definition of GetDeadTensorVector; I'll point to it here as well.
Added.
Signed-off-by: daquexian <daquexian566@gmail.com>
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
oneflow/core/job/job_interpreter.cpp
Outdated
// tensors in dead_tensors[i] will not be accessed any more after i-th op
// so they can be released once i-th op's execution finishes.
std::vector<std::vector<std::string>> GetDeadTensorVector(const Job& job) {
So dead tensors are mainly the activation tensors that become outdated?
Yes. dead_tensors[i] holds the tensors that become dead after the i-th op. If you have a better name, feel free to suggest it.
OutdatedTensorAfterOp?
Sounds good :good: Renamed.
LGTM
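To make the "dead after the i-th op" idea from this thread concrete, here is a minimal Python sketch of a last-use analysis. It is not the PR's C++ implementation: the `Op` record, the `keep` parameter, and the rule that graph outputs and weights are never released are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    """Hypothetical, simplified op record: names of input and output tensors."""
    name: str
    inputs: List[str]
    outputs: List[str]


def dead_tensors_after_each_op(ops: List[Op], keep: List[str]) -> List[List[str]]:
    """Return result[i] = tensors that are never accessed again after ops[i]."""
    last_use: Dict[str, int] = {}
    for i, op in enumerate(ops):
        # Reads and writes both count as an access; a later op overwrites
        # the index recorded by an earlier one.
        for tensor in op.inputs + op.outputs:
            last_use[tensor] = i
    result: List[List[str]] = [[] for _ in ops]
    for tensor, i in last_use.items():
        if tensor not in keep:  # assumption: kept tensors must outlive the run
            result[i].append(tensor)
    return result


ops = [
    Op("matmul", ["x", "w"], ["h"]),
    Op("relu", ["h"], ["a"]),
    Op("sum", ["a"], ["y"]),
]
dead = dead_tensors_after_each_op(ops, keep=["y", "w"])
print(dead)  # [['x'], ['h'], ['a']]
```

An interpreter that walks the ops in order could drop its references to `dead[i]` right after executing op `i`, which is what lets intermediate activations be freed early.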
Signed-off-by: daquexian <daquexian566@gmail.com>
CI failed when running job: cuda-misc. PR label automerge has been removed
Signed-off-by: daquexian <daquexian566@gmail.com>
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9884/
CI failed when running job: cuda-module. PR label automerge has been removed
Running global nn.Graph by vm, following #9884 --------- Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: daquexian <daquexian566@gmail.com>
Related issue: https://github.com/Oneflow-Inc/OneTeam/issues/1657
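This PR implements an experimental feature: when ONEFLOW_RUN_GRAPH_BY_VM=1 is set, nn.Graph is run by the VM, which lets nn.Graph accept dynamic input shapes (single GPU only for now). At this stage the approach is not fully reliable, because we cannot rule out that some ops or graph optimizations depend strongly on the input shapes seen at graph build time. That problem can only be fully solved once complete symbolic shape support is available. In tests on SD1.5, running the Graph by VM was not much different in speed from running it by actors, but it used slightly more GPU memory:
VM: 6968MB
actor: 6292MB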
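A minimal sketch of how the flag might be used. It assumes the environment variable is read when the graph is built or first run, so setting it before importing oneflow is the conservative choice. The `TinyGraph` class, the layer sizes, and the batch sizes are illustrative and do not come from this PR.

```python
import os

# Assumption: the flag is read at graph build/run time; setting it before
# importing oneflow is the conservative choice.
os.environ["ONEFLOW_RUN_GRAPH_BY_VM"] = "1"

try:
    import oneflow as flow
except ImportError:  # lets the sketch load on machines without OneFlow
    flow = None


def run_dynamic_shape_demo():
    """Call one nn.Graph with two different batch sizes (illustrative only)."""
    if flow is None:
        raise RuntimeError("oneflow is not installed")

    class TinyGraph(flow.nn.Graph):  # hypothetical example graph
        def __init__(self):
            super().__init__()
            self.linear = flow.nn.Linear(4, 2)

        def build(self, x):
            return self.linear(x)

    graph = TinyGraph()
    # The second call uses a batch size that differs from the one seen
    # when the graph was first built.
    return [tuple(graph(flow.randn(n, 4)).shape) for n in (3, 5)]
```

Calling `run_dynamic_shape_demo()` on a machine with OneFlow installed exercises the same graph with two input shapes. Given the caveat above about ops that depend on build-time shapes, results for real models should be checked against eager mode.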