nn.Graph reuse eager lbn without create duplicate variable op #6981

chengtbf · 2021-12-08T16:35:12Z

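When nn.Graph captures the Eager Tensors (parameters) of an nn.Module, it does not account for the same tensor being passed in more than once. As a result, a single tensor pointer that has several different names in the module gets a separate Variable Op created for each name. This does not affect correctness (the multiple Variable Ops are bound to the same Tensor memory), but in multi-device runs it causes extra gradient synchronization and hurts performance.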

This problem was reported in the following issue comment:

https://github.com/Oneflow-Inc/OneTeam/issues/828#issuecomment-987625428
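The deduplication idea described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakeTensor`, `VariableOpBuilder`, `GetOrCreateVariable`, the `"/out"` lbn suffix, and the module names are all hypothetical stand-ins. It assumes that tensor identity is decided by the raw pointer, which is what the description implies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an eager tensor; not OneFlow's Tensor class.
struct FakeTensor {
  std::vector<float> data;
};

// Hypothetical builder that tracks which variable ops one graph build created.
class VariableOpBuilder {
 public:
  // Returns the lbn for `tensor`, creating a variable op only the first time
  // this tensor pointer is seen.
  std::string GetOrCreateVariable(const std::shared_ptr<FakeTensor>& tensor,
                                  const std::string& module_name) {
    auto it = tensor2lbn_.find(tensor.get());
    if (it != tensor2lbn_.end()) {
      // Same pointer captured under another name: reuse the recorded lbn.
      return it->second;
    }
    const std::string lbn = module_name + "/out";  // assumed naming scheme
    variable_ops_.push_back(module_name);
    tensor2lbn_.emplace(tensor.get(), lbn);
    return lbn;
  }

  std::size_t variable_op_count() const { return variable_ops_.size(); }

 private:
  std::unordered_map<const FakeTensor*, std::string> tensor2lbn_;
  std::vector<std::string> variable_ops_;
};

// Captures one tensor under two module names and reports how many variable
// ops were created for it.
inline std::size_t CountOpsForSharedTensor() {
  VariableOpBuilder builder;
  auto weight = std::make_shared<FakeTensor>();
  const std::string a = builder.GetOrCreateVariable(weight, "encoder.embed.weight");
  const std::string b = builder.GetOrCreateVariable(weight, "decoder.proj.weight");
  assert(a == b);  // both names resolve to the same lbn
  return builder.variable_op_count();
}
```

With a lookup like this, a weight shared by two submodules yields one variable op, so only one gradient synchronization would be needed for it.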

This PR also adds support for clearing the TensorNameScope after each Graph build, which resolves the problem reported in:

Oneflow-Inc/OneTeam#827
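One way to guarantee that a name scope is emptied after every graph build is a small RAII guard. The sketch below is an assumption-laden illustration, not the PR's implementation: `SimpleTensorNameScope`, `Lookup`, `Clear`, `ScopeClearGuard`, and `BuildGraph` are hypothetical names, and only `Record` mirrors a method that appears in the diff for `tensor_name_scope.h`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an eager tensor.
struct FakeTensor {};

// Hypothetical simplified name scope keyed by tensor pointer.
class SimpleTensorNameScope {
 public:
  void Record(const std::shared_ptr<FakeTensor>& tensor, const std::string& name) {
    tensor2name_[tensor.get()] = name;
  }

  // Returns an empty string when the tensor has not been recorded.
  std::string Lookup(const std::shared_ptr<FakeTensor>& tensor) const {
    auto it = tensor2name_.find(tensor.get());
    return it == tensor2name_.end() ? std::string() : it->second;
  }

  void Clear() { tensor2name_.clear(); }

 private:
  std::unordered_map<const FakeTensor*, std::string> tensor2name_;
};

// Clears the scope when it goes out of scope, including on early return.
class ScopeClearGuard {
 public:
  explicit ScopeClearGuard(SimpleTensorNameScope* scope) : scope_(scope) {}
  ~ScopeClearGuard() { scope_->Clear(); }
  ScopeClearGuard(const ScopeClearGuard&) = delete;
  ScopeClearGuard& operator=(const ScopeClearGuard&) = delete;

 private:
  SimpleTensorNameScope* scope_;
};

// Simulates one graph build: records the parameter if it is not yet known,
// returns the name used during this build, then lets the guard clear the scope.
inline std::string BuildGraph(SimpleTensorNameScope* scope,
                              const std::shared_ptr<FakeTensor>& param,
                              const std::string& graph_name) {
  assert(scope != nullptr);
  ScopeClearGuard guard(scope);
  if (scope->Lookup(param).empty()) { scope->Record(param, graph_name + "/param"); }
  return scope->Lookup(param);
}
```

In this sketch, names recorded while building one graph are not visible to the next build, so a second graph that captures the same parameter records its own name rather than picking up a stale one.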

oneflow/core/framework/nn_graph.cpp

oneflow/core/framework/op_interpreter/lazy_op_interpreter.cpp

oneflow/core/framework/tensor_name_scope.h

mosout · 2021-12-09T02:29:16Z

Does this PR also fix this issue? https://github.com/Oneflow-Inc/OneTeam/issues/827

chengtbf · 2021-12-09T02:36:17Z

Does this PR also fix this issue? Oneflow-Inc/OneTeam#827

Yes, it fixes that one as well.

strint · 2021-12-09T00:37:45Z

oneflow/core/framework/tensor_name_scope.h

@@ -31,6 +31,9 @@ class TensorNameScope {

  void Record(const std::shared_ptr<Tensor>& tensor, const std::string& name);

+  // NOTE(chengcheng): TensorNameScope need to be cleared after current graph build.


The interface declaration should not need a comment that belongs at the place where it is used.

strint · 2021-12-20T11:17:04Z

oneflow/core/framework/op_interpreter/lazy_op_interpreter.cpp

+  if (!opt_lbn.empty()) {
+    // NOTE(chengcheng): This eager tensor has been fed as variable op before, so we just use the
+    //  lbn, and will NOT create duplicate variable op again.
+    (*outputs)[0] = input_tensor;


As I recall, the plan was to change this to return a lazy tensor?

chengtbf · 2021-12-21T17:06:32Z

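Closing this PR. The follow-up work is covered by the following, respectively:

Clean TensorNameScope after graph build #7076
An upcoming bug fix from Xiaoyu (啸宇) that deduplicates module parameters on the Graph Python side before they are inserted into the Optimizer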

nn.Graph reuse eager lbn without create duplicate variable op

b9c0a86

chengtbf added enhancement system graph graph mode labels Dec 8, 2021

chengtbf requested review from strint, leaves-zwx, hjchen2 and L1aoXingyu December 8, 2021 16:35

Add log

75ea0bd

yuanms2 reviewed Dec 9, 2021

oneflow/core/framework/nn_graph.cpp (outdated)

yuanms2 reviewed Dec 9, 2021

oneflow/core/framework/op_interpreter/lazy_op_interpreter.cpp (outdated)

yuanms2 reviewed Dec 9, 2021

oneflow/core/framework/tensor_name_scope.h (outdated)

chengtbf added 4 commits December 10, 2021 11:42

refine note spell

6e2309a

Merge branch 'master' of github.com:Oneflow-Inc/oneflow into dev_cc_g…

f20cf13

…raph_catch_eager_tensor

Fix bug of LazyInterpret handle inplace eager tensor

14af159

Merge branch 'master' into dev_cc_graph_catch_eager_tensor

3892509

strint reviewed Dec 20, 2021

Merge branch 'master' into dev_cc_graph_catch_eager_tensor

5650047

chengtbf closed this Dec 21, 2021

mosout mentioned this pull request Dec 27, 2021

Check graph for loss operator tests #7114

Closed

chengtbf deleted the dev_cc_graph_catch_eager_tensor branch April 13, 2023 07:51
