[BUG] Enabling graph for hardsigmoid (under nn.functional) raises an error (test_functional_hardsigmoid_with_random_data) #7262

Closed
lixiang007666 opened this issue Jan 14, 2022 · 9 comments · Fixed by #7276

@lixiang007666
Contributor

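Source of the problem:

When running the activation-function tests in graph mode, hardsigmoid under nn.functional raises an error, while nn.Hardsigmoid() does not (graph can be enabled for it normally).

The failing test code: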

    @autotest(check_graph=True)
    def test_functional_hardsigmoid_with_random_data(test_case):
        device = random_device()
        x = random_pytorch_tensor().to(device)
        y = torch.nn.functional.hardsigmoid(x, random_bool())
        return y

Error message:

[ERROR](GRAPH:TestGraphOfFunctional_3:TestGraphOfFunctional) building graph got error: <class 'AssertionError'> 
Tensor([1, 1, 4]).to(cuda)
hardsigmoid(Tensor([1, 1, 4]), True)
-----------------------------------------------------------
This program has 1 input tensor: 
Shape[1, 1, 4]
tensor([[[0.5397, 0.5533, 0.5142, 0.5577]]], device='cuda:0',
       grad_fn=<HardsigmoidBackward0>)
-----------------------------------------------------------
E
======================================================================
ERROR: test_functional_hardsigmoid_with_random_data (test_activation.TestHardsigmoidModule)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lixiang/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 384, in dual_method
    test_g_res = test_g()
  File "/home/lixiang/oneflow/python/oneflow/nn/graph/graph.py", line 258, in __call__
    self._compile(*args)
  File "/home/lixiang/oneflow/python/oneflow/nn/graph/graph.py", line 506, in _compile
    eager_outputs = self._build_graph(*args)
  File "/home/lixiang/oneflow/python/oneflow/nn/graph/graph.py", line 614, in _build_graph
    ) = self._build_io("output", graph_build_util.build_graph_output, *outputs)
  File "/home/lixiang/oneflow/python/oneflow/nn/graph/graph.py", line 818, in _build_io
    build_args.append(build_tensor_or_none(arg, name, repr_str))
  File "/home/lixiang/oneflow/python/oneflow/nn/graph/graph.py", line 798, in build_tensor_or_none
    build_arg = build_func(name, tensor)
  File "/home/lixiang/oneflow/python/oneflow/framework/graph_build_util.py", line 187, in build_graph_output
    assert out.is_lazy
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lixiang/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 703, in new_f
    res = f(test_case)
  File "/home/lixiang/oneflow/python/oneflow/test/modules/test_activation.py", line 263, in test_functional_hardsigmoid_with_random_data
    y = torch.nn.functional.hardsigmoid(x, random_bool())
  File "/home/lixiang/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 387, in dual_method
    raise OneFlowGraphBuildOrRunError(e)
oneflow.test_utils.automated_test_util.torch_flow_dual_object.OneFlowGraphBuildOrRunError: OneFlow nn.Graph Build Or Run Error: 

----------------------------------------------------------------------
Ran 1 test in 5.454s
@lixiang007666
Contributor Author

@strint Xiaoyu, what is the likely cause of this kind of error?

@strint
Contributor

strint commented Jan 14, 2022

File "/home/lixiang/oneflow/python/oneflow/framework/graph_build_util.py", line 187, in build_graph_output
    assert out.is_lazy
AssertionError

This has already been fixed in this branch; try merging it: #7254

Then turn on ONEFLOW_TEST_VERBOSE and check the execution status.

@lixiang007666
Contributor Author

> File "/home/lixiang/oneflow/python/oneflow/framework/graph_build_util.py", line 187, in build_graph_output
>     assert out.is_lazy
> AssertionError
>
> This has already been fixed in this branch; try merging it: #7254
>
> Then turn on ONEFLOW_TEST_VERBOSE and check the execution status.

After merging it, this error is gone, but the graph result and the eager result do not match.

@strint
Contributor

strint commented Jan 14, 2022

> After merging it, this error is gone, but the graph result and the eager result do not match.

Check which specific case failed, then reproduce it manually.

@lixiang007666
Contributor Author

lixiang007666 commented Jan 17, 2022

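Reproducing the error:

On both GPU and CPU, the inplace version of nn.functional.hardsigmoid gives different results in eager and graph mode, while the regular (non-inplace) version gives matching results.

Tentative conclusion: the inplace version of nn.functional.hardsigmoid is wrong in graph mode.

Reproduction code: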

import oneflow as flow
import numpy as np 

#x = np.array([-0.5, 0, 0.5]).astype(np.float32)
x=flow.randn(1, 4)


input = flow.Tensor(x).to("cuda")
# hardsigmoid = flow.nn.Hardsigmoid()
# out = hardsigmoid(input)

# print(out)

out2 = flow.nn.functional.hardsigmoid(input).to("cuda")
print("eager, non-inplace")
print("-------------------------------------")
print(out2)

out2_ = flow.nn.functional.hardsigmoid(input,True).to("cuda")
print("eager, inplace")
print("-------------------------------------")
print(out2_)

def model(x):
    return flow.nn.functional.hardsigmoid(x,True).to("cuda")

class Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m=model

    def build(self,x):
        out = self.m(x)
        return out

graph=Graph()
out3=graph(input)

print("graph, inplace")
print("-------------------------------------")
print(out3)

Result:

eager, non-inplace
-------------------------------------
tensor([[0.4792, 0.2901, 0.5248, 0.4905]], device='cuda:0', dtype=oneflow.float32)
eager, inplace
-------------------------------------
tensor([[0.4792, 0.2901, 0.5248, 0.4905]], device='cuda:0', dtype=oneflow.float32)
graph, inplace
-------------------------------------
tensor([[0.5799, 0.5483, 0.5875, 0.5818]], device='cuda:0', dtype=oneflow.float32)

@strint
Contributor

strint commented Jan 17, 2022

Please also add the non-inplace result in graph mode so we can compare.

@strint
Contributor

strint commented Jan 17, 2022

out2_ = flow.nn.functional.hardsigmoid(input,True).to("cuda")

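Because this line executes in place, it modifies input; you can print input to confirm.

When that input is then passed to the graph, the graph's input is no longer the same as the eager input. That is most likely the cause.

If so, autotest needs to be changed: the input should be deep-copied.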
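
The values printed in the reproduction are consistent with this explanation. As a rough check, assuming OneFlow's hardsigmoid follows the usual definition clamp(x / 6 + 0.5, 0, 1) (the one PyTorch documents), applying that formula to the eager output gives the graph output to within rounding. The `hardsigmoid` helper below is a pure-Python stand-in written for this check, not OneFlow code:

```python
def hardsigmoid(values):
    """Reference hardsigmoid, assumed definition: clamp(x / 6 + 0.5, 0, 1)."""
    return [min(max(v / 6.0 + 0.5, 0.0), 1.0) for v in values]

# Values copied from the reproduction output (printed with 4 decimals).
eager_out = [0.4792, 0.2901, 0.5248, 0.4905]
graph_out = [0.5799, 0.5483, 0.5875, 0.5818]

# If the graph received the input after the in-place call had overwritten it,
# the graph output should equal hardsigmoid applied to the eager output.
applied_twice = hardsigmoid(eager_out)
max_diff = max(abs(a - b) for a, b in zip(applied_twice, graph_out))

print([round(v, 4) for v in applied_twice])
print(max_diff < 1e-3)
```

In other words, the graph appears to have computed hardsigmoid of an input that already held the eager result, which is what an in-place eager call on the shared tensor would produce.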

@lixiang007666
Contributor Author

That is indeed the cause.
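
The deep-copy idea suggested for autotest can be illustrated without OneFlow. The sketch below uses a hypothetical pure-Python `hardsigmoid` with an `inplace` flag as a stand-in for the real op, and the two `compare_*` helpers are likewise invented for illustration. It is not the change made in the linked pull request, only a minimal picture of why each execution mode should receive its own copy of the input:

```python
import copy

def hardsigmoid(values, inplace=False):
    """Hypothetical stand-in for an op with an inplace flag (not OneFlow code)."""
    result = [min(max(v / 6.0 + 0.5, 0.0), 1.0) for v in values]
    if inplace:
        values[:] = result  # overwrite the caller's list, like an in-place op
        return values
    return result

def compare_without_copy(x):
    # Both "modes" share x, so the second call sees the overwritten input.
    first = list(hardsigmoid(x, inplace=True))
    second = list(hardsigmoid(x, inplace=True))
    return first, second

def compare_with_copy(x):
    # Each "mode" gets its own deep copy, so x itself is never modified.
    first = list(hardsigmoid(copy.deepcopy(x), inplace=True))
    second = list(hardsigmoid(copy.deepcopy(x), inplace=True))
    return first, second

x = [-0.5, 0.0, 0.5]
bad_first, bad_second = compare_without_copy(list(x))
good_first, good_second = compare_with_copy(x)

print(bad_first == bad_second)    # results diverge when the input is shared
print(good_first == good_second)  # results agree when each call gets a copy
```

With a shared input the two calls disagree, mirroring the eager/graph mismatch in this issue; with deep copies they agree and the original `x` is left untouched.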

@strint
Contributor

strint commented Jan 17, 2022

@strint strint linked a pull request Jan 21, 2022 that will close this issue
3 tasks
@strint strint closed this as completed Jan 21, 2022