[BUG] Backward pass fails when flow.stack is applied to 128*n+1 Tensors #8918

Closed
wyg1997 opened this issue Aug 15, 2022 · 1 comment · Fixed by #8927
Assignees
Labels
bug community events from community

wyg1997 commented Aug 15, 2022

Summary

StackFunctor is implemented so that each StackOp handles at most 128 Tensors, and larger inputs are stacked recursively:

    StackFunctor() {
      ops_.resize(kMaxInputCount);
      for (int n = 0; n < ops_.size(); ++n) {
        ops_[n] = CHECK_JUST(one::OpBuilder("stack").Input("in", n + 1).Output("out").Build());
      }
    }

However, StackFunctor does not support the backward pass for a single Tensor, so when the number of input Tensors is 128*n+1, StackGrad reports:

oneflow._oneflow_internal.exception.RuntimeError: Check failed: (like.size()) >= (2) (1 vs 2) 
like.size() must not less than 2, but got 1 
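
The arithmetic behind the 128*n+1 case can be sketched in plain Python. This is only an illustration, not OneFlow code: it assumes the inputs are split in order into consecutive groups of at most 128 with the remainder last, and the helper names `chunk_sizes` and `has_single_tensor_chunk` are hypothetical.

```python
def chunk_sizes(num_tensors, max_inputs=128):
    """Sizes of the groups when inputs are split into chunks of at most max_inputs.

    Assumption: chunks are taken in order, so only the last one can be short.
    """
    if num_tensors < 1:
        raise ValueError("num_tensors must be positive")
    full, rest = divmod(num_tensors, max_inputs)
    sizes = [max_inputs] * full
    if rest:
        sizes.append(rest)
    return sizes


def has_single_tensor_chunk(num_tensors, max_inputs=128):
    """True when some chunk holds exactly one Tensor (the failing case)."""
    return 1 in chunk_sizes(num_tensors, max_inputs)


if __name__ == "__main__":
    for count in (128, 129, 130, 257):
        print(count, chunk_sizes(count), has_single_tensor_chunk(count))
```

Under that assumption, a count of 129 leaves a final group of one Tensor, which matches the `1 vs 2` in the check failure.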

Code to reproduce bug

import oneflow as flow

feats = [flow.ones(2, 3).requires_grad_()] * 129
b = flow.stack(feats, dim=0)
b.sum().backward()

wyg1997 added bug community events from community labels Aug 15, 2022
wyg1997 self-assigned this Aug 15, 2022

wyg1997 commented Aug 15, 2022

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    b.sum().backward()
  File "/home/ubuntu/work/codes/oneflow/python/oneflow/framework/tensor.py", line 33, in _backward
    flow.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/work/codes/oneflow/python/oneflow/autograd/autograd.py", line 110, in backward
    backward_api(
oneflow._oneflow_internal.exception.RuntimeError: Check failed: (like.size()) >= (2) (1 vs 2)
like.size() must not less than 2, but got 1

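The traceback above is all that gets reported, and nothing in it indicates that StackGrad is the problem. Because this error message is emitted only by StackGrad and SplitLike, I was able to quickly trace the failure in the model to Stack and work out a minimal reproduction. The error message itself still needs to be improved.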
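
One possible direction for a clearer message is to include the name of the op that performs the check. The sketch below is a hypothetical Python illustration only: the real check lives in OneFlow's C++ code, and `check_min_inputs` and its parameters are invented here for the example.

```python
def check_min_inputs(op_name, like, minimum=2):
    """Raise an error that names the op whose input-count check failed.

    Hypothetical helper: op_name is whatever identifies the failing op,
    for example "StackGrad" or "SplitLike".
    """
    if len(like) < minimum:
        raise RuntimeError(
            f"{op_name}: like.size() must not be less than {minimum}, "
            f"but got {len(like)}"
        )


if __name__ == "__main__":
    try:
        check_min_inputs("StackGrad", [object()])
    except RuntimeError as err:
        print(err)
```

With the op name in the message, the failure could be attributed to Stack without having to know which ops share the same wording.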
