[BUG] Backward fails when flow.stack is applied to 128*n+1 Tensors #8918

wyg1997 · 2022-08-15T11:12:16Z

Summary

StackFunctor hands at most 128 Tensors to each StackOp and recursively handles larger inputs to complete the stack (see

oneflow/oneflow/core/functional/impl/array_functor.cpp

Lines 567 to 572 in 9dbb458

    StackFunctor() {
      ops_.resize(kMaxInputCount);
      for (int n = 0; n < ops_.size(); ++n) {
        ops_[n] = CHECK_JUST(one::OpBuilder("stack").Input("in", n + 1).Output("out").Build());
      }
    }

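). However, StackFunctor does not support backward for a single Tensor on its own. When the number of input Tensors is 128*n+1, the last group contains exactly one Tensor, and StackGrad fails with: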

oneflow._oneflow_internal.exception.RuntimeError: Check failed: (like.size()) >= (2) (1 vs 2) 
like.size() must not less than 2, but got 1
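To make the 128*n+1 condition concrete, here is a toy Python model of the grouping described above. It assumes kMaxInputCount is 128 and that the inputs are split greedily into consecutive groups of at most 128 Tensors; the function names (`chunk_sizes`, `has_single_input_group`, `balanced_chunk_sizes`) are hypothetical and not part of OneFlow. The last function is only one illustrative way to avoid a single-input group, not necessarily what the eventual fix does.

```python
from typing import List

# Assumption: mirrors kMaxInputCount = 128 as described in the issue.
# This is a toy model of the grouping, not OneFlow's actual code.
MAX_INPUT_COUNT = 128


def chunk_sizes(n: int, max_inputs: int = MAX_INPUT_COUNT) -> List[int]:
    """Greedy grouping: fill each stack op with up to max_inputs tensors."""
    sizes = []
    remaining = n
    while remaining > 0:
        size = min(max_inputs, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def has_single_input_group(n: int, max_inputs: int = MAX_INPUT_COUNT) -> bool:
    """True when greedy grouping leaves a trailing group with one tensor.

    Such a group would hit the gradient's like.size() >= 2 check with
    like.size() == 1. Only n > max_inputs is modeled; n == 1 is out of scope.
    """
    sizes = chunk_sizes(n, max_inputs)
    return len(sizes) > 1 and sizes[-1] == 1


def balanced_chunk_sizes(n: int, max_inputs: int = MAX_INPUT_COUNT) -> List[int]:
    """Hypothetical alternative: move one tensor from the previous group into
    a trailing single-tensor group so every group has at least 2 tensors.
    Assumes max_inputs >= 3."""
    sizes = chunk_sizes(n, max_inputs)
    if len(sizes) > 1 and sizes[-1] == 1:
        sizes[-2] -= 1
        sizes[-1] += 1
    return sizes


for n in (128, 129, 130, 257):
    print(n, chunk_sizes(n), has_single_input_group(n), balanced_chunk_sizes(n))
```

In this model, 129 inputs split into groups of 128 and 1, while 130 inputs split into 128 and 2, which matches the report that only counts of the form 128*n+1 trigger the error.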

Code to reproduce bug

import oneflow as flow

feats = [flow.ones(2, 3).requires_grad_()] * 129
b = flow.stack(feats, dim=0)
b.sum().backward()


wyg1997 · 2022-08-15T11:20:57Z

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    b.sum().backward()
  File "/home/ubuntu/work/codes/oneflow/python/oneflow/framework/tensor.py", line 33, in _backward
    flow.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/work/codes/oneflow/python/oneflow/autograd/autograd.py", line 110, in backward
    backward_api(
oneflow._oneflow_internal.exception.RuntimeError: Check failed: (like.size()) >= (2) (1 vs 2)
like.size() must not less than 2, but got 1

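This is the error output for the issue, and nothing in it points to StackGrad. Only because this message appears exclusively in StackGrad and SplitLike could I quickly narrow the problem in my model down to Stack and guess a minimal reproduction. The error message still needs to be improved.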

fix #8918

wyg1997 added bug community events from community labels Aug 15, 2022

wyg1997 self-assigned this Aug 15, 2022

wyg1997 added a commit that referenced this issue Aug 16, 2022

fix(StackOp): fix bug when input number is 128*n+1

46eb718

fix #8918

wyg1997 mentioned this issue Aug 16, 2022

Fix stack bug for 129inputs #8927

Merged

mergify bot closed this as completed in 4a33514 Aug 17, 2022

mergify bot closed this as completed in #8927 Aug 17, 2022
