The same code raises an error in OneFlow. #7127

kaijieshi7 · 2021-12-28T08:53:35Z

Summary

A short description about the bug/issue

Code to reproduce bug

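# import oneflow as flow  # to reproduce the error, uncomment these two lines and comment out the torch imports
# import oneflow.nn as nn
import torch as flow  # with PyTorch, the same code runs without error
import torch.nn as nn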

class N1(nn.Module):
    def __init__(self):
        super(N1, self).__init__()
        self.reduction = nn.Linear(96, 96, bias=False)
        self.norm = nn.LayerNorm(96)

    def forward(self, x):
        x = self.norm(x)
        x = self.reduction(x)
        return x


n_flow = N1().cuda()
x = flow.rand(2, 96).cuda()
label = flow.rand(2, 96).cuda()
loss_fn = nn.MSELoss()
# loss_fn = nn.BCELoss()
optimizer = flow.optim.SGD(n_flow.parameters(), lr=0.001, momentum=0.9, weight_decay=0.05)
for i in range(2):
    optimizer.zero_grad()
    y = n_flow(x)
    loss_fn(y, label).backward()
    optimizer.step()
    print(y)

Error message

Traceback (most recent call last):
  File "/home/kaijie/Documents/code/of/large_scale_training/large_scale_training/swin_transformer/bug3.py", line 27, in <module>
    loss_fn(y, label).backward()
  File "/home/kaijie/anaconda3/envs/torch_cuda10_1/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 80, in _backward
    flow.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/kaijie/anaconda3/envs/torch_cuda10_1/lib/python3.8/site-packages/oneflow/autograd/autograd.py", line 48, in backward
    backward_api(
IndexError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)

System Information

What is your OneFlow installation (pip, source, dockerhub):
OS:
OneFlow version (run python3 -m oneflow --doctor):
Python version:
CUDA driver version:
GPU models:
Other info:

MARD1NO · 2021-12-28T09:38:42Z

Applying norm first and then linear triggers the problem.

Applying linear first and then norm does not trigger the problem above.

TODO
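The two orderings described above can be compared side by side. This is a minimal sketch, not code from the issue: `NormFirst` mirrors `N1` from the report, while `LinearFirst` and `backward_succeeds` are hypothetical names introduced here. It runs on CPU with PyTorch so that it works without a GPU; the original report used `.cuda()`, so reproducing the OneFlow failure may require swapping the imports and moving the model and tensors to the GPU.

```python
# Sketch only: NormFirst mirrors N1 from the issue; LinearFirst is a hypothetical variant.
import torch as flow  # assumption: swap for `import oneflow as flow` to test OneFlow
import torch.nn as nn  # assumption: swap for `import oneflow.nn as nn`


class NormFirst(nn.Module):
    """LayerNorm followed by Linear (the ordering reported to fail in OneFlow)."""

    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(96)
        self.reduction = nn.Linear(96, 96, bias=False)

    def forward(self, x):
        return self.reduction(self.norm(x))


class LinearFirst(nn.Module):
    """Linear followed by LayerNorm (the ordering reported to work)."""

    def __init__(self):
        super().__init__()
        self.reduction = nn.Linear(96, 96, bias=False)
        self.norm = nn.LayerNorm(96)

    def forward(self, x):
        return self.norm(self.reduction(x))


def backward_succeeds(model):
    """Run one forward/backward pass and report whether backward() completed."""
    x = flow.rand(2, 96)
    label = flow.rand(2, 96)
    try:
        nn.MSELoss()(model(x), label).backward()
    except Exception as exc:  # the issue reports an IndexError here under OneFlow
        print(type(model).__name__, "failed:", exc)
        return False
    return True


results = {cls.__name__: backward_succeeds(cls()) for cls in (NormFirst, LinearFirst)}
print(results)
```

Under PyTorch both entries should be `True`; the point of the sketch is that only the layer order differs between the two modules.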

strint · 2021-12-28T09:54:09Z

IndexError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)

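Which piece of C++ code raises this error? It looks like we need to add a check here: because of this out-of-bounds index, there is no error stack at all.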
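To illustrate the kind of check suggested above, here is a minimal Python sketch of a bounds check that names what was being indexed, so the failure is easier to trace than a bare range-check message. It is not OneFlow's code, and the actual fix would live in C++; `checked_at`, `input_grads`, and the label string are hypothetical names chosen for this example.

```python
def checked_at(items, index, what):
    """Return items[index], raising an IndexError that names what was indexed."""
    if not 0 <= index < len(items):
        raise IndexError(
            f"{what}: index {index} is out of range for a sequence of size {len(items)}"
        )
    return items[index]


# Hypothetical scenario: a backward function receives an empty list of gradients.
input_grads = []
try:
    checked_at(input_grads, 0, "LayerNorm backward input gradients")
except IndexError as exc:
    message = str(exc)
    print(message)
```

The design choice is simply to attach context at the point of access, since a raw range-check failure carries only the index and the size.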

wyg1997 added the bug label Dec 28, 2021

liufengwei0103 linked a pull request Dec 31, 2021 that will close this issue

Fix laynorm backward bug #7164

Merged

oneflow-ci-bot closed this as completed in #7164 Dec 31, 2021
