About the 11th training epoch #57

Open
wenhaixi opened this issue Apr 9, 2021 · 1 comment

Comments


wenhaixi commented Apr 9, 2021

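I wanted to run a quick test on a very small dataset, so I set 20 epochs and started fine-tuning the backbone after epoch 10. The first ten epochs trained normally, but at the 11th epoch I got the following error: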
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/python/siamban-master/tools/train.py", line 312, in <module>
    main()
  File "C:/Users/Administrator/Desktop/python/siamban-master/tools/train.py", line 307, in main
    train(train_loader, dist_model, optimizer, lr_scheduler, tb_writer)
  File "C:/Users/Administrator/Desktop/python/siamban-master/tools/train.py", line 203, in train
    outputs = model(data) # the forward pass is computed here.
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\python\siamban-master\siamban\utils\distributed.py", line 43, in forward
    return self.module(*args, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\python\siamban-master\siamban\models\model_builder.py", line 76, in forward
    xf = self.backbone(search)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\python\siamban-master\siamban\models\backbone\resnet_atrous.py", line 192, in forward
    p3 = self.layer3(p2)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\python\siamban-master\siamban\models\backbone\resnet_atrous.py", line 104, in forward
    residual = self.downsample(x)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Users\Administrator\anaconda3\envs\py37\lib\site-packages\torch\nn\modules\conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

What could be causing this?
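The timing of the failure lines up with the switch to backbone fine-tuning. A minimal sketch, assuming the training loop counts epochs from 0 and unfreezes the backbone once `epoch >= train_epoch` (`backbone_trainable` and `first_finetune_epoch` are hypothetical helpers, not siamban code): with 20 epochs and fine-tuning from epoch 10, the first epoch that computes backbone gradients is the 11th one when counted from 1. From that point PyTorch must also store gradients for the backbone parameters and keep the backbone activations needed for the backward pass, so GPU memory use typically jumps.

```python
from typing import Optional


def backbone_trainable(epoch: int, train_epoch: int = 10) -> bool:
    """Assumed rule: backbone layers are unfrozen once the 0-indexed epoch reaches train_epoch."""
    return epoch >= train_epoch


def first_finetune_epoch(total_epochs: int = 20, train_epoch: int = 10) -> Optional[int]:
    """Return the 1-indexed epoch in which backbone gradients are first computed, or None."""
    for epoch in range(total_epochs):  # 0-indexed loop counter
        if backbone_trainable(epoch, train_epoch):
            return epoch + 1  # convert to the 1-indexed count a user would report
    return None


print(first_finetune_epoch())  # 11
```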

zeduchen (Member) commented

Fine-tuning the backbone requires a large amount of GPU memory. Check whether other processes are occupying GPU memory.
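This cuDNN error is often reported when the GPU runs out of memory, so one way to follow this advice is to list the processes currently holding GPU memory with `nvidia-smi`. A minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_compute_apps` and `query_compute_apps` are hypothetical helpers. On Windows under the WDDM driver model, per-process memory may be reported as `[N/A]`, which the parser maps to `None`.

```python
import subprocess
from typing import List, Optional, Tuple

# Command line for nvidia-smi: one CSV row per process using the GPU for compute.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> List[Tuple[int, str, Optional[int]]]:
    """Turn nvidia-smi CSV rows into (pid, process_name, used_MiB) tuples.

    used_MiB is None when the driver does not report it (e.g. "[N/A]").
    """
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # pid is the first field and used_memory the last; the name may contain commas.
        pid_str, rest = line.split(",", 1)
        name, used_str = rest.rsplit(",", 1)
        used_str = used_str.strip()
        used = int(used_str) if used_str.isdigit() else None
        apps.append((int(pid_str.strip()), name.strip(), used))
    return apps


def query_compute_apps() -> List[Tuple[int, str, Optional[int]]]:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `query_compute_apps()` before starting training shows whether another process (for example a leftover Python session) is already holding memory on the card.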
