Meaning and impact of find_unused_parameters #3105

linyangsdu · 2021-05-21T06:42:30Z

I ran into the following error during training:

Traceback (most recent call last):
  File "tools/train.py", line 140, in <module>
    main()
  File "tools/train.py", line 136, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 111, in run
    trainer.train(FLAGS.eval)
  File "/root/paddlejob/workspace/code/PaddleDetection/ppdet/engine/trainer.py", line 307, in train
    outputs = model(data)
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py", line 581, in forward
    list(self._find_varbase(outputs)))
RuntimeError: (PreconditionNotMet) A serious error has occurred here. Please set find_unused_parameters=True to traverse backward graph in each step to prepare reduce in advance. If you have set, There may be several reasons for this error: 1) Please note that all forward outputs derived from the module parameters must participate in the calculation of losses and subsequent gradient calculations. If not, the wrapper will hang, waiting for autograd to generate gradients for these parameters. you can use detach or stop_gradient to make the unused parameters detached from the autograd graph. 2) Used multiple forwards and one backward. You may be able to wrap multiple forwards in a model.

What does the find_unused_parameters parameter mean, what is the effect of setting it to True, and how should I resolve this error?
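For readers of this thread, here is a rough illustration of what the error message describes. In data-parallel training the wrapper expects a gradient for every registered parameter before it can reduce gradients across processes; if a parameter did not contribute to the forward outputs in a given step, that gradient never arrives. According to the message, find_unused_parameters=True makes the wrapper traverse the backward graph in each step so such parameters can be handled in advance. The sketch below is a toy model in plain Python, not Paddle's implementation; `Node`, `used_parameters`, `find_unused`, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A value in a toy autograd graph: a parameter or an op output."""
    name: str
    inputs: list = field(default_factory=list)
    is_param: bool = False


def used_parameters(outputs):
    """Walk backward from the forward outputs and collect reachable parameters."""
    seen, stack, used = set(), list(outputs), set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.is_param:
            used.add(node.name)
        stack.extend(node.inputs)
    return used


def find_unused(all_params, outputs):
    """Return names of parameters that will receive no gradient this step."""
    used = used_parameters(outputs)
    return sorted(p.name for p in all_params if p.name not in used)


# Hypothetical model with two heads; only head A runs in this step.
w_backbone = Node("w_backbone", is_param=True)
w_head_a = Node("w_head_a", is_param=True)
w_head_b = Node("w_head_b", is_param=True)
feat = Node("feat", inputs=[w_backbone])
out_a = Node("out_a", inputs=[feat, w_head_a])

unused = find_unused([w_backbone, w_head_a, w_head_b], [out_a])
print(unused)  # w_head_b never reaches the outputs, so it is reported as unused
```

Because a traversal like this would be repeated in every step, enabling the flag can be expected to add some per-step overhead. The alternative named in the error message is to cut the unused branch out of the autograd graph with detach or stop_gradient.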

jerrywgz · 2021-05-21T09:17:06Z

Which model hit this problem? For now you can set find_unused_parameters in the config file; see https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/mot/jde/_base_/jde_darknet53.yml#L3 for an example.
For the meaning and impact of this parameter, see the description in PR PaddlePaddle/Paddle#32826.
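As a sketch of the config change suggested above: the linked line sets the option as a top-level key in the model's YAML config. The fragment below assumes the same placement works for other model configs; the rest of the file stays unchanged.

```yaml
# Top-level key in the model config (placement assumed from the linked jde_darknet53.yml)
find_unused_parameters: True
```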

linyangsdu · 2021-05-21T12:54:14Z

It happened while training cascadercnn r50 dcn. I was training in aistudio; the link is:
https://aistudio.baidu.com/studio/project/partial/verify/1849265/49507f7ab6544a4fad12a10436560701

jerrywgz · 2021-05-24T02:46:48Z

Could you post the error message you get after adding find_unused_parameters?

nemonameless · 2021-05-26T08:15:28Z

You can use paddle 2.1. Earlier paddle versions may not have find_unused_parameters and will raise an error.

paddle-bot-old · 2022-03-16T06:38:35Z

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up question, please reopen it at any time and we will continue to follow up.
It is recommended to pull and try the latest code first.

jerrywgz added the training Training question label May 21, 2021

jerrywgz self-assigned this May 24, 2021

paddle-bot-old bot closed this as completed Mar 16, 2022
