[Dy2St] Fix cond_block_grad error when handling no-need-grad vars #43034
PR types
Bug fixes
PR changes
Others
Describe
This issue was reported for the ppyoloe model in the model suite; after investigating, we reproduced it with a demo.
Reproduction case:
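The original repro snippet is not reproduced here. As a hypothetical stand-in (names, shapes, and the branch condition below are illustrative, not taken from the original demo), a minimal case with the same structure would have a var produced in the outer block that is consumed inside the converted if-branch only through arg_max, whose output has stop_gradient=True, while another var carries the differentiable path:

```python
import paddle
from paddle.jit import to_static

# Hypothetical minimal repro sketch (illustrative only, not the original demo):
# `y` is produced in the outer block and consumed inside the if-branch only
# through argmax, whose output has stop_gradient=True, while `w` provides the
# differentiable path through the branch.
@to_static
def forward(x, w):
    y = paddle.reshape(x, [-1, 4])           # analogous to reshape2_0.tmp_0
    if paddle.shape(x)[0] > 1:               # lowered to conditional_block
        idx = paddle.argmax(y, axis=-1)      # output has stop_gradient=True
        out = w.sum() + idx.astype('float32').sum() * 0.0
    else:
        out = w.sum()
    return out

x = paddle.randn([2, 8])
x.stop_gradient = False
w = paddle.randn([4])
w.stop_gradient = False

loss = forward(x, w)
# Before the fix, backward() may fail with an "uninitialized variable" error
# on the grad var of the reshape output.
loss.backward()
```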
Error analysis
Error message:
Printing the program shows that the uninitialized var named in the error message should be reshape2_0.tmp_0@GRAD; the error is raised later by reshape_grad because this var was never initialized. The var is supposed to be produced by conditional_block_grad, whose corresponding sub-block is block 4, but block 4 never computes reshape2_0.tmp_0@GRAD. Why does block 0 contain the var reshape2_0.tmp_0@GRAD while the backward block 4 never computes it?

In the backward pass, _append_backward_ops_ builds sub_block_path, which describes the path of ops in the sub_block that need gradients. In block 1, reshape2_0.tmp_0 is consumed by arg_max, and the output var of arg_max, argmax_0.tmp_0, has stop_gradient=True. Therefore neither argmax_0.tmp_0 nor reshape2_0.tmp_0 needs a gradient, the arg_max op should not appear in sub_block_path, and block 4 will not differentiate reshape2_0.tmp_0, i.e. it will not compute reshape2_0.tmp_0@GRAD. By that reasoning, the var reshape2_0.tmp_0@GRAD should not exist in block 0 either, so why is it still there? The handling of no-grad vars in no_grad_dict seems to be part of the problem: reshape2_0.tmp_0 ought to be in this dict (although putting reshape2_0.tmp_0 into no_grad_dict is not entirely reasonable either, because in block 0 its stop_gradient=False). When get_grad_op_desc is then called to generate the backward op_desc, the Input@GRAD slot corresponding to reshape2_0.tmp_0@GRAD should be replaced with @empty@. But since reshape2_0.tmp_0 is not in no_grad_dict, no @empty@ substitution happens when the backward op is generated.
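To illustrate the substitution rule described above (a schematic sketch only; the function and variable names are made up for illustration and this is not the real get_grad_op_desc implementation):

```python
# Schematic sketch of the rule: a forward input listed in the no-grad set
# should get the empty placeholder instead of a real @GRAD output, so no
# downstream op ever reads an uninitialized gradient var.
def grad_outputs_for(fwd_input_names, no_grad_set, empty_var="@empty@"):
    outputs = {}
    for name in fwd_input_names:
        if name in no_grad_set:
            # no gradient is needed for this input: emit the placeholder
            outputs[name] = empty_var
        else:
            outputs[name] = name + "@GRAD"
    return outputs

# In the buggy scenario, reshape2_0.tmp_0 is absent from the no-grad set, so
# conditional_block_grad advertises reshape2_0.tmp_0@GRAD as an output even
# though the grad sub-block never fills it in.
print(grad_outputs_for(["reshape2_0.tmp_0"], no_grad_set=set()))
# {'reshape2_0.tmp_0': 'reshape2_0.tmp_0@GRAD'}
```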
Fix
We first tried to adjust how sub_block_path and no_grad_dict are handled in _append_backward_ops_, but after reading the related code we found that changing this logic is rather difficult. Instead, we changed the logic that hands results back after the cond_block_grad computation: at the end of its computation, cond_block_grad calls AssignLocalGradientToParentScope to copy the grad vars of the sub_block back to the parent_block. In AssignLocalGradientToParentScope, we only need to find the grad vars that the parent_block expects the backward pass to compute but that the sub_block did not compute, and assign these grad vars the value 0.
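A rough sketch of that idea in Python-style pseudocode (the real change lives in the C++ AssignLocalGradientToParentScope inside the conditional_block op; the scope handling, variable lookup, and zero-filling below are simplified stand-ins):

```python
import numpy as np

def assign_local_gradient_to_parent_scope(inside_grad_names, sub_scope,
                                          parent_scope, forward_vars):
    """Copy grads computed in the sub-block up to the parent block; grads the
    parent block expects but the sub-block never computed are filled with
    zeros. Plain dicts stand in for Paddle scopes in this sketch."""
    for grad_name in inside_grad_names:
        if grad_name in sub_scope:
            # normal path: the grad was computed inside the sub-block
            parent_scope[grad_name] = sub_scope[grad_name]
        else:
            # the parent block needs this grad (e.g. a later reshape grad op
            # consumes it), but the only consumer inside the sub-block was a
            # non-differentiable op such as arg_max, so it was never computed:
            # assign zeros with the forward var's shape and dtype
            fwd_name = grad_name[: -len("@GRAD")]
            fwd = forward_vars[fwd_name]
            parent_scope[grad_name] = np.zeros_like(fwd)
```

With this behavior, downstream grad ops in the parent block always see an initialized gradient for inputs whose only use inside the sub-block was non-differentiable, and zero is the mathematically correct gradient contribution through that path.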