add flatten weight of lstm #27192
Conversation
Thanks for your contribution!
force-pushed from 29cef36 to ab8d51c
@@ -271,6 +363,8 @@ class CudnnLSTMGPUGradKernel : public framework::OpKernel<T> {
          "of cudnn is larger than 7.2.1"));
#endif
    }
    weight_to_tensor_list<T>(place, stream, &weight_grad_list, weight_list,
Is there a better way to handle this? It seems the grad only needs to be copied when the weights are non-contiguous; when the weights are contiguous, the grad should preferably be contiguous too, so ShareData is probably the better choice.
Done: the grad now follows the same strategy as the weights, and is copied only when the memory is non-contiguous.
force-pushed from ab8d51c to 9a7388e
force-pushed from 166a514 to 8414988
    }

    bool grad_continuous =
        is_continuous<T, std::vector<Tensor *>>(weight_grad_list);
Since each weight grad in weight_grad_list is allocated by weight_grad_list[i]->mutable_data<T>(place), the list is very likely non-contiguous, so the gradients would still be copied every time. Could we allocate one large weight-grad block directly and let each small weight grad ShareDataWith a slice of it?
I tried that. Because the input is a weight list, on the C++ side the grads can only be produced as a weight list as well. On the Python side, however, it should be possible to allocate the grad memory as one contiguous block.
Changed: the input weight list now shares memory with the big W.
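The shared-memory arrangement discussed above can be sketched in isolation. This is a minimal illustration, not Paddle's actual Tensor/ShareDataWith API: one large contiguous buffer plays the role of the big W, and non-owning views into it play the role of the per-gate weight tensors, so a write through any view lands directly in the flat buffer and no copy is needed before the cudnn call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-owning view into a slice of a larger buffer (stand-in for a
// sub-tensor created via ShareDataWith).
struct View {
  float* data;
  std::size_t size;
};

// Carve the big flattened buffer into adjacent views, one per weight.
// The views alias big_w's storage; nothing is copied.
std::vector<View> MakeSharedViews(std::vector<float>& big_w,
                                  const std::vector<std::size_t>& sizes) {
  std::vector<View> views;
  std::size_t offset = 0;
  for (std::size_t s : sizes) {
    views.push_back(View{big_w.data() + offset, s});
    offset += s;
  }
  assert(offset == big_w.size());  // the slices must exactly tile big_w
  return views;
}
```

Because every view aliases the same storage, filling the per-weight views also fills the flat buffer, which is the property the PR relies on to avoid a gather-copy.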
force-pushed from a99032a to ebd3b35
force-pushed from f3dfa83 to 8cf5cc7
force-pushed from 1511ea5 to 92f6ea6
    auto weight_list = ctx.MultiInput<framework::Tensor>("WeightList");
    W->mutable_data<T>({weight_numel}, place);
    weight_to_tensor<T>(place, stream, weight_list, W);
  }
When is_test == true, does this copy on every run? Could the copy be done only when W has not been initialized?
During Python-side inference, W is initialized, so W is used directly and no data is copied.
During C++-side inference, W is not initialized, so weight_list is used, and weight_list is copied into W.
LGTM
LGTM
* add flatten weight of lstm
PR types
New features
PR changes
OPs
Describe
Change the cudnn LSTM input from one big weight block to a weight list. If the Python side allocates the weights in adjacent memory, the C++ side calls cudnn directly with the first pointer and the total size; otherwise the C++ side must copy the list into one large contiguous block.
In test mode, if W is provided and initialized, W is used preferentially; otherwise WeightList is used and the converted parameters are saved into W.
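The contiguity test this description relies on can be illustrated with a small stand-alone check (analogous to, but not, the kernel's actual is_continuous helper): the weight list counts as contiguous exactly when each slice's data pointer starts where the previous slice ends, in which case the first pointer plus the total size can be handed straight to cudnn and no copy is required.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each entry models a tensor in the weight list as (data pointer, numel).
// The list is contiguous iff every slice begins exactly where the
// previous slice ends; only a non-contiguous list must be gather-copied.
bool IsContinuous(
    const std::vector<std::pair<const float*, std::size_t>>& list) {
  for (std::size_t i = 1; i < list.size(); ++i) {
    if (list[i - 1].first + list[i - 1].second != list[i].first) return false;
  }
  return true;
}
```

When IsContinuous returns true, list.front().first together with the summed sizes already describes one flat block, which is the fast path the description mentions.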
Items requiring approval
You must have one RD (cyj1986, Superjomn) approval for the changes of Inputs/Output/Attrs of OPs. The changes of OPs will cause that the new version inference fails to load model trained by the old version. Please modify your code.
You must have one RD (zhiqiu (Recommend) or phlrain) approval for the api change for the operator-related api without 'core.ops'.
The fluid.layers.lstm interface will be deprecated next; the new interface defined in PR27217 should be used instead.
You must have one RD (XiaoguangHu01,Xreki,luotao1) approval for the usage (either add or delete) of const_cast.
Using ShareDataWith or ShareBufferWith is not recommended. You must have one RD's (zhhsplendid (Recommend), zhiqiu or luotao1 or lanxianghit) approval to use these methods. For more information, please refer to https://github.com/PaddlePaddle/Paddle/wiki/ShareDataWith-is-prohibited-in-OP. The error lines are as follows:
It is an Op accuracy problem, please take care of it. You must have one RD (zhangting2020 (Recommend), luotao1 or phlrain) approval for the usage (either add or delete) of @skip_check_grad_ci. For more information, please refer to: https://github.com/PaddlePaddle/Paddle/wiki/Gradient-Check-Is-Required-for-Op-Test.
Justification of the interface compatibility issues
The added Reserve and StateOut outputs are needed to support the cudnn lstm C++ kernel in dynamic graph mode.
The bidirectional mode of the original lstm interface produces results with wrong dimensions.
The multi-layer results of the original lstm interface are wrong: the interface has always taken padded input, but the cudnn routine it calls expects unpadded data, so although the original API can still be invoked, both its results and its accuracy are problematic. Currently only one model in our model zoo calls the original API (a multi-layer case); it should be migrated to the new interface.
External users cannot actually be relying on it, given that the API computes wrong results. Two issues filed by external users: "lstm error" #24300 and "the is_bidirec parameter of fluid.layers.lstm does not make it bidirectional" #22979.
Therefore the lstm op is planned for a major overhaul in 2.0. Going forward, the old API will no longer be recommended; the newly added API should be used instead.