Debug diff of convolution layer in OCR CTC model #855
Comments
fluid CPU VS V1 CPU
The elements whose diff falls between 1e-05 and 0.0001 are printed below:
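Background
A network implemented with the paddle fluid API and the same network implemented with the paddle V1 API do not align exactly during training.
Experiment 1: take a single image, set batch_size to 1, and run only one pass (one forward computation plus one backward computation), comparing the V1 API against the fluid API:
The forward results match; the backward results diverge at the second conv layer, i.e. the input_grad computed by the second conv layer differs.
Experiment 2: take a single image, set batch_size to 1, and run only one pass (one forward computation plus one backward computation); run the fluid API twice and compare the two runs:
GPU: forward matches; backward diverges at the second conv layer, i.e. the input_grad computed by the second conv layer differs.
CPU: forward and backward both match.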
Debug
fluid GPU VS GPU
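From the experiments above, extract the second conv layer's input data, filter parameter, and output grad, and run conv_grad on its own. conv_grad was executed twice and the resulting input grad values were compared statistically, with the following results:
In the results above, the number of elements whose diff falls between 1e-11 and 1e-10 is 33, and so on for the other ranges.
The elements whose diff falls between 1e-11 and 1e-10 are printed below: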
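The bucketed comparison described above can be sketched as follows. This is a minimal illustration and not the script used in this issue: the function names `diff_histogram` and `elements_in_bucket` are hypothetical, the sample values are made up, and the inputs are assumed to be flattened lists of Python floats rather than the actual input grad tensors.

```python
import math
from collections import Counter


def diff_histogram(a, b):
    """Count elementwise |a - b| per power-of-ten bucket.

    Key e means 10**e <= |a_i - b_i| < 10**(e + 1);
    key None counts elements that match exactly.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    counts = Counter()
    for x, y in zip(a, b):
        d = abs(x - y)
        counts[None if d == 0.0 else math.floor(math.log10(d))] += 1
    return dict(counts)


def elements_in_bucket(a, b, low, high):
    """Return (index, a_i, b_i, diff) for every element with low <= diff < high."""
    return [
        (i, x, y, abs(x - y))
        for i, (x, y) in enumerate(zip(a, b))
        if low <= abs(x - y) < high
    ]


# Hypothetical flattened input grad values from two runs (made-up numbers).
run1 = [0.5, 0.25, -0.125, 1.0]
run2 = [0.5, 0.25 + 3e-11, -0.125, 1.0 - 2e-6]

print(diff_histogram(run1, run2))
for row in elements_in_bucket(run1, run2, 1e-11, 1e-10):
    print(row)
```

Keying the histogram by the decimal exponent keeps the buckets aligned with the ranges quoted in this issue (for example, key -11 corresponds to the range from 1e-11 to 1e-10), and counting exact matches separately avoids taking the logarithm of zero.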
fluid CPU VS CPU
Running the experiment twice on CPU shows no diff.
fluid GPU VS CPU