Debug diff of convolution layer in OCR CTC model #855

wanghaoshuang · 2018-04-16T07:01:08Z

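Background

The network implemented with the Paddle Fluid API and the network implemented with the Paddle V1 API cannot be strictly aligned during training.

Experiment with a single image, batch_size 1, running only one pass (one forward computation plus one backward computation), comparing the V1 API against the Fluid API:
the forward results match; the backward pass shows a diff at the second conv layer, i.e. the input_grad computed by the second conv layer does not match.

Experiment with a single image, batch_size 1, running only one pass (one forward computation plus one backward computation); run it twice with the Fluid API and compare the two results:
GPU: forward matches; backward shows a diff at the second conv layer, i.e. the input_grad computed by the second conv layer does not match.
CPU: both forward and backward match.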

Debug

fluid GPU VS GPU

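From the experiment above, extract the second conv layer's input data, filter parameter and output grad, and run conv_grad in isolation. Comparing the input grad from two conv_grad runs element by element gives the following statistics: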

threhold: 100; count: 0
threhold: 10.0; count: 0
threhold: 1.0; count: 0
threhold: 0.1; count: 0
threhold: 0.01; count: 0
threhold: 0.001; count: 0
threhold: 0.0001; count: 0
threhold: 1e-05; count: 0
threhold: 1e-06; count: 0
threhold: 1e-07; count: 0
threhold: 1e-08; count: 0
threhold: 1e-09; count: 0
threhold: 1e-10; count: 33 // 1e-11< diff < 1e-10
threhold: 1e-11; count: 507
threhold: 1e-12; count: 549
threhold: 1e-13; count: 737
threhold: 1e-14; count: 360
threhold: 1e-15; count: 132
threhold: 1e-16; count: 445
threhold: 1e-17; count: 0

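The results above mean that 33 elements have a diff between 1e-11 and 1e-10, and likewise for the other rows.
The elements whose diff lies between 1e-11 and 1e-10 are printed below: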

-1.88291e-06 VS -1.8829e-06
-1.48766e-06 VS -1.48767e-06
-1.44663e-06 VS -1.44662e-06
1.07083e-06 VS 1.07084e-06
-2.38668e-06 VS -2.38669e-06
6.31052e-06 VS 6.31053e-06
1.62304e-06 VS 1.62303e-06
1.41462e-06 VS 1.41461e-06
-1.12677e-06 VS -1.12678e-06
-1.15945e-06 VS -1.15944e-06
-1.25673e-06 VS -1.25674e-06
-2.85901e-06 VS -2.85902e-06
-1.42249e-06 VS -1.42248e-06
-1.829e-06 VS -1.82899e-06
-1.22958e-06 VS -1.22959e-06
-1.94547e-06 VS -1.94548e-06
1.8461e-06 VS 1.84611e-06
-1.34486e-06 VS -1.34487e-06
1.80192e-06 VS 1.80193e-06
2.2543e-06 VS 2.25429e-06
-2.06272e-06 VS -2.06273e-06
-1.89758e-06 VS -1.89759e-06
1.39698e-06 VS 1.39699e-06
-1.5601e-06 VS -1.56011e-06
-1.00525e-06 VS -1.00524e-06
1.06274e-06 VS 1.06273e-06
1.0087e-06 VS 1.00871e-06
-3.63364e-06 VS -3.63363e-06
-3.16802e-06 VS -3.16801e-06
1.57705e-06 VS 1.57706e-06
5.08058e-06 VS 5.08057e-06
1.71452e-06 VS 1.71453e-06
-3.4769e-06 VS -3.47689e-06
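
The per-decade statistics above can be reproduced with a small helper. The following is only a minimal sketch, not the script used in this issue: the name `diff_histogram` and its parameters are hypothetical, and it assumes that the row labelled 10^k counts elements whose absolute diff d satisfies 10^(k-1) <= d < 10^k (exact zeros are not counted).

```python
def diff_histogram(a, b, max_exp=2, min_exp=-17):
    """Count elementwise |a - b| per decade bucket.

    Hypothetical helper: the bucket keyed by exponent k holds diffs d
    with 10**(k-1) <= d < 10**k. Diffs of exactly 0 fall in no bucket.
    """
    # Keys run from the largest threshold (10**2) down to the smallest (10**-17).
    counts = {k: 0 for k in range(max_exp, min_exp - 1, -1)}
    for x, y in zip(a, b):
        d = abs(x - y)
        for k in counts:
            if 10.0 ** (k - 1) <= d < 10.0 ** k:
                counts[k] += 1
                break
    return counts


# Toy data, not taken from the issue: two diffs in (1e-11, 1e-10),
# one in (1e-12, 1e-11), and one exact match.
run1 = [0.0, 0.0, 0.0, 0.0]
run2 = [5e-11, 2e-11, 3e-12, 0.0]
hist = diff_histogram(run1, run2)
```

Keying the result by exponent rather than by the float threshold avoids relying on exact float equality when looking up a bucket.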

fluid CPU VS CPU

Running the experiment twice on CPU produces no diff.
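
One possible reason a computation can differ between runs or devices is the order in which floating-point partial sums are accumulated, since floating-point addition is not associative. This issue does not confirm that this is the cause of the diff; the snippet below is only a generic illustration in plain Python, with a hypothetical helper `sequential_sum`.

```python
def sequential_sum(values):
    """Accumulate left to right, the way a simple serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sequential_sum(values)             # (0.1 + 0.2) + 0.3
backward = sequential_sum(reversed(values))  # (0.3 + 0.2) + 0.1

# Same inputs, different accumulation order, slightly different result.
print(forward, backward, forward == backward)
```

A fixed accumulation order gives bit-identical results on every run, which is consistent with the CPU observation above, although it does not prove the cause.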

fluid GPU VS CPU

threhold: 100; count: 0
threhold: 10.0; count: 0
threhold: 1.0; count: 0
threhold: 0.1; count: 0
threhold: 0.01; count: 0
threhold: 0.001; count: 0
threhold: 0.0001; count: 0
threhold: 1e-05; count: 0
threhold: 1e-06; count: 0
threhold: 1e-07; count: 0
threhold: 1e-08; count: 0
threhold: 1e-09; count: 0
threhold: 1e-10; count: 82
threhold: 1e-11; count: 1216
threhold: 1e-12; count: 1307
threhold: 1e-13; count: 1554
threhold: 1e-14; count: 920
threhold: 1e-15; count: 338
threhold: 1e-16; count: 1058
threhold: 1e-17; count: 0


wanghaoshuang · 2018-04-23T09:22:29Z

fluid CPU VS V1 CPU

threhold: 100; count: 0
threhold: 10.0; count: 0
threhold: 1.0; count: 0
threhold: 0.1; count: 0
threhold: 0.01; count: 0
threhold: 0.001; count: 0
threhold: 0.0001; count: 160
threhold: 1e-05; count: 4385
threhold: 1e-06; count: 5893
threhold: 1e-07; count: 4872
threhold: 1e-08; count: 23932
threhold: 1e-09; count: 16564
threhold: 1e-10; count: 4475
threhold: 1e-11; count: 841
threhold: 1e-12; count: 10
threhold: 1e-13; count: 0
threhold: 1e-14; count: 0
threhold: 1e-15; count: 0
threhold: 1e-16; count: 0
threhold: 1e-17; count: 0

The elements whose diff lies between 1e-05 and 0.0001 are printed below:

-1.05304 VS -1.05303
-1.72346 VS -1.72347
-1.28508 VS -1.28509
1.04012 VS 1.04013
-1.04445 VS -1.04444
-1.0233 VS -1.02329
-1.14327 VS -1.14328
-2.18111 VS -2.18112
-1.54561 VS -1.54562
-1.19549 VS -1.1955
-1.00736 VS -1.00737
-2.80709 VS -2.80708
-2.08709 VS -2.0871
-1.27759 VS -1.2776
-1.02156 VS -1.02155
1.64062 VS 1.64061
1.31589 VS 1.3159
1.23173 VS 1.23174
1.02107 VS 1.02108
1.44508 VS 1.44509
1.8632 VS 1.86319
-2.65495 VS -2.65496
1.37069 VS 1.37068
-1.50813 VS -1.50814
-1.02105 VS -1.02104
-2.16898 VS -2.16899
-1.35952 VS -1.35951
1.33849 VS 1.3385
1.39716 VS 1.39715
2.00454 VS 2.00453
2.29597 VS 2.29598
1.60958 VS 1.60959
-1.56291 VS -1.56292
1.24035 VS 1.24034
2.00508 VS 2.00507
-2.37763 VS -2.37764
1.13701 VS 1.137
1.01417 VS 1.01418
2.23855 VS 2.23854
1.22844 VS 1.22845
-1.203 VS -1.20299
-1.05117 VS -1.05118
-1.35208 VS -1.35209
1.52379 VS 1.5238
1.93136 VS 1.93137
-1.11863 VS -1.11862
1.56314 VS 1.56313
-2.27473 VS -2.27474
-1.41401 VS -1.41402
-1.91326 VS -1.91327
-1.64669 VS -1.6467
1.05009 VS 1.0501
-1.16532 VS -1.16533
-2.07061 VS -2.07062
-1.69224 VS -1.69223
-1.21757 VS -1.21758
1.03189 VS 1.03188
-1.31662 VS -1.31663
-1.18551 VS -1.1855
2.48431 VS 2.48432
-1.66829 VS -1.66828
-1.09787 VS -1.09788
-1.90295 VS -1.90296
-2.57979 VS -2.57978
1.97951 VS 1.9795
2.31804 VS 2.31803
-1.45399 VS -1.454
-1.06898 VS -1.06899
1.26757 VS 1.26756
-1.58222 VS -1.58223
-2.47277 VS -2.47276
1.51811 VS 1.5181
-1.85434 VS -1.85433
-1.07029 VS -1.07028
1.15273 VS 1.15274
-1.18402 VS -1.18401
1.40127 VS 1.40128
1.16825 VS 1.16824
-1.591 VS -1.59099
-1.56114 VS -1.56113
-1.8166 VS -1.81659
-1.07321 VS -1.0732
-2.13524 VS -2.13523
1.40516 VS 1.40517
-1.08656 VS -1.08657
-1.34528 VS -1.34529
1.18319 VS 1.18318
-1.85775 VS -1.85774
2.00879 VS 2.00878
1.48107 VS 1.48106
-1.02107 VS -1.02108
-1.02281 VS -1.02282
1.07779 VS 1.07778
1.14049 VS 1.14048
1.76196 VS 1.76195
-1.93463 VS -1.93462
-1.1016 VS -1.10161
-1.68991 VS -1.6899
1.71782 VS 1.71783
1.55413 VS 1.55412
1.11636 VS 1.11637
-1.04733 VS -1.04732
1.29417 VS 1.29418
-1.41759 VS -1.4176
-1.31412 VS -1.31411
1.69707 VS 1.69706
1.01989 VS 1.0199
-1.06463 VS -1.06462
-1.08839 VS -1.08838
-1.40561 VS -1.4056
-1.45775 VS -1.45774
-1.07009 VS -1.07008
-1.50411 VS -1.5041
-1.62906 VS -1.62907
-1.35848 VS -1.35849
-1.0662 VS -1.06619
-1.43031 VS -1.43032
-1.19385 VS -1.19384
1.79701 VS 1.797
1.00047 VS 1.00048
2.62988 VS 2.62987
1.13155 VS 1.13154
1.0849 VS 1.08491
-1.00726 VS -1.00725
-2.41854 VS -2.41853
-3.08378 VS -3.08377
-2.52872 VS -2.52871
-1.21074 VS -1.21075
2.2652 VS 2.26521
3.22046 VS 3.22045
1.04539 VS 1.04538
1.52448 VS 1.52447
1.21536 VS 1.21537
1.54722 VS 1.54721
-1.58253 VS -1.58254
-2.04054 VS -2.04053
-1.02315 VS -1.02316
2.32118 VS 2.32119
2.50569 VS 2.50568
2.95532 VS 2.95533
1.79294 VS 1.79295
2.61401 VS 2.61402
1.75518 VS 1.75519
1.73679 VS 1.73678
1.67191 VS 1.6719
2.63262 VS 2.63261
1.43763 VS 1.43762
1.62555 VS 1.62554
1.6364 VS 1.63639
-1.48356 VS -1.48355
-2.19141 VS -2.1914
1.87345 VS 1.87344
-1.72803 VS -1.72804
2.36609 VS 2.3661
2.59212 VS 2.59211
1.9238 VS 1.92381
1.46245 VS 1.46246
-1.06982 VS -1.06983
-1.30084 VS -1.30083
-1.07191 VS -1.07192
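
To judge pairs like these independently of their magnitude, it can help to look at the relative diff as well as the absolute one. The sketch below parses `x VS y` lines in the format printed above and computes both; `parse_pairs` and `relative_diffs` are hypothetical helper names, and the sample is the first three pairs from this list.

```python
def parse_pairs(text):
    """Parse lines of the form '<float> VS <float>' into (float, float) tuples."""
    pairs = []
    for line in text.splitlines():
        if " VS " not in line:
            continue  # skip blank or unrelated lines
        left, right = line.split(" VS ")
        pairs.append((float(left), float(right)))
    return pairs


def relative_diffs(pairs):
    """Return |a - b| / max(|a|, |b|) per pair (0.0 when both values are 0)."""
    result = []
    for a, b in pairs:
        scale = max(abs(a), abs(b))
        result.append(abs(a - b) / scale if scale > 0 else 0.0)
    return result


sample = """-1.05304 VS -1.05303
-1.72346 VS -1.72347
1.04012 VS 1.04013"""

pairs = parse_pairs(sample)
rel = relative_diffs(pairs)
for (a, b), r in zip(pairs, rel):
    print("%g VS %g  abs=%.3g  rel=%.3g" % (a, b, abs(a - b), r))
```

Note that the values were printed with six significant digits, so a diff computed from the printed text is only as precise as that rounding.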

qingqing01 closed this as completed Jul 27, 2018
