fix cudnn conv bug which occurs in image classfication demo in GTX GPU #107

qingqing01 · 2016-09-23T08:09:29Z

The bug occurs in layer conv7 at the last batch during testing period in GTX GPU. It is a little strange why there is no problem from the layer conv1 to conv6, but occurs in conv7.
Now revise code to select algorithm when batch size is changed. It is correct now in GeForce GTX 980.

hedaoyuan · 2016-09-23T09:17:50Z

paddle/gserver/layers/CudnnConvLayer.cpp

@@ -132,6 +133,9 @@ void CudnnConvLayer::reshape(int batchSize) {
  getOutput().setFrameHeight(outputH_);
  getOutput().setFrameWidth(outputW_);

+  isSelectAlgo_ = (batchSize == batchNum_);


add some comment.

* enable mod compute * code clean and bug fix

* 1. fix download lcqmc dataset 2. fix installation docs

update python inference model version from 1.8 to 2.0

* fix ins no * fix FillOnestep args

* Optimizing the zero key problem in the push phase * Optimize CUDA thread parallelism in MergeGrad phase * Optimize CUDA thread parallelism in MergeGrad phase * Performance optimization, segment gradient merging * Performance optimization, segment gradient merging * Optimize pullsparse and increase keys aggregation * sync gpugraph to gpugraph_v2 (#86) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] graph sample v2 (#87) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * support ssdsparsetable;test=develop (#81) * graph sample v2 * remove log Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> * Release cpu graph * uniq nodeid (#89) * compatible whole HBM mode (#91) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * Gpugraph v2 (#93) * compatible whole HBM mode * unify flag for graph emd storage mode and graph struct storage mode * format Co-authored-by: yangjunchao <yangjunchao@baidu.com> * split generate batch into multi stage (#92) * split generate batch into multi stage * fix conflict Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * [GpuGraph] Uniq feature (#95) * uniq feature * uniq feature * uniq feature * [GpuGraph] global startid (#98) * uniq feature * uniq feature * uniq feature * global startid * load node edge seperately and release graph (#99) * load node edge seperately and release graph * load node edge seperately and release graph Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * v2 infer (#102) * optimize begin pass and end pass (#106) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix ins no (#104) * [GPUGraph] fix FillOneStep args (#107) * fix ins no * fix FillOnestep args * fix bug for whole hbm mode (#110) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] fix infer && add infer_table_cap (#108) * fix ins no * fix FillOnestep args * fix infer && add infer table cap * fix infer * 【PSCORE】perform ssd sparse table (#111) * perform ssd sparsetable;test=develop Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc * perform ssd sparsetable;test=develop * remove debug code; * remove debug code; * add jemalloc cmake;test=develop * fix wrapper;test=develop * fix sample core (#114) * [GpuGraph] optimize shuffle batch (#115) * fix sample core * optimize shuffle batch * release gpu mem when sample end (#116) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix class not found err (PaddlePaddle#118) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * optimize sample (PaddlePaddle#117) * optimize sample * optimize sample Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix clear gpu mem (PaddlePaddle#119) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix sample core (PaddlePaddle#121) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * add ssd cache (PaddlePaddle#123) * add ssd cache;test=develop * add ssd cache;test=develop * add ssd cache;test=develop * add multi epoch train & fix train table change ins & save infer embeding (PaddlePaddle#129) * add multi epoch train & fix train table change ins & save infer embedding * change epoch finish judge * change epoch finish change Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * Add debug log (PaddlePaddle#131) * Add debug log * Add debug log Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> * optimize mem in uniq slot feature (PaddlePaddle#130) * [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136) * optimize mem in uniq slot feature * cherry-pick var slot_feature Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * [GpuGraph] fix kernel overflow (PaddlePaddle#138) * optimize mem in uniq slot feature * cherry-pick var slot_feature * fix kernel overflow && add max feature num flag Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * fix ssd cache;test=develop (PaddlePaddle#139) * slot feature secondary storage (PaddlePaddle#140) * slot feature secondary storage * slot feature secondary storage Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* Optimizing the zero key problem in the push phase * Optimize CUDA thread parallelism in MergeGrad phase * Optimize CUDA thread parallelism in MergeGrad phase * Performance optimization, segment gradient merging * Performance optimization, segment gradient merging * Optimize pullsparse and increase keys aggregation * sync gpugraph to gpugraph_v2 (PaddlePaddle#86) * change load node and edge from local to cpu (PaddlePaddle#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(PaddlePaddle#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] graph sample v2 (PaddlePaddle#87) * change load node and edge from local to cpu (PaddlePaddle#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(PaddlePaddle#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * support ssdsparsetable;test=develop (PaddlePaddle#81) * graph sample v2 * remove log Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> * Release cpu graph * uniq nodeid (PaddlePaddle#89) * compatible whole HBM mode (PaddlePaddle#91) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * Gpugraph v2 (PaddlePaddle#93) * compatible whole HBM mode * unify flag for graph emd storage mode and graph struct storage mode * format Co-authored-by: yangjunchao <yangjunchao@baidu.com> * split generate batch into multi stage (PaddlePaddle#92) * split generate batch into multi stage * fix conflict Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * [GpuGraph] Uniq feature (PaddlePaddle#95) * uniq feature * uniq feature * uniq feature * [GpuGraph] global startid (PaddlePaddle#98) * uniq feature * uniq feature * uniq feature * global startid * load node edge seperately and release graph (PaddlePaddle#99) * load node edge seperately and release graph * load node edge seperately and release graph Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * v2 infer (PaddlePaddle#102) * optimize begin pass and end pass (PaddlePaddle#106) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix ins no (PaddlePaddle#104) * [GPUGraph] fix FillOneStep args (PaddlePaddle#107) * fix ins no * fix FillOnestep args * fix bug for whole hbm mode (PaddlePaddle#110) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] fix infer && add infer_table_cap (PaddlePaddle#108) * fix ins no * fix FillOnestep args * fix infer && add infer table cap * fix infer * 【PSCORE】perform ssd sparse table (PaddlePaddle#111) * perform ssd sparsetable;test=develop Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc * perform ssd sparsetable;test=develop * remove debug code; * remove debug code; * add jemalloc cmake;test=develop * fix wrapper;test=develop * fix sample core (PaddlePaddle#114) * [GpuGraph] optimize shuffle batch (PaddlePaddle#115) * fix sample core * optimize shuffle batch * release gpu mem when sample end (PaddlePaddle#116) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix class not found err (PaddlePaddle#118) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * optimize sample (PaddlePaddle#117) * optimize sample * optimize sample Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix clear gpu mem (PaddlePaddle#119) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix sample core (PaddlePaddle#121) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * add ssd cache (PaddlePaddle#123) * add ssd cache;test=develop * add ssd cache;test=develop * add ssd cache;test=develop * add multi epoch train & fix train table change ins & save infer embeding (PaddlePaddle#129) * add multi epoch train & fix train table change ins & save infer embedding * change epoch finish judge * change epoch finish change Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * Add debug log (PaddlePaddle#131) * Add debug log * Add debug log Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> * optimize mem in uniq slot feature (PaddlePaddle#130) * [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136) * optimize mem in uniq slot feature * cherry-pick var slot_feature Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * [GpuGraph] fix kernel overflow (PaddlePaddle#138) * optimize mem in uniq slot feature * cherry-pick var slot_feature * fix kernel overflow && add max feature num flag Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * fix ssd cache;test=develop (PaddlePaddle#139) * slot feature secondary storage (PaddlePaddle#140) * slot feature secondary storage * slot feature secondary storage Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

… 22% (PaddlePaddle#107) Co-authored-by: yangjunchao <yangjunchao@baidu.com>

update bkcl and xdnn.

qingqing01 assigned hedaoyuan, gangliao and emailweixu Sep 23, 2016

fix cudnn conv bug which occurs in image classfication demo in GTX GPU

95da095

qingqing01 force-pushed the cudnn_conv branch from e639979 to 95da095 Compare September 23, 2016 08:24

qingqing01 assigned reyoung Sep 23, 2016

hedaoyuan reviewed Sep 23, 2016

View reviewed changes

Update CudnnConvLayer.cpp

c1c07bb

hedaoyuan merged commit 341486d into PaddlePaddle:master Sep 23, 2016

qingqing01 mentioned this pull request Sep 23, 2016

image_classification cifar-10 train error #95

Closed

This was referenced Sep 29, 2016

Compatiblity issue with CUDA 8.0 #18

Closed

测试vgg_16_cifar.py报错 #9

Closed

alvations mentioned this pull request Oct 5, 2016

"cudaSuccess == err (0 vs. 8)" error on v0.8.0b1 #158

Closed

qingqing01 deleted the cudnn_conv branch October 18, 2016 09:51

DemoMoon mentioned this pull request Mar 24, 2021

oneDNN 如何能提升DeepSpeech的语音处理性能 #31838

Closed

thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021

CAS support mod inference (PaddlePaddle#107)

13b63ea

* enable mod compute * code clean and bug fix

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021

fix download lcqmc dataset (PaddlePaddle#107)

5955367

* 1. fix download lcqmc dataset 2. fix installation docs

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022

Merge pull request PaddlePaddle#107 from winter-wang/model_to_2.0

cb38d31

update python inference model version from 1.8 to 2.0

Thunderbrook added a commit to Thunderbrook/Paddle that referenced this pull request Sep 9, 2022

[GPUGraph] fix FillOneStep args (PaddlePaddle#107)

582d236

* fix ins no * fix FillOnestep args

XDyaoshi mentioned this pull request Mar 13, 2023

【PaddlePaddle Hackathon 第四期】任务总览 #51281

Closed

laipaang pushed a commit to laipaang/Paddle that referenced this pull request Jan 16, 2024

add cusparseLt 0.4 to speed up ffn1,ffn2,qkvo multiplication,speed up…

80933a8

… 22% (PaddlePaddle#107) Co-authored-by: yangjunchao <yangjunchao@baidu.com>

jack603047588 pushed a commit to jiaoxuewu/PaddleBox that referenced this pull request Oct 29, 2024

Merge pull request PaddlePaddle#107 from tiancaitzp/paddlebox

b396e30

update bkcl and xdnn.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix cudnn conv bug which occurs in image classfication demo in GTX GPU #107

fix cudnn conv bug which occurs in image classfication demo in GTX GPU #107

qingqing01 commented Sep 23, 2016 •

edited

Loading

hedaoyuan Sep 23, 2016

fix cudnn conv bug which occurs in image classfication demo in GTX GPU #107

fix cudnn conv bug which occurs in image classfication demo in GTX GPU #107

Conversation

qingqing01 commented Sep 23, 2016 • edited Loading

hedaoyuan Sep 23, 2016

Choose a reason for hiding this comment

qingqing01 commented Sep 23, 2016 •

edited

Loading