add parallel_nn api and unitest #110

luotao1 · 2016-09-23T10:37:13Z

No description provided.

reyoung

Looks good to me. Does anybody else want to review this PR?

* renew the install doc for V1.0 * refine doc * refine MAC_compile with user installed openblas * fix python3 related issue according to PaddlePaddle#13724 * refine python3 LD_LIBRARY and DYLD_LIBRARY set * fix FAQ * refine macos compile and install on python settings * refine mac compile command * fix comment related format problem * fix FAQ format problem * add mac compile on 10.14 * remove py3 compile on mac 10.14 * refine

* Optimizing the zero key problem in the push phase * Optimize CUDA thread parallelism in MergeGrad phase * Optimize CUDA thread parallelism in MergeGrad phase * Performance optimization, segment gradient merging * Performance optimization, segment gradient merging * Optimize pullsparse and increase keys aggregation * sync gpugraph to gpugraph_v2 (#86) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] graph sample v2 (#87) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * support ssdsparsetable;test=develop (#81) * graph sample v2 * remove log Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> * Release cpu graph * uniq nodeid (#89) * compatible whole HBM mode (#91) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * Gpugraph v2 (#93) * compatible whole HBM mode * unify flag for graph emd storage mode and graph struct storage mode * format Co-authored-by: yangjunchao <yangjunchao@baidu.com> * split generate batch into multi stage (#92) * split generate batch into multi 
stage * fix conflict Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * [GpuGraph] Uniq feature (#95) * uniq feature * uniq feature * uniq feature * [GpuGraph] global startid (#98) * uniq feature * uniq feature * uniq feature * global startid * load node edge seperately and release graph (#99) * load node edge seperately and release graph * load node edge seperately and release graph Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * v2 infer (#102) * optimize begin pass and end pass (#106) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix ins no (#104) * [GPUGraph] fix FillOneStep args (#107) * fix ins no * fix FillOnestep args * fix bug for whole hbm mode (#110) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] fix infer && add infer_table_cap (#108) * fix ins no * fix FillOnestep args * fix infer && add infer table cap * fix infer * 【PSCORE】perform ssd sparse table (#111) * perform ssd sparsetable;test=develop Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc * perform ssd sparsetable;test=develop * remove debug code; * remove debug code; * add jemalloc cmake;test=develop * fix wrapper;test=develop * fix sample core (#114) * [GpuGraph] optimize shuffle batch (#115) * fix sample core * optimize shuffle batch * release gpu mem when sample end (#116) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix class not found err (PaddlePaddle#118) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * optimize sample (PaddlePaddle#117) * optimize sample * optimize sample Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix clear gpu mem (PaddlePaddle#119) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix sample core (PaddlePaddle#121) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * add ssd cache (PaddlePaddle#123) * add ssd cache;test=develop * add ssd cache;test=develop * add ssd cache;test=develop * add 
multi epoch train & fix train table change ins & save infer embeding (PaddlePaddle#129) * add multi epoch train & fix train table change ins & save infer embedding * change epoch finish judge * change epoch finish change Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * Add debug log (PaddlePaddle#131) * Add debug log * Add debug log Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> * optimize mem in uniq slot feature (PaddlePaddle#130) * [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136) * optimize mem in uniq slot feature * cherry-pick var slot_feature Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * [GpuGraph] fix kernel overflow (PaddlePaddle#138) * optimize mem in uniq slot feature * cherry-pick var slot_feature * fix kernel overflow && add max feature num flag Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * fix ssd cache;test=develop (PaddlePaddle#139) * slot feature secondary storage (PaddlePaddle#140) * slot feature secondary storage * slot feature secondary storage Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* fused_seqpool_cvm_with_conv support filter by threshold * add fill zero in fused_seqpool_cvm * add fused seq tensor && support transpose batch fc weight --------- Co-authored-by: mojingcj <ChengJing_dhu@163.com> Co-authored-by: jiaoxuewu <jiaoxuewu@163.com> Co-authored-by: yuandong1998 <1377526365@qq.com> Co-authored-by: shangzhongbin <shangzhongbin@baidu.com>

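PaddlePaddle#55713 changed the default BF16 black and white lists, which causes the current code to raise an error. Layer norm therefore has to be added manually to the BF16 O1 white list. ![bd3f6f1c3d576870635f19e8a49f04f6](https://github.com/PaddlePaddle/PaddleMIX/assets/50394665/33d20150-fa49-43c8-b421-e89618ed43ea)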
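A minimal sketch of what that workaround could look like, assuming a Paddle build where `paddle.amp.auto_cast` accepts `custom_white_list`, `level` and `dtype`. The helper names `bf16_o1_amp_kwargs` and `forward_with_amp` are hypothetical and do not come from the referenced commit; the actual change may have been made elsewhere in the code.

```python
# Sketch only: the helper names below are illustrative assumptions,
# not taken from the referenced commit.


def bf16_o1_amp_kwargs(extra_white_ops=("layer_norm",)):
    """Build keyword arguments for paddle.amp.auto_cast.

    The ops in `extra_white_ops` are added to the O1 white list so that
    they run in bfloat16 even if the default lists no longer include them.
    """
    return {
        "enable": True,
        "level": "O1",
        "dtype": "bfloat16",
        "custom_white_list": set(extra_white_ops),
    }


amp_kwargs = bf16_o1_amp_kwargs()

try:
    import paddle  # optional here, so the sketch still loads without Paddle
except ImportError:
    paddle = None


def forward_with_amp(model, inputs):
    """Run model(inputs) under BF16 O1 autocast with layer_norm whitelisted."""
    if paddle is None:
        raise RuntimeError("PaddlePaddle is not installed")
    with paddle.amp.auto_cast(**amp_kwargs):
        return model(inputs)
```

Passing the white list explicitly keeps the behaviour independent of whatever defaults a given Paddle version ships with.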

add parallel_nn api and unitest

8b611e5

luotao1 assigned reyoung, qingqing01, emailweixu and hedaoyuan Sep 23, 2016

reyoung approved these changes Sep 27, 2016

reyoung merged commit ffc3416 into PaddlePaddle:master Sep 27, 2016

luotao1 deleted the parallelnn branch September 27, 2016 05:25

thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021

refine lower (PaddlePaddle#110)

91de356

* code clean * make Lower's return from vector to LowerFunc * move optime to Lower

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021

[Faster Transformer] Add topk & topp sampling (PaddlePaddle#110)

adb5ae5

* add topk & topp sampling * add description * fix * alter default decoding_strategy * update comments

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022

Merge pull request PaddlePaddle#110 from jiweibo/update_lib

4fe07c0

update lib url

danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Sep 14, 2022

fix bug for whole hbm mode (PaddlePaddle#110)

cc71f56

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Dec 6, 2023

Merge pull request PaddlePaddle#110 from eltociear/patch-1

06da275

fix typo in default.yaml

lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024

fix bugs (PaddlePaddle#110)

a7b93e0

jack603047588 pushed a commit to jiaoxuewu/PaddleBox that referenced this pull request Oct 29, 2024

Merge pull request PaddlePaddle#110 from YaoCheng8667/paddlebox-yc

6c25fc6

support fused seq tensor for xpu
