-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add parallel_nn api and unitest #110
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
reyoung
approved these changes
Sep 27, 2016
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Is there anybody want to review this PR?
zhhsplendid
pushed a commit
to zhhsplendid/Paddle
that referenced
this pull request
Sep 25, 2019
* renew the install doc for V1.0 * refine doc * refine MAC_compile with user installed openblas * fix python3 related issue according to PaddlePaddle#13724 * refine python3 LD_LIBRARY and DYLD_LIBRARY set * fix FAQ * refine macos compile and install on python settings * refine mac compile command * fix comment related format problem * fix FAQ format problem * add mac compile on 10.14 * remove py3 compile on mac 10.14 * refine
thisjiang
pushed a commit
to thisjiang/Paddle
that referenced
this pull request
Oct 28, 2021
* code clean * make Lower's return from vector to LowerFunc * move optime to Lower
wangxicoding
pushed a commit
to wangxicoding/Paddle
that referenced
this pull request
Dec 9, 2021
* add topk & topp sampling * add description * fix * alter default decoding_strategy * update comments
zhoutianzi666
pushed a commit
to zhoutianzi666/Paddle
that referenced
this pull request
May 23, 2022
update lib url
danleifeng
pushed a commit
to danleifeng/Paddle
that referenced
this pull request
Sep 14, 2022
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
qingshui
referenced
this pull request
in qingshui/Paddle
Nov 14, 2022
* Optimizing the zero key problem in the push phase * Optimize CUDA thread parallelism in MergeGrad phase * Optimize CUDA thread parallelism in MergeGrad phase * Performance optimization, segment gradient merging * Performance optimization, segment gradient merging * Optimize pullsparse and increase keys aggregation * sync gpugraph to gpugraph_v2 (#86) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] graph sample v2 (#87) * change load node and edge from local to cpu (#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * support ssdsparsetable;test=develop (#81) * graph sample v2 * remove log Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> * Release cpu graph * uniq nodeid (#89) * compatible whole HBM mode (#91) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * Gpugraph v2 (#93) * compatible whole HBM mode * unify flag for graph emd storage mode and graph struct storage mode * format Co-authored-by: yangjunchao <yangjunchao@baidu.com> * split generate batch into multi stage (#92) * split generate batch into multi stage * fix conflict Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * [GpuGraph] Uniq feature (#95) * uniq feature * uniq feature * uniq feature * [GpuGraph] global startid (#98) * uniq feature * uniq feature * uniq feature * global startid * load node edge seperately and release graph (#99) * load node edge seperately and release graph * load node edge seperately and release graph Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * v2 infer (#102) * optimize begin pass and end pass (#106) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix ins no (#104) * [GPUGraph] fix FillOneStep args (#107) * fix ins no * fix FillOnestep args * fix bug for whole hbm mode (#110) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] fix infer && add infer_table_cap (#108) * fix ins no * fix FillOnestep args * fix infer && add infer table cap * fix infer * 【PSCORE】perform ssd sparse table (#111) * perform ssd sparsetable;test=develop Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc * perform ssd sparsetable;test=develop * remove debug code; * remove debug code; * add jemalloc cmake;test=develop * fix wrapper;test=develop * fix sample core (#114) * [GpuGraph] optimize shuffle batch (#115) * fix sample core * optimize shuffle batch * release gpu mem when sample end (#116) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix class not found err (PaddlePaddle#118) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * optimize sample (PaddlePaddle#117) * optimize sample * optimize sample Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix clear gpu mem (PaddlePaddle#119) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix sample core (PaddlePaddle#121) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * add ssd cache (PaddlePaddle#123) * add ssd cache;test=develop * add ssd cache;test=develop * add ssd cache;test=develop * add multi epoch train & fix train table change ins & save infer embeding (PaddlePaddle#129) * add multi epoch train & fix train table change ins & save infer embedding * change epoch finish judge * change epoch finish change Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * Add debug log (PaddlePaddle#131) * Add debug log * Add debug log Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> * optimize mem in uniq slot feature (PaddlePaddle#130) * [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136) * optimize mem in uniq slot feature * cherry-pick var slot_feature Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * [GpuGraph] fix kernel overflow (PaddlePaddle#138) * optimize mem in uniq slot feature * cherry-pick var slot_feature * fix kernel overflow && add max feature num flag Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * fix ssd cache;test=develop (PaddlePaddle#139) * slot feature secondary storage (PaddlePaddle#140) * slot feature secondary storage * slot feature secondary storage Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
zmxdream
pushed a commit
to zmxdream/Paddle
that referenced
this pull request
Dec 7, 2022
* Optimizing the zero key problem in the push phase * Optimize CUDA thread parallelism in MergeGrad phase * Optimize CUDA thread parallelism in MergeGrad phase * Performance optimization, segment gradient merging * Performance optimization, segment gradient merging * Optimize pullsparse and increase keys aggregation * sync gpugraph to gpugraph_v2 (PaddlePaddle#86) * change load node and edge from local to cpu (PaddlePaddle#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(PaddlePaddle#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] graph sample v2 (PaddlePaddle#87) * change load node and edge from local to cpu (PaddlePaddle#83) * change load node and edge * remove useless code Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * extract pull sparse as single stage(PaddlePaddle#85) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * support ssdsparsetable;test=develop (PaddlePaddle#81) * graph sample v2 * remove log Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> * Release cpu graph * uniq nodeid (PaddlePaddle#89) * compatible whole HBM mode (PaddlePaddle#91) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * Gpugraph v2 (PaddlePaddle#93) * compatible whole HBM mode * unify flag for graph emd storage mode and graph struct storage mode * format Co-authored-by: yangjunchao <yangjunchao@baidu.com> * split generate batch into multi stage (PaddlePaddle#92) * split generate batch into multi stage * fix conflict Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * [GpuGraph] Uniq feature (PaddlePaddle#95) * uniq feature * uniq feature * uniq feature * [GpuGraph] global startid (PaddlePaddle#98) * uniq feature * uniq feature * uniq feature * global startid * load node edge seperately and release graph (PaddlePaddle#99) * load node edge seperately and release graph * load node edge seperately and release graph Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * v2 infer (PaddlePaddle#102) * optimize begin pass and end pass (PaddlePaddle#106) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix ins no (PaddlePaddle#104) * [GPUGraph] fix FillOneStep args (PaddlePaddle#107) * fix ins no * fix FillOnestep args * fix bug for whole hbm mode (PaddlePaddle#110) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * [GPUGraph] fix infer && add infer_table_cap (PaddlePaddle#108) * fix ins no * fix FillOnestep args * fix infer && add infer table cap * fix infer * 【PSCORE】perform ssd sparse table (PaddlePaddle#111) * perform ssd sparsetable;test=develop Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc * perform ssd sparsetable;test=develop * remove debug code; * remove debug code; * add jemalloc cmake;test=develop * fix wrapper;test=develop * fix sample core (PaddlePaddle#114) * [GpuGraph] optimize shuffle batch (PaddlePaddle#115) * fix sample core * optimize shuffle batch * release gpu mem when sample end (PaddlePaddle#116) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix class not found err (PaddlePaddle#118) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * optimize sample (PaddlePaddle#117) * optimize sample * optimize sample Co-authored-by: yangjunchao <yangjunchao@baidu.com> * fix clear gpu mem (PaddlePaddle#119) Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * fix sample core (PaddlePaddle#121) Co-authored-by: yangjunchao <yangjunchao@baidu.com> * add ssd cache (PaddlePaddle#123) * add ssd cache;test=develop * add ssd cache;test=develop * add ssd cache;test=develop * add multi epoch train & fix train table change ins & save infer embeding (PaddlePaddle#129) * add multi epoch train & fix train table change ins & save infer embedding * change epoch finish judge * change epoch finish change Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> * Add debug log (PaddlePaddle#131) * Add debug log * Add debug log Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> * optimize mem in uniq slot feature (PaddlePaddle#130) * [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136) * optimize mem in uniq slot feature * cherry-pick var slot_feature Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * [GpuGraph] fix kernel overflow (PaddlePaddle#138) * optimize mem in uniq slot feature * cherry-pick var slot_feature * fix kernel overflow && add max feature num flag Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com> * fix ssd cache;test=develop (PaddlePaddle#139) * slot feature secondary storage (PaddlePaddle#140) * slot feature secondary storage * slot feature secondary storage Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com> Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com> Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com> Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com> Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: yangjunchao <yangjunchao@baidu.com> Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
AnnaTrainingG
pushed a commit
to AnnaTrainingG/Paddle
that referenced
this pull request
Dec 6, 2023
fix typo in default.yaml
lizexu123
pushed a commit
to lizexu123/Paddle
that referenced
this pull request
Feb 23, 2024
qingshui
added a commit
to jiaoxuewu/PaddleBox
that referenced
this pull request
Mar 6, 2024
* fused_seqpool_cvm_with_conv support filter by threshold * add fill zero in fused_seqpool_cvm * add fused seq tensor && support transpose batch fc weight --------- Co-authored-by: mojingcj <ChengJing_dhu@163.com> Co-authored-by: jiaoxuewu <jiaoxuewu@163.com> Co-authored-by: yuandong1998 <1377526365@qq.com> Co-authored-by: shangzhongbin <shangzhongbin@baidu.com>
jack603047588
pushed a commit
to jiaoxuewu/PaddleBox
that referenced
this pull request
Oct 29, 2024
support fused seq tensor for xpu
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.