[cherry-pick 2.2]Refactor Heterogenous Pipeline Parameter Server #37446
Merged
fuyinno4
merged 12 commits into
PaddlePaddle:release/2.2
from
zmxdream:cherry_fix_heterps
Nov 23, 2021
Thanks for your contribution!
zmxdream
changed the title
bug fix for DeserializeSelectedRows.(#36520)
[cherry-pick]bug fix for DeserializeSelectedRows.(#36520)
Nov 22, 2021
* bug fix for DeserializeSelectedRows. test=develop * fix bug for SerializeSelectedRows. test=develop * update. test=develop
* change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop
zmxdream
changed the title
[cherry-pick]bug fix for DeserializeSelectedRows.(#36520)
[cherry-pick]Refactor Heter Pipeline ps
Nov 22, 2021
* fix ut. test=develop * fix ut. test=develop
…ddle#37166) * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop
* fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop
* fix api. test=develop * fix api. test=develop
zmxdream
changed the title
[cherry-pick]Refactor Heter Pipeline ps
[cherry-pick]Refactor Heterogenous Pipeline Parameter Server
Nov 22, 2021
Thunderbrook
approved these changes
Nov 23, 2021
chenwhql
approved these changes
Nov 23, 2021
lanxianghit
approved these changes
Nov 23, 2021
TCChenlong
approved these changes
Nov 23, 2021
fuyinno4
approved these changes
Nov 23, 2021
raindrops2sea
approved these changes
Nov 23, 2021
zmxdream
changed the title
[cherry-pick]Refactor Heterogenous Pipeline Parameter Server
[cherry-pick 2.2]Refactor Heterogenous Pipeline Parameter Server
Nov 23, 2021
PR types
New features
PR changes
Others
Describe
This PR refactors heterps with the following new features:
1. Support partitioning the static graph arbitrarily (#36845)
a. Eliminate redundant information transmitted across different stages
b. Add a function that unions forward ops and backward ops to construct sub-graphs after splitting the program
c. Bug fix for the matchnet & 2emb models (re-sort sum op)
2. Support M:N pipeline training of the Heterogeneous Parameter Server
a. Add heter_pipeline_trainer & heter_section_worker for multi-thread pipeline training in M:N mode
b. Add a heter_devices arg to fleet to specify the device type of each stage (will be deprecated in the next version)
c. Enable the trainer instance cache in executor.train_from_dataset in Heter pipeline train mode
d. Bug fix for multi-node training (the iteration order of Python set(list + list) can differ across machines)
e. Fix SerializeSelectedRows & DeserializeSelectedRows (#36520 #36543)
f. Change the default executor for the heter trainer to keep the API consistent between the CPU trainer and the heter trainer (#37314)
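The multi-node issue in item d comes from a general Python behavior: string hashing is randomized per process by default, so iterating over `set(list + list)` can yield a different order on each machine. The sketch below shows two common ways to get a deterministic, duplicate-free list. It is only an illustration, not the code changed in this PR; the helper names and the variable names in the example are hypothetical.

```python
def merge_unique_ordered(first, second):
    """Concatenate two lists and drop duplicates, keeping first-seen order.

    dict preserves insertion order (Python 3.7+), so the result does not
    depend on hash values and is identical in every process.
    """
    return list(dict.fromkeys(first + second))


def merge_unique_sorted(first, second):
    """Concatenate two lists, drop duplicates, and sort the result.

    Sorting removes any dependence on the set's iteration order.
    """
    return sorted(set(first + second))


# Hypothetical variable names, for illustration only.
send_vars = ["emb_0", "fc_1.w", "emb_0"]
recv_vars = ["fc_1.w", "fc_2.b"]

print(merge_unique_ordered(send_vars, recv_vars))  # ['emb_0', 'fc_1.w', 'fc_2.b']
print(merge_unique_sorted(send_vars, recv_vars))   # ['emb_0', 'fc_1.w', 'fc_2.b']
```

Either variant gives every node the same ordering. The first keeps the original order of appearance, while the second imposes a canonical sorted order.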