[cherry-pick 2.2]Refactor Heterogenous Pipeline Parameter Server #37446

Merged
merged 12 commits into from
Nov 23, 2021

Conversation

zmxdream
Contributor

@zmxdream zmxdream commented Nov 22, 2021

PR types

New features

PR changes

Others

Describe

This PR refactors heterps with the following new features:
Support partitioning a static graph arbitrarily (#36845)
a. Eliminate redundant information transmitted across different stages
b. Add a function that unions forward ops and backward ops to construct sub-graphs after splitting the program
c. Fix a bug in the matchnet & 2emb models (re-sort the sum op)
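
To illustrate item (b), here is a minimal toy sketch of what "unioning forward and backward ops into sub-graphs" could look like. It is not the code from this PR: the `Op` class, its `forward_of` field, and the functions `split_forward` and `union_forward_backward` are all hypothetical names invented for this example, and the real implementation works on Paddle program blocks rather than plain Python lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for an operator in a static program."""
    name: str
    device: str
    # Set only on backward ops: the name of the forward op they differentiate.
    forward_of: Optional[str] = None


def split_forward(ops: List[Op]) -> List[List[Op]]:
    """Group consecutive forward ops that share a device into blocks."""
    blocks: List[List[Op]] = []
    for op in ops:
        if op.forward_of is not None:
            continue  # backward ops are attached later
        if blocks and blocks[-1][0].device == op.device:
            blocks[-1].append(op)
        else:
            blocks.append([op])
    return blocks


def union_forward_backward(ops: List[Op]) -> List[List[Op]]:
    """Attach every backward op to the block that owns its forward op."""
    blocks = split_forward(ops)
    owner = {op.name: i for i, block in enumerate(blocks) for op in block}
    stages = [list(block) for block in blocks]
    for op in ops:
        if op.forward_of is not None:
            stages[owner[op.forward_of]].append(op)
    return stages


# A tiny made-up program: forward ops followed by their backward ops.
program = [
    Op("embedding", "cpu"),
    Op("fc", "gpu"),
    Op("loss", "gpu"),
    Op("loss_grad", "gpu", forward_of="loss"),
    Op("fc_grad", "gpu", forward_of="fc"),
    Op("embedding_grad", "cpu", forward_of="embedding"),
]
stage_names = [[op.name for op in stage] for stage in union_forward_backward(program)]
```

In this toy program the CPU stage ends up holding `embedding` and `embedding_grad`, while the GPU stage holds the remaining four ops, so each stage carries both the forward and the backward half of its sub-graph.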

Support M:N pipeline training for the Heterogeneous Parameter Server
a. Add heter_pipeline_trainer & heter_section_worker for multi-threaded pipeline training in M:N mode
b. Add a heter_devices argument to fleet to specify the device type of each stage (will be deprecated in the next version)
c. Enable trainer instance caching in executor.train_from_dataset in heter pipeline training mode
d. Fix a multi-node training bug (Python set(list + list) yields a different order on different machines)
e. Fix SerializeSelectedRows & DeserializeSelectedRows (#36520, #36543)
f. Change the default executor for the heter trainer to keep the API consistent between the cpu trainer and the heter trainer (#37314)
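
The ordering problem in item (d) can be reproduced in plain Python. String hashing is randomized per process unless `PYTHONHASHSEED` is fixed, so iterating over a set of strings may give a different order in each process, and therefore on each machine. The sketch below shows two order-stable ways to merge and deduplicate lists. The helper names `merge_unique` and `merge_sorted` are made up for this example, and the PR description does not say which approach the actual fix uses.

```python
def merge_unique(first, second):
    """Concatenate two lists and drop duplicates, keeping first-seen order.

    dict preserves insertion order (Python 3.7+), so the result is the same
    in every process, unlike list(set(first + second)).
    """
    return list(dict.fromkeys(first + second))


def merge_sorted(first, second):
    """Alternative: deduplicate with a set, then sort for a canonical order."""
    return sorted(set(first + second))


# Hypothetical variable names, used only for illustration.
send_vars = ["emb_out", "fc_w"]
recv_vars = ["fc_w", "label"]

stable = merge_unique(send_vars, recv_vars)
canonical = merge_sorted(send_vars, recv_vars)
```

Either variant gives every node the same variable order, which matters when the order determines how data is packed or matched across stages.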

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zmxdream zmxdream changed the title bug fix for DeserializeSelectedRows.(#36520) [cherry-pick]bug fix for DeserializeSelectedRows.(#36520) Nov 22, 2021
* bug fix for  DeserializeSelectedRows. test=develop

* fix bug for SerializeSelectedRows. test=develop

* update. test=develop
* change username

* fix

* fix

* fix

* fix

* fix

* update

* update

* update unittests

* fix

* update

* fix

* update

* fix

* fix

* fix

* update

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update send_and_recv op. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix unit. notest,test=coverage

* fix ut. notest, test=coverage

* update. notest,test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix. notest, test=coverage

* fix. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* fix ut. notest, test=coverage

* add func. notest, test=coverage

* fix ut. notest, test=coverage

* fix. test=develop

* fix. test=develop
@zmxdream zmxdream changed the title [cherry-pick]bug fix for DeserializeSelectedRows.(#36520) [cherry-pick]Refactor Heter Pipeline ps Nov 22, 2021
* fix ut. test=develop

* fix ut. test=develop
…ddle#37166)

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop
* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* refactor heter trainer. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop
* fix pslib. test=develop

* add device to train_from_dataset. test=develop

* refine fleet.stop_worker. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix executor & ut. test=develop

* fix executor & ut. test=develop

* fix executor & ut. test=develop
* fix api. test=develop

* fix api. test=develop
@zmxdream zmxdream changed the title [cherry-pick]Refactor Heter Pipeline ps [cherry-pick]Refactor Heterogenous Pipeline Parameter Server Nov 22, 2021
@fuyinno4 fuyinno4 merged commit 4dc426f into PaddlePaddle:release/2.2 Nov 23, 2021
@zmxdream zmxdream changed the title [cherry-pick]Refactor Heterogenous Pipeline Parameter Server [cherry-pick 2.2]Refactor Heterogenous Pipeline Parameter Server Nov 23, 2021
7 participants