upgrade async distributed training in pscore #37515

Merged

Conversation

zhaocaibei123
Contributor

@zhaocaibei123 zhaocaibei123 commented Nov 24, 2021

PR types

Function optimization

PR changes

Others

Describe

Upgrade async distributed training in pscore:

  1. Fuse lookup_table into distributed_lookup_table and lookup_table_grad into distributed_push_sparse in trainer_pass.
  2. Push sparse gradients with the distributed_push_sparse op via fleet->comm->brpc_client.
  3. Push dense gradients with the send op via fleet->comm->brpc_client (the send op's behavior has changed: it is now used only to push dense gradients).
  4. Upgrade the server-side lookup table from CommonSparseTable to MemorySparseTable to support incremental storage for embeddings.
  5. Add ShowClickEntry to push show/click statistics to the server, which MemorySparseTable uses.

With this change we support large-scale online sparse model inference, widely used in recommendation/search/ads, backed by a large-scale KV service that loads the incremental sparse model. To use it, configure the entry and embedding as shown below; you can then dump the incremental sparse model.

        import paddle
        paddle.enable_static()

        sparse_feature_dim = 1024
        embedding_size = 64

        shows = paddle.static.data(name='show', shape=[1], dtype='int64')
        clicks = paddle.static.data(name='clk', shape=[1], dtype='int64')
        input = paddle.static.data(name='ins', shape=[1], dtype='int64')

        # 'show'/'clk' are the var_names of the show/click data
        entry = paddle.distributed.ShowClickEntry("show", "clk")
        emb = paddle.static.nn.sparse_embedding(
            input=input,
            size=[sparse_feature_dim, embedding_size],
            is_test=False,
            entry=entry,
            param_attr=paddle.ParamAttr(
                name="SparseFeatFactors",
                initializer=paddle.nn.initializer.Uniform()))
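
For broader context, here is a minimal, hedged sketch of how an embedding built this way is typically trained with the async parameter-server strategy that this PR upgrades (fleet with a_sync=True). The toy fc/loss head, the label placeholder, and the executor wiring are illustrative assumptions, not part of this PR:

        # Illustrative sketch only: async parameter-server training with fleet.
        # Assumes `emb` was built as in the snippet above; the fc/loss head and
        # label input below are placeholders for a real network.
        import paddle
        import paddle.distributed.fleet as fleet

        paddle.enable_static()
        fleet.init()  # role (worker/server) is taken from the launch environment

        strategy = fleet.DistributedStrategy()
        strategy.a_sync = True  # async distributed training in pscore

        label = paddle.static.data(name='label', shape=[-1, 1], dtype='float32')
        pred = paddle.static.nn.fc(x=emb, size=1)
        loss = paddle.nn.functional.square_error_cost(input=pred, label=label)
        avg_loss = paddle.mean(loss)

        optimizer = paddle.optimizer.SGD(learning_rate=0.01)
        optimizer = fleet.distributed_optimizer(optimizer, strategy)
        optimizer.minimize(avg_loss)

        if fleet.is_server():
            fleet.init_server()
            fleet.run_server()
        elif fleet.is_worker():
            exe = paddle.static.Executor(paddle.CPUPlace())
            exe.run(paddle.static.default_startup_program())
            fleet.init_worker()
            # ... feed batches and run the main program here ...
            fleet.stop_worker()

Under a_sync, sparse gradients go through the new distributed_push_sparse op and dense gradients through the send op, as described in the list above.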

zhaocaibei123 and others added 22 commits August 20, 2021 07:15
@paddle-bot-old

paddle-bot-old bot commented Nov 24, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zhaocaibei123 zhaocaibei123 changed the title from "Accessor work 20211124" to "upgrade async distributed training in pscore" on Nov 25, 2021
Contributor

@yaoxuefeng6 yaoxuefeng6 left a comment


LGTM

Contributor

@chenwhql chenwhql left a comment


LGTM for __init__.py

@Thunderbrook Thunderbrook merged commit 74605fc into PaddlePaddle:develop Nov 26, 2021
Zjq9409 pushed a commit to Zjq9409/Paddle that referenced this pull request Dec 10, 2021
* test

* test

* rm test

* update

* update

* update

* add unittest

* update

* update save