upgrade async distributed training in pscore #37515

Merged

Conversation

zhaocaibei123
Contributor

@zhaocaibei123 zhaocaibei123 commented Nov 24, 2021

PR types

Function optimization

PR changes

Others

Describe

Upgrade async distributed training in pscore:

  1. Fuse lookup_table into distributed_lookup_table and lookup_table_grad into distributed_push_sparse in trainer_pass.
  2. Push sparse gradients with the distributed_push_sparse op via fleet->comm->brpc_client.
  3. Push dense gradients with the send op via fleet->comm->brpc_client (the send op's behavior has changed: it is now used only to push dense gradients).
  4. Upgrade the server-side lookup table from CommonSparseTable to MemorySparseTable to support incremental storage for embeddings.
  5. Add ShowClickEntry to push show/click statistics to the server, which MemorySparseTable uses.

With this change we support large-scale online sparse model inference, widely used in recommendation/search/ads, backed by a large-scale KV service that loads the incremental sparse model. To use it, configure the entry and embedding as shown below; you can then dump the incremental sparse model.

        import paddle
        paddle.enable_static()

        sparse_feature_dim = 1024
        embedding_size = 64

        shows = paddle.static.data(name='show', shape=[1], dtype='int64')
        clicks = paddle.static.data(name='clk', shape=[1], dtype='int64')
        input = paddle.static.data(name='ins', shape=[1], dtype='int64')

        # 'show'/'clk' are the var_names of the show/click data
        entry = paddle.distributed.ShowClickEntry("show", "clk")
        emb = paddle.static.nn.sparse_embedding(
            input=input,
            size=[sparse_feature_dim, embedding_size],
            is_test=False,
            entry=entry,
            param_attr=paddle.ParamAttr(
                name="SparseFeatFactors",
                initializer=paddle.nn.initializer.Uniform()))
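
For broader context, here is a minimal, hedged sketch of how an embedding built this way is typically trained with the async parameter-server strategy that this PR upgrades (fleet with a_sync=True). The toy fc/loss head, the label placeholder, and the executor wiring are illustrative assumptions, not part of this PR:

        # Illustrative sketch only: async parameter-server training with fleet.
        # Assumes `emb` was built as in the snippet above; the fc/loss head and
        # label input below are placeholders for a real network.
        import paddle
        import paddle.distributed.fleet as fleet

        paddle.enable_static()
        fleet.init()  # role (worker/server) is taken from the launch environment

        strategy = fleet.DistributedStrategy()
        strategy.a_sync = True  # async distributed training in pscore

        label = paddle.static.data(name='label', shape=[-1, 1], dtype='float32')
        pred = paddle.static.nn.fc(x=emb, size=1)
        loss = paddle.nn.functional.square_error_cost(input=pred, label=label)
        avg_loss = paddle.mean(loss)

        optimizer = paddle.optimizer.SGD(learning_rate=0.01)
        optimizer = fleet.distributed_optimizer(optimizer, strategy)
        optimizer.minimize(avg_loss)

        if fleet.is_server():
            fleet.init_server()
            fleet.run_server()
        elif fleet.is_worker():
            exe = paddle.static.Executor(paddle.CPUPlace())
            exe.run(paddle.static.default_startup_program())
            fleet.init_worker()
            # ... feed batches and run the main program here ...
            fleet.stop_worker()

Under a_sync, sparse gradients go through the new distributed_push_sparse op and dense gradients through the send op, as described in the list above.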

zhaocaibei123 and others added 22 commits August 20, 2021 07:15
@paddle-bot-old

paddle-bot-old bot commented Nov 24, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zhaocaibei123 zhaocaibei123 changed the title from "Accessor work 20211124" to "upgrade async distributed training in pscore" on Nov 25, 2021
Contributor

@yaoxuefeng6 yaoxuefeng6 left a comment


LGTM

Contributor

@chenwhql chenwhql left a comment


LGTM for __init__.py

@Thunderbrook Thunderbrook merged commit 74605fc into PaddlePaddle:develop Nov 26, 2021
Zjq9409 pushed a commit to Zjq9409/Paddle that referenced this pull request Dec 10, 2021
* test

* test

* rm test

* update

* update

* update

* add unittest

* update

* update save