[NPU] support global accumulator for adam #32780
Merged: zhiqiu merged 10 commits into PaddlePaddle:develop from zhiqiu:dev/optimizer_global_accumulator on May 13, 2021
Conversation
Thanks for your contribution!
liym27 previously approved these changes (May 10, 2021)
LGTM
wanghuancoder approved these changes (May 13, 2021)
LGTM
phlrain approved these changes (May 13, 2021)
zhaoyinglia pushed a commit to zhaoyinglia/Paddle that referenced this pull request on Sep 2, 2021:
* add use_global_beta_pow
* add use_global_beta_pow
* update npu kernel
* update python api
* refine code
* add ut for use_global_beta_pow
* fix npu kernel
* add ut for api
* add ut for exception
* add ut for save/load
PR types
Performance optimization
PR changes
OPs
Describe
[NPU] support global accumulator for adam
As described in #32605, on the ernie-3.0 model there are several time bubbles between two adam kernels. Two operations account for most of this time:
1. updating beta1/beta2 pow
2. converting beta1/beta2/epsilon to NPU tensors
PR #32605 solves problem 2, and this PR tries to solve problem 1.
Why
The original implementation of AdamOptimizer creates a beta1_pow and a beta2_pow for each parameter and updates them in that parameter's adam op. It should be pointed out that the values of beta1_pow and beta2_pow are actually identical for every parameter. This works fine in the adam CUDA kernel, since beta1_pow and beta2_pow can be updated quickly. In the NPU kernel, however, each update requires two extra mul ops per parameter, which costs much time.
How
So, we introduce a global beta_pow, which means only one beta1_pow and one beta2_pow are created for all the parameters of the whole model. Specifically:
* Add an attribute `use_global_beta_pow` to the adam op. If true, the outputs (Beta1PowOut, Beta2PowOut) are not used in the adam op, and beta_pow is updated once after all adam ops in the model.
* Add a `use_global_beta_pow` argument to `paddle.fluid.optimizer.Adam`. If true, Adam uses a global beta_pow for the whole model instead of creating a beta_pow for each parameter (see the usage sketch below).
As can be seen in the timeline, there is no longer a mul op between two ApplyAdam kernels.
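A minimal usage sketch, assuming the fluid static-graph API of this PR's time frame; everything except the new `use_global_beta_pow` argument is ordinary fluid boilerplate, and the network itself is only illustrative.

```python
# Minimal usage sketch (fluid static-graph mode). The only addition from this
# PR is the `use_global_beta_pow=True` argument passed to the Adam optimizer.
import paddle.fluid as fluid

main_prog, startup_prog = fluid.Program(), fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    x = fluid.data(name="x", shape=[None, 13], dtype="float32")
    y = fluid.data(name="y", shape=[None, 1], dtype="float32")
    pred = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(fluid.layers.square_error_cost(pred, y))

    adam = fluid.optimizer.Adam(
        learning_rate=0.01,
        use_global_beta_pow=True,  # share one beta1_pow/beta2_pow across all parameters
    )
    adam.minimize(loss)
```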
Performance

| | tokens/s |
| --- | --- |
| before | 22211 |
| after | 24481 (+10%) |