Add multi_tensor for momentum optimizer and clear_grads #37564
Conversation
Thanks for your contribution!
python/paddle/optimizer/optimizer.py
Outdated
        None

        Examples:
            .. code-block:: python
code-block: there should be a blank line between the directive line and the code body.
Done, tks!
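For reference, a minimal docstring sketch of the convention the reviewer is pointing at (the surrounding function is hypothetical): reStructuredText requires a blank line between the `.. code-block:: python` directive and the indented code body.

```python
def clear_grad_example():
    """
    Examples:
        .. code-block:: python

            import paddle

            linear = paddle.nn.Linear(2, 2)
            # ... build the optimizer and clear gradients here
    """
```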
python/paddle/optimizer/momentum.py
Outdated
@@ -129,7 +131,8 @@ def __init__(self,
                  grad_clip=None,
                  multi_precision=False,
                  rescale_grad=1.0,
-                 name=None):
+                 name=None,
+                 use_multi_tensor=False):
Is it better to put `use_multi_tensor` before `name`?
Done, tks!
python/paddle/optimizer/momentum.py
Outdated
        self.helper = LayerHelper(self.__class__.__name__)

        self._create_global_learning_rate()
        if framework.in_dygraph_mode():
what about static mode?
Tks, Multi Tensor has been added to static mode as well.
python/paddle/optimizer/momentum.py
Outdated
""" | ||
self._create_accumulators(target_block, parameters) | ||
for param in parameters: | ||
if param.stop_gradient is False: |
Is this if needed?
This is not needed, done, tks!
python/paddle/optimizer/momentum.py
Outdated
                param)
            self.velocity_dict['FP16_LODTensor'].append(velocity_acc)
            # master weight
            # master weight
duplicated.
Done, tks!
python/paddle/optimizer/momentum.py
Outdated
            # regularization
            regularization_method = self._regularization_method
            regularization_coeff = self._regularization_coeff
            if hasattr(param, 'regularizer'):
                # we skip param's l2decay before, so fuse it with momentum here.
                if isinstance(param.regularizer, L2DecayRegularizer):
                    regularization_method = "l2_decay"
                    regularization_coeff = param.regularizer._regularization_coeff
                # the param's regularization has been done before, we avoid do l2decay in momentum.
                else:
                    regularization_method = ""
                    regularization_coeff = 0
Same as the fp32 branch, the code can be reused.
Done, tks!
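A rough sketch of how the duplicated block could be shared between the FP32 and FP16 branches. The helper name `_get_regularization_for_fused_op` is an assumption for illustration only, not necessarily the name used in the final code; the logic simply mirrors the excerpt above.

```python
def _get_regularization_for_fused_op(self, param):
    # Hypothetical helper: decide which regularization the fused momentum op
    # should apply for this parameter, shared by the FP32 and FP16 code paths
    # instead of duplicating the same block in each branch.
    regularization_method = self._regularization_method
    regularization_coeff = self._regularization_coeff
    if hasattr(param, 'regularizer'):
        if isinstance(param.regularizer, L2DecayRegularizer):
            # the param's l2decay was skipped earlier, so fuse it into momentum here
            regularization_method = "l2_decay"
            regularization_coeff = param.regularizer._regularization_coeff
        else:
            # the param's regularization was already applied, so skip l2decay in momentum
            regularization_method = ""
            regularization_coeff = 0
    return regularization_method, regularization_coeff
```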
python/paddle/optimizer/momentum.py
Outdated
        self.grad_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
        self.lr_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}

        if framework.in_dygraph_mode():
Same as above, what about static mode?
Tks, Multi Tensor has been added to static mode as well.
python/paddle/optimizer/momentum.py
Outdated
        self.grad_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
        self.lr_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
There is no need for these to be attributes of `self`; temporary variables are OK.
Done, tks!
python/paddle/optimizer/optimizer.py
Outdated
        # NOTE: Multi Tensor: Pass in all parameters and gradients to the op kernel of the Optimizer at one time for updating for dygraph mode.
        # Optimizer support list: [ paddle.optimizer.Momentum ].
        self._use_multi_tensor = None
        self.param_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
Suggested change:
-        self.param_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
+        self._param_dict = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
Done, tks.
python/paddle/optimizer/optimizer.py
Outdated
        param_list = []
        if self._parameter_list is None or not isinstance(
                self._parameter_list[0], dict):
            for p in self._parameter_list:
                if not p.stop_gradient:
                    if set_to_zero:
                        p.clear_gradient()
                    else:
                        param_list.append(p)
        else:
            for param_group in self._param_groups:
                for p in param_group['params']:
                    if not p.stop_gradient:
                        if set_to_zero:
                            p.clear_gradient()
                        else:
                            param_list.append(p)
I think we can use `core.clear_gradients` even if `set_to_zero` is true.
Done, tks.
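A minimal sketch of what the revised `clear_grad` could look like after this suggestion: always collect the trainable parameters and hand the whole list to `core.clear_gradients` together with the flag. The exact `core.clear_gradients(param_list, set_to_zero)` signature is inferred from the comment and is an assumption here.

```python
def clear_grad(self, set_to_zero=True):
    # Collect every trainable parameter, whether or not set_to_zero is used.
    param_list = []
    if self._parameter_list is None or not isinstance(
            self._parameter_list[0], dict):
        for p in self._parameter_list:
            if not p.stop_gradient:
                param_list.append(p)
    else:
        for param_group in self._param_groups:
            for p in param_group['params']:
                if not p.stop_gradient:
                    param_list.append(p)
    # One C++ call clears (or zeroes) all gradients at once.
    core.clear_gradients(param_list, set_to_zero)
```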
python/paddle/optimizer/momentum.py
Outdated
                 use_multi_tensor=False,
                 name=None, ):
Suggested change:
-                 use_multi_tensor=False,
-                 name=None, ):
+                 use_multi_tensor=False,
+                 name=None):
Done, tks!
python/paddle/optimizer/momentum.py
Outdated
@@ -72,6 +73,7 @@ class Momentum(Optimizer):
             ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` ,
             :ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there is no gradient clipping.
         multi_precision (bool, optional): Whether to use multi-precision during weight updating. Default is false.
+        use_multi_tensor (bool, optional): Whether to use multi-tensor strategy to update all parameters at once . Default is false.
it should be listed after rescale_grad
Done, tks!
python/paddle/optimizer/optimizer.py
Outdated
        There are two method to clear grad: set_to_zero or delete grad.

        Args:
            set_to_zero (bool): If set grads to zero or not, default is True.
bool -> bool, optional
Done, tks!
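For context, a short usage sketch combining the two flags documented in this PR, assuming a Paddle build that includes these changes (`use_multi_tensor` on the Momentum constructor, `set_to_zero` on `clear_grad`):

```python
import paddle

linear = paddle.nn.Linear(10, 10)
inp = paddle.rand([4, 10], dtype="float32")

opt = paddle.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    parameters=linear.parameters(),
    use_multi_tensor=True)  # update all parameters with one fused op call

out = linear(inp)
loss = paddle.mean(out)
loss.backward()
opt.step()
opt.clear_grad(set_to_zero=False)  # delete gradients instead of zeroing them
```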
LGTM
LGTM
PR types
New features
PR changes
APIs
Describe
1. Main contents of this PR:

- Add the multi_tensor_apply optimization strategy to the momentum optimizer in dygraph mode. (Depends on the merged_momentum op PR.)
- Add the multi_tensor_apply optimization strategy to the optimizer's clear_grad in dygraph mode. (Depends on the VarBase::ClearGradient optimization PR.)

2. The multi_tensor_apply strategy:
2.1 Original optimizer execution logic:

Loop over all parameters and call the optimizer kernel on each one to update it. Taking the resnet50 model as an example, a profiling analysis of the optimizer's execution logic and time cost gives the following results:
The optimizer takes 9ms in total, of which 6ms (66.7%) is spent iterating over every network parameter and calling the momentum op on each one.
The dygraph branch also contains some steps that are useless for the parameter update itself, such as update_param_device_map(params_grads).
2.2 Optimizer execution logic with the multi_tensor_apply strategy:
The execution logic of the optimizer with multi_tensor_apply is shown in the figure below and is divided into two parts:

- The yellow part is the data-initialization stage: during the first training iteration it traverses the network parameters and groups global_lr, parameter, velocity, regularization, etc. into lists for the subsequent optimizer op calls. This stage is relatively time-consuming, but after the first iteration it does not need to run again.
- The green part runs in every training iteration: it collects all of the network's grad and lr into lists and calls the [merged_momentum](https://github.com/PaddlePaddle/Paddle/pull/37527) op once to update all network parameters.
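A schematic pure-Python sketch of the two stages described above, for illustration only; it is not the Paddle implementation, and the helper names are made up. The point is that parameters are grouped by dtype once, after which every iteration makes a single fused call per group instead of one kernel call per parameter.

```python
import numpy as np

def merged_momentum_update(params, grads, velocities, lr, mu=0.9):
    # Stand-in for the fused merged_momentum op: one call updates a whole list.
    for p, g, v in zip(params, grads, velocities):
        v *= mu
        v += g
        p -= lr * v

# "Yellow" stage (first iteration only): group parameters/velocities by dtype.
params = [np.ones(4, dtype='float32'), np.ones(2, dtype='float16')]
velocities = [np.zeros_like(p) for p in params]
groups = {'FP32_LODTensor': [], 'FP16_LODTensor': []}
for i, p in enumerate(params):
    key = 'FP16_LODTensor' if p.dtype == np.float16 else 'FP32_LODTensor'
    groups[key].append(i)

# "Green" stage (every iteration): gather grads and make one fused call per group.
grads = [np.full_like(p, 0.1) for p in params]
for key, idx in groups.items():
    if idx:
        merged_momentum_update([params[i] for i in idx],
                               [grads[i] for i in idx],
                               [velocities[i] for i in idx],
                               lr=0.01)
```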
2.3 The multi_tensor_apply logic for clear_grad:
Consistent with the optimizer change above: originally, clear_grad looped over all grads and called VarBase::ClearGradient(set_to_zero=True) for each one. The main cost comes from the repeated Python/C++ interactions and from the set_to_zero mode, which performs poorly and is slow. With the multi_tensor_apply strategy, all grads are passed in at once and VarBase::ClearGradient(set_to_zero=False) is called on the C++ side during training, which reduces both the Python/C++ interaction time and the cost of the set_to_zero mode.
3. Performance test:
Taking resnet50 as an example with batch_size=256, the time cost of the optimizer and clear_grad before and after the optimization is compared below:
Before the optimization it takes about 11ms:
![image](https://user-images.githubusercontent.com/82555433/144167475-8816fcc4-698d-4532-a816-dc0a8eef2c48.png)
After the optimization it takes about 6ms:
![image](https://user-images.githubusercontent.com/82555433/144167638-6433ef4f-6453-4387-8796-7ff6206527a8.png)