[hybrid performance] softmax mask fuse upper triangle #33981
Conversation
Thanks for your contribution!
LGTM
LGTM
LGTM
LGTM
PR types
New features
PR changes
OPs
Describe
Softmax mask fuse, upper-triangle variant.
We observe that, for GPT-like structures, the attention mask is always an upper-triangular matrix that masks out the upper-triangular part of the QK product.
To save the time spent building the mask and the HtoD transfer time for the mask matrix (and possibly even the time spent communicating the mask between different pipeline-parallel stages), we fuse the softmax and the upper-triangle mask into a single op.
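For reference, the unfused pattern that this op replaces looks roughly like the following sketch (the shapes, the `-1e4` additive mask value, and the use of `paddle.triu` are illustrative assumptions, not code from this PR):

```python
import paddle
import paddle.nn.functional as F

# Unfused baseline: build a causal mask, add it to the QK^T scores,
# then run a regular softmax over the last axis.
# Illustrative shape: [batch, heads, seq_len, seq_len].
scores = paddle.randn([8, 16, 128, 128], dtype='float32')

# Large negative values above the diagonal so softmax drives those
# attention weights to ~0.
mask = paddle.triu(paddle.full([128, 128], -1e4), diagonal=1)
probs = F.softmax(scores + mask, axis=-1)
```

The fused op folds the masking into the softmax kernel, so no mask tensor has to be created on the host or copied to the device.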
Without this fusion:
With this fusion:
Performance gain (Static mode)
Precision check
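One way such a check can be performed is to compare the fused op against the unfused reference above (a sketch under the same assumptions; `softmax_mask_fuse_upper_triangle` is assumed to live under `paddle.incubate`, per this PR's naming):

```python
import numpy as np
import paddle
import paddle.nn.functional as F

paddle.set_device('gpu')  # the fused kernel targets GPU
x = paddle.randn([1, 16, 128, 128], dtype='float32')

# Unfused reference: additive mask above the diagonal, then softmax.
mask = paddle.triu(paddle.full([128, 128], -1e4), diagonal=1)
ref = F.softmax(x + mask, axis=-1)

# Fused op under test.
out = paddle.incubate.softmax_mask_fuse_upper_triangle(x)

# Maximum elementwise deviation between fused and unfused results.
print(np.abs(ref.numpy() - out.numpy()).max())
```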
How to use
For dygraph:
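A minimal dygraph sketch, assuming the op is exposed as `paddle.incubate.softmax_mask_fuse_upper_triangle` and takes the 4-D attention-score tensor directly (signature details may differ from the merged code):

```python
import paddle

paddle.set_device('gpu')  # the fused kernel targets GPU

# Attention scores from QK^T: [batch, heads, seq_len, seq_len].
x = paddle.randn([1, 16, 128, 128], dtype='float32')

# One call replaces mask creation, mask addition, and softmax.
out = paddle.incubate.softmax_mask_fuse_upper_triangle(x)
print(out.shape)  # [1, 16, 128, 128]
```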
For static mode:
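A static-mode sketch under the same assumptions:

```python
import numpy as np
import paddle

paddle.enable_static()

main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    # Placeholder for the attention scores: [batch, heads, seq_len, seq_len].
    x = paddle.static.data(name='x', shape=[1, 16, 128, 128], dtype='float32')
    out = paddle.incubate.softmax_mask_fuse_upper_triangle(x)

exe = paddle.static.Executor(paddle.CUDAPlace(0))
exe.run(startup_prog)

x_np = np.random.randn(1, 16, 128, 128).astype('float32')
res, = exe.run(main_prog, feed={'x': x_np}, fetch_list=[out])
print(res.shape)
```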