
【PaddlePaddle Hackathon 3 No.47】Add fp16 support for logsumexp in Paddle #45817

Merged: 1 commit merged into PaddlePaddle:develop on Oct 13, 2022

Conversation

@xiaohemaikoo (Contributor) commented on Sep 7, 2022

PR types

New features

PR changes

OPs

Describe

logsumexp support fp16

Performance

Case No. | input_shape          | FP32 Perf (us) | FP16 Perf (us) | FP32/FP16 ratio
0        | [1000, 130, 17]      | 173.735        | 206.377        | 0.842
1        | [1000, 100, 10, 10]  | 576.93         | 651.485        | 0.886
2        | [1000, 100, 200]     | 1089.02        | 1219.52        | 0.893
3        | [100, 1000, 25, 40]  | 5195           | 5766.75        | 0.901
4        | [100, 1000, 250, 40] | 40763          | 39548.1        | 1.031
5        | [100, 1000, 250, 50] | 64426          | 50173.5        | 1.284
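For context, calling the op on a float16 GPU tensor after this change would look roughly like the sketch below (the shape and the randn-then-cast pattern are illustrative, assuming a CUDA build of Paddle):

```python
import paddle

paddle.set_device('gpu')

# Build a float16 input; randn produces float32, so cast it down here.
x = paddle.randn([1000, 130, 17]).astype('float16')

# With this PR, the GPU logsumexp kernel is also registered for float16.
y = paddle.logsumexp(x, axis=-1)
print(y.dtype)  # paddle.float16
```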

@paddle-bot bot commented on Sep 7, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot bot added the contributor (External developers) and status: proposed labels on Sep 7, 2022
@xiaohemaikoo (Contributor, Author)

Hi, this PR currently only adds an fp32 unit test.
I noticed that test_logsumexp.py has no unit test that checks fp32 computation accuracy.
After running the fp32 test submitted in this PR, I found that fp32 accuracy does not meet the 1e-3 requirement either.

So if I am to extend the data types supported by the logsumexp operator, I may need to improve the computation accuracy of both fp32 and fp16 at the same time.
It may not be possible to also keep performance no worse than the original float32 code, because the original float32 accuracy already fails the requirement.
When float32 accuracy is improved, fp32 performance may become correspondingly worse than before.

If the fp32 unit test above is correct, then both fp32 and fp16 need accuracy fixes.

@zhangting2020 (Contributor)

It looks like the fp64 accuracy check passes but the fp32 one does not. Could you first try implementing a custom gradient with numpy, and compare the expected gradient computed by numpy against the operator's result to see how large the difference is?
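As an illustration of that suggestion, here is a minimal sketch of such a comparison in dynamic-graph mode; the full-reduce case, the helper name, and the float64 reference are assumptions for this example, not code from the PR:

```python
import numpy as np
import paddle

def ref_logsumexp_grad(x):
    # For y = logsumexp(x) over all elements, dy/dx = softmax(x).
    # Shift by the max for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()

shape = [2, 3, 4, 5]
x_np = np.random.uniform(-1, 1, shape).astype('float32')

x = paddle.to_tensor(x_np, stop_gradient=False)
y = paddle.logsumexp(x)
(x_grad,) = paddle.grad(y, [x])

# Expected gradient from numpy, computed in float64 for a tighter reference.
expected = ref_logsumexp_grad(x_np.astype('float64'))
print(np.abs(x_grad.numpy() - expected).max())
```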

@Ligoml (Contributor) commented on Sep 14, 2022

Why was this closed?

@xiaohemaikoo (Contributor, Author)

> Why was this closed?

Sorry, I was on vacation last week and did not have time to keep working on this before the 19th.

@Ligoml (Contributor) commented on Sep 20, 2022

The PR submission deadline is the 19th and the merge deadline is the 29th, so there is still a chance~

@Ligoml reopened this on Sep 20, 2022
@xiaohemaikoo (Contributor, Author)

With inputs of shape = [2, 3, 4, 5] and x = np.random.uniform(-1, 1, shape).astype(dtype),
comparing my local numpy custom gradient against the operator's gradient shows no large difference for either fp32 or fp16; all results are within 1e-3.

However, the default check_grad still reports an error.
What exactly do the user_defined_grads and user_defined_grad_outputs you define refer to?
In fp32, the user_defined_grads and user_defined_grad_outputs computed by default show errors larger than 5e-3.

@xiaohemaikoo (Contributor, Author)

Also, what do the numeric_grads and analytic_grads in op_test.py, which correspond to user_defined_grads and user_defined_grad_outputs, represent?
The error between the numeric_grads and analytic_grads computed by default is relatively large.

@Xreki (Contributor) commented on Oct 8, 2022

> (quoting the two questions above about check_grad, user_defined_grads / user_defined_grad_outputs, and numeric_grads / analytic_grads)

So far the PR only adds a single FP32 unit test. Do you have code changes locally? Could you push them first?

Review comment (on the forward kernel registration diff):

 PD_REGISTER_KERNEL(
-    logsumexp, GPU, ALL_LAYOUT, phi::LogsumexpKernel, float, double) {}
+    logsumexp, GPU, ALL_LAYOUT, phi::LogsumexpKernel, float, double, float16) {}

Contributor: The LogsumexpFunctor implementation contains exp and log, so float needs to be used as the compute type. See #45952 for an example of how to change this.


Review comment (on the grad kernel registration diff):

 PD_REGISTER_KERNEL(logsumexp_grad,
                    GPU,
                    ALL_LAYOUT,
                    phi::LogsumexpGradKernel,

Contributor: The LogsumexpGradFunctor implementation contains exp, so it also needs to use float as the compute type.
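The numerical idea behind these two comments can be sketched in NumPy; this only illustrates the technique of keeping fp16 storage while running exp/log in float, and is not the phi functor itself:

```python
import numpy as np

def logsumexp_fp16_storage(x_fp16):
    # Storage dtype stays float16, but exp/log run in float32 so the
    # limited fp16 range and precision do not dominate the error.
    x = x_fp16.astype(np.float32)
    m = x.max()
    out = np.log(np.exp(x - m).sum()) + m
    return np.float16(out)
```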

Review comment (on the unit test):

        self.dtype = 'float32'


class TestLogsumexp_FP16(TestLogsumexp):

Contributor: For the fp16 unit test, you can add the following decorator to skip execution on CPU:

@unittest.skipIf(not core.is_compiled_with_cuda(),
                 "core is not compiled with CUDA")

Also, if the unit test accuracy cannot pass, you can try adjusting atol, rtol, and max_relative_error in the test. float16 only has about 3 significant digits, so setting them to 1e-3 is reasonable.

@xiaohemaikoo (Contributor, Author)

@Xreki Hi, the code conflict has been resolved. It looks like casting to double in the functor still leaves accuracy issues in the fp32 and fp16 unit tests. Does the unit test need to be handled separately as well?

Review comment (on the grad kernel's rank switch):

           dev_ctx, in, out, out_grad, in_grad, functor, axis32);
       break;
     case 4:
-      phi::funcs::ReduceGradFunctor<Context, T, 4, LogsumexpGradFunctor>(
+      phi::funcs::ReduceGradFunctor<Context, T, 4, LogsumexpGradFunctor<T>>(
           dev_ctx, in, out, out_grad, in_grad, functor, axis32);
       break;
   }

Contributor: Suggest adding a default branch here that uses PADDLE_THROW to report an error for inputs with more than 4 dimensions.

Review comment (on LogsumexpKernel):

@@ -74,7 +79,7 @@ void LogsumexpKernel(const Context& dev_ctx,
     auto output = phi::EigenScalar<T>::From(*out);
     auto& place = *dev_ctx.eigen_device();
     auto reduce_dim = Eigen::array<int, 1>({{0}});
-    LogsumexpFunctor()(place, &input, &output, reduce_dim);
+    LogsumexpFunctor<T>()(place, &input, &output, reduce_dim);
   } else {

Contributor: For unsupported ranks, please add an error message here as well.


Review comment (on the unit test):

    def set_attrs(self):
        self.dtype = 'float16'

@Xreki commented on Oct 10, 2022:

> It looks like casting to double in the functor still leaves accuracy issues in the fp32 and fp16 unit tests. Does the unit test need to be handled separately as well?

In this unit test you can override the test_check_output and test_check_grad functions and specify larger atol and max_relative_error thresholds.
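A possible shape for that test class, combining the earlier skipIf suggestion with the overridden checks; the exact thresholds here are placeholders to be tuned, and TestLogsumexp / OpTest come from the existing test_logsumexp.py:

```python
import unittest

import paddle.fluid.core as core


@unittest.skipIf(not core.is_compiled_with_cuda(),
                 "core is not compiled with CUDA")
class TestLogsumexp_FP16(TestLogsumexp):
    def set_attrs(self):
        self.dtype = 'float16'

    def test_check_output(self):
        # float16 has ~3 significant digits, so loosen the absolute tolerance.
        self.check_output(atol=1e-3)

    def test_check_grad(self):
        # Allow a larger relative error for the fp16 gradient check.
        self.check_grad(['X'], ['Out'], max_relative_error=1e-2)
```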

@xiaohemaikoo (Contributor, Author)

@Xreki The unit tests have been added. Please review the code when you have time; I will add the performance data today or tomorrow.

@xiaohemaikoo (Contributor, Author)

@Xreki Hi, the performance data has been updated. fp16 performance gradually improves as the data size grows. Because the exp and log operations in logsumexp need to be cast to float, fp16 and fp32 performance are in the same order of magnitude overall: for small sizes fp32 is slightly faster since it has no cast overhead, and as the data size grows fp16 becomes faster. The largest size I can test locally is [100, 1000, 250, 50], where the fp32:fp16 ratio is 1.284. I tried a few other ways of computing the logsumexp op and performance was almost unchanged from the current version. CI has passed; please review when you have time.

Review comment (on the test's gradient helper):

    x_grad = tensor_x.gradient()
    fluid.set_flags({"FLAGS_retain_grad_for_all_tensor": False})
    paddle.enable_static()
    return x_grad

Contributor: The fluid APIs are no longer recommended. Suggest following the unit test below:

  • to_variable -> paddle.to_tensor
  • computing gradients with backward() -> paddle.grad

class TestFP16ScaleBiasLayerNorm(unittest.TestCase):
    def check_main(self, x_np, weight_np, bias_np, dtype):
        paddle.disable_static()
        weight_np = weight_np.astype(dtype)
        bias_np = bias_np.astype(dtype)
        x = paddle.to_tensor(x_np)
        weight = paddle.to_tensor(weight_np)
        bias = paddle.to_tensor(bias_np)
        x.stop_gradient = False
        weight.stop_gradient = False
        bias.stop_gradient = False
        y = F.layer_norm(x, x.shape[1:], weight, bias)
        x_g, w_g, b_g = paddle.grad(y, [x, weight, bias])
        y_np = y.numpy().astype('float32')
        x_g_np = x_g.numpy().astype('float32')
        w_g_np = w_g.numpy().astype('float16')
        b_g_np = b_g.numpy().astype('float32')
        paddle.enable_static()
        return y_np, x_g_np, w_g_np, b_g_np

    def test_main(self):
        if not paddle.is_compiled_with_cuda():
            return
        x_np = np.random.random([10, 20]).astype('float16')
        weight_np = np.random.random([20]).astype('float16')
        bias_np = np.random.random([20]).astype('float16')

        y_np_1, x_g_np_1, w_g_np_1, b_g_np_1 = self.check_main(
            x_np, weight_np, bias_np, 'float16')
        y_np_2, x_g_np_2, w_g_np_2, b_g_np_2 = self.check_main(
            x_np, weight_np, bias_np, 'float32')

        def assert_equal(x, y):
            np.testing.assert_array_equal(x, y)

        assert_equal(y_np_1, y_np_2)
        assert_equal(x_g_np_1, x_g_np_2)
        assert_equal(w_g_np_1, w_g_np_2)
        assert_equal(b_g_np_1, b_g_np_2)


def logsumexp_ref_grad(x):
    sum = np.exp(x).sum()
    return np.exp(x) / sum
Contributor: If the input is fp16, the whole computation here also runs in fp16 and loses precision, so this reference value is probably not accurate enough. Suggest doing the computation here in fp32 as well.
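A float32 version of that reference, as a minimal sketch (same formula as above; only the compute dtype of the intermediate exp/sum changes, and the helper name is illustrative):

```python
def logsumexp_ref_grad_fp32(x):
    # Compute the reference in float32 even when x is float16, so the
    # reference itself is not the accuracy bottleneck of the comparison.
    x32 = x.astype(np.float32)
    e = np.exp(x32)
    return e / e.sum()
```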

Review comment (on the gradient check):

        self.__class__.dtype = self.dtype
        x_grad = logsumexp_op_grad(self.inputs['X'])
        ref_x_grad = logsumexp_ref_grad(self.inputs['X'])
        np.testing.assert_allclose(x_grad, ref_x_grad, rtol=1e-05, atol=1e-04)

Contributor: Is there an issue with the atol setting here? The allclose rule is: absolute(a - b) <= (atol + rtol * absolute(b)). The relative error of fp16 is 1e-3, so the criterion should be rtol=1e-3, with atol kept as close to 0 as possible.
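Under that rule, the fp16 check would look roughly like this (tolerances taken from the comment above):

```python
np.testing.assert_allclose(x_grad, ref_x_grad, rtol=1e-3, atol=0)
```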

@xiaohemaikoo (Contributor, Author)

@zhangting2020 Thanks, the review comments above have been addressed; please check again.

@zhangting2020 (Contributor) left a review:

LGTM

@zhangting2020 merged commit 910e1b6 into PaddlePaddle:develop on Oct 13, 2022