【Hackathon 5th No.38】Add FractionalMaxPool2d / FractionalMaxPool3d API to Paddle -kernel #59847

megemini · 2023-12-08T11:28:35Z

PR types

New features

PR changes

APIs

Description

RFC: PaddlePaddle/community#698

RFC V2.1: PaddlePaddle/community#798

Related PR: #59130

Re-implements the API with newly created operators.

paddle-bot · 2023-12-08T11:28:58Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

megemini · 2023-12-12T10:50:21Z

Update 20231212

@Charles-hit

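To address the compatibility issue raised in #59130, this PR re-implements fractional max pooling 2d/3d as two separate kernels. The main differences are:

Two new kernels are implemented: fractional_max_pool2d_with_index and fractional_max_pool3d_with_index, along with their grad kernels.
Everything kernel-related is implemented separately and does not touch max_pool2d_with_index or max_pool3d_with_index.
Only the two parameters relevant to fractional max pooling are kept: output_size and random_u (kernel_size is computed dynamically, so output_size is used directly instead of the original ksize parameter).
Only the fractional max pooling logic is kept; the max pooling and adaptive max pooling parts of the original max_poolxd_with_index are removed.
The xpu part is not implemented. The previous PR reused max_poolxd_with_index, so it had to implement the xpu signatures; now that the kernels are re-implemented, the xpu part is no longer needed.
The operators fractional_max_pool2d_with_index and fractional_max_pool3d_with_index are tested separately, with tests that inherit from OpTest.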
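The idea of deriving the pooling windows from output_size and random_u, rather than from a fixed ksize, can be sketched in one dimension. This is only a minimal illustration, assuming the pseudo-random sequence formulation commonly used for fractional max pooling; `fractional_windows` and `fractional_max_pool1d` are hypothetical names, and the actual kernels in this PR may differ in detail.

```python
def fractional_windows(input_size, output_size, u):
    """Return one (start, end) index pair per output element along one axis."""
    if not 0 < output_size <= input_size:
        raise ValueError("output_size must be in (0, input_size]")
    alpha = input_size / output_size  # average step between window starts
    base = int(u * alpha)             # shift so that the first window starts at 0
    edges = [int((i + u) * alpha) - base for i in range(output_size)]
    edges.append(input_size)          # the last window always ends at the input edge
    return list(zip(edges[:-1], edges[1:]))


def fractional_max_pool1d(values, output_size, u):
    """Max-pool a 1-D list; returns (outputs, mask), where mask holds argmax indices."""
    out, mask = [], []
    for start, end in fractional_windows(len(values), output_size, u):
        idx = max(range(start, end), key=values.__getitem__)
        out.append(values[idx])
        mask.append(idx)
    return out, mask
```

Because each window's size falls out of `alpha` and `u`, the caller never passes a kernel size in this sketch; the windows are contiguous and cover the whole input.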

Files involved:

paddle/phi/api/yaml/backward.yaml : backward operator description
paddle/phi/api/yaml/op_compat.yaml : operator parameter compatibility
paddle/phi/api/yaml/ops.yaml : forward operator description
paddle/phi/infermeta/backward.cc : backward operator
paddle/phi/infermeta/backward.h : backward operator
paddle/phi/infermeta/unary.cc : operator InferMeta
paddle/phi/infermeta/unary.h : operator InferMeta
paddle/phi/kernels/cpu/pool_grad_kernel.cc : operator registration
paddle/phi/kernels/cpu/pool_kernel.cc : operator registration
paddle/phi/kernels/funcs/pooling.cc : cpu operator implementation
paddle/phi/kernels/funcs/pooling.cu : gpu operator implementation
paddle/phi/kernels/funcs/pooling.h : adds the algorithm that computes the fractional max pooling index
paddle/phi/kernels/gpu/pool_grad_kernel.cu : operator registration
paddle/phi/kernels/gpu/pool_kernel.cu : operator registration
paddle/phi/kernels/impl/pool_grad_kernel_impl.h : header file
paddle/phi/kernels/impl/pool_kernel_impl.h : header file
paddle/phi/kernels/pool_grad_kernel.h : header file
paddle/phi/kernels/pool_kernel.h : header file
python/paddle/nn/__init__.py : adds the api
python/paddle/nn/functional/__init__.py : adds the api
python/paddle/nn/functional/pooling.py : implements fractional_max_pool2d and fractional_max_pool3d
python/paddle/nn/layer/__init__.py : adds the api
python/paddle/nn/layer/pooling.py : implements FractionalMaxPool2D and FractionalMaxPool3D
test/legacy_test/test_fractional_max_pool2d_api.py : tests the 2d api
test/legacy_test/test_fractional_max_pool2d_op.py : tests the 2d operator
test/legacy_test/test_fractional_max_pool3d_api.py : tests the 3d api
test/legacy_test/test_fractional_max_pool3d_op.py : tests the 3d operator
test/white_list/op_accuracy_white_list.py : adds the operators to the accuracy white list, following max_poolxd_with_index
test/white_list/op_threshold_white_list.py : adds the operators to the threshold white list, following max_poolxd_with_index

The operators and APIs currently pass all tests locally (ubuntu), and most CI checks have passed. Only a few windows-related checks seem to fail; has windows CI been having problems these past few days? In addition, there is this failure on windows openblas:

2023-12-12 14:17:36 ======================================================================
2023-12-12 14:17:36 FAIL: test_check_grad (test_fractional_max_pool3d_op.TestMaxPoolWithIndex_Op)
2023-12-12 14:17:36 ----------------------------------------------------------------------
2023-12-12 14:17:36 Traceback (most recent call last):
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\test_fractional_max_pool3d_op.py", line 175, in test_check_grad
2023-12-12 14:17:36     self.check_grad({'X'}, ['Out'])
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 2969, in check_grad
2023-12-12 14:17:36     self.check_grad_with_place(
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 3242, in check_grad_with_place
2023-12-12 14:17:36     self._assert_is_close(
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 2926, in _assert_is_close
2023-12-12 14:17:36     self.assertLessEqual(max_diff, max_relative_error, err_msg())
2023-12-12 14:17:36 AssertionError: 1.0352556758097368e+23 not less than or equal to 0.005 : Operator fractional_max_pool3d_with_index error, Gradient Check On Place(cpu) variable X (shape: (2, 3, 7, 7, 7), dtype: float64) max gradient diff 1.035256e+23 over limit 5.000000e-03, the first error element is 0, expected 6.172840e-03, but got 6.390467e+20.
2023-12-12 14:17:36 ----------------------------------------------------------------------

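This looks like an openblas problem. Is there something wrong with the cpu float64 implementation in openblas? Other PRs seem to have hit similar problems; see issue #55707.

Also, if this PR is acceptable, should the original PR #59130 be closed?

Please review. Thanks a lot!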

Charles-hit · 2023-12-17T03:31:34Z

paddle/phi/api/yaml/backward.yaml

+  args : (Tensor x, Tensor mask, Tensor out_grad, int[] output_size, float random_u)
+  output : Tensor(x_grad)
+  infer_meta :
+    func : FractionalMaxPoolWithIndexGradInferMeta


This implementation looks identical to UnchangedInferMeta, so you can configure UnchangedInferMeta directly; the backward op does not need a new infermeta function.

Charles-hit · 2023-12-17T03:37:19Z

paddle/phi/infermeta/unary.cc

+          x_dims.size(),
+          output_size_.size()));
+
+  std::vector<int64_t> output_shape({x_dims[0], x_dims[1]});


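Above, the dim size of x is required to be 4 or 5, so why can the output dim size here only be 2? Shouldn't 3 also be allowed?

output_size only needs to have 2 or 3 entries, because it only sets the last few (spatial) sizes.

For example, with the 2d api the input is [2, 3, 32, 32] and output_size is [18, 18]; with the 3d api the input is [2, 3, 32, 32, 32] and output_size is [18, 18, 18].

That is why the first two dims of x are used.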
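The shape rule described in this exchange can be written out as a short sketch. `infer_fractional_pool_shape` is a hypothetical helper for illustration only, not the InferMeta function from the PR; it assumes the first two dims of x are kept unchanged and the spatial dims come from output_size.

```python
def infer_fractional_pool_shape(x_shape, output_size):
    """Keep the first two dims of x and take the spatial dims from output_size."""
    if len(x_shape) not in (4, 5):
        raise ValueError("x must be 4-D (2d api) or 5-D (3d api)")
    if len(output_size) != len(x_shape) - 2:
        raise ValueError("output_size needs one entry per spatial dim")
    return list(x_shape[:2]) + list(output_size)
```

With the shapes quoted above, a [2, 3, 32, 32] input with output_size [18, 18] gives [2, 3, 18, 18], and the 3d case gives [2, 3, 18, 18, 18].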

Charles-hit · 2023-12-18T03:14:48Z

paddle/phi/kernels/impl/pool_kernel_impl.h

+    } break;
+    default: {
+      PADDLE_THROW(
+          errors::InvalidArgument("Pool op only supports 2D and 3D input."));


Shouldn't this be output?

This is kept consistent with the other max pool interfaces, which refer to the 2d/3d api, so it says input.

Charles-hit · 2023-12-18T03:15:08Z

paddle/phi/kernels/impl/pool_grad_kernel_impl.h

+      } break;
+      default: {
+        PADDLE_THROW(
+            errors::InvalidArgument("Pool op only supports 2D and 3D input."));


Shouldn't input be changed to output?

Charles-hit · 2023-12-18T07:09:32Z

python/paddle/nn/functional/pooling.py

+        l_type = 'fractional_max_pool2d_with_index'
+
+        check_variable_and_dtype(
+            x, 'x', ['float32', 'float64'], 'fractional_max_pool2d'


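Shouldn't the check here also include the low-precision dtypes? Otherwise low-precision inputs will raise an error immediately.

This follows adaptive max pooling. If it verifies fine, I'll add them.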

Charles-hit · 2023-12-18T07:22:17Z

test/legacy_test/test_fractional_max_pool2d_op.py

+
+
+# ----------------fractional_max_pool2d_with_index----------------
+def fractional_max_pool2d_with_index_wapper(


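This wrapper isn't needed; just pass in the defined API directly.

Indeed it isn't needed, but it keeps this consistent with the max pooling unit tests, so shall we keep it?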

Charles-hit · 2023-12-18T07:22:52Z

test/legacy_test/test_fractional_max_pool2d_op.py

+
+class TestCase2(TestCase1):
+    def init_fractional(self):
+        self.random_u = 0.5


Could you add boundary cases for random_u, i.e. 0 and 1?

Charles-hit · 2023-12-18T07:23:20Z

test/legacy_test/test_fractional_max_pool3d_api.py

+
+
+if __name__ == '__main__':
+    unittest.main()


Same comments as for 2d.

Charles-hit · 2023-12-18T07:23:42Z

test/legacy_test/test_fractional_max_pool3d_op.py

+
+        def test_check_grad(self):
+            place = core.CUDAPlace(0)
+            numeric_grads = self.get_numeric_grad(place, 'X')


This doesn't seem to be used?

I missed that. Added now.

Charles-hit · 2023-12-18T07:24:09Z

test/legacy_test/test_fractional_max_pool2d_op.py

+
+        def test_check_grad(self):
+            place = core.CUDAPlace(0)
+            numeric_grads = self.get_numeric_grad(place, 'X')


This doesn't seem to be used.

This one wasn't missed ... ...

Charles-hit · 2023-12-18T07:26:09Z

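I looked at that issue (#55707), and the error there seems fairly small, whereas yours looks very large, as if something overflowed. How about adding some logging on CI to debug it? In principle this unit test also has to pass.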
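For context, the failing log compares a numeric gradient with the operator's analytic gradient against a limit of 0.005. A minimal sketch of that kind of check follows, on a 1-D max pooling toy problem. It is an assumption about how such a check works in general, not the OpTest implementation; all function names and the `eps` guard are hypothetical.

```python
def pooled_sum(x, windows):
    """Scalar loss: sum of the max of each window."""
    return sum(max(x[s:e]) for s, e in windows)


def analytic_grad(x, windows):
    """d(pooled_sum)/dx: 1 at each window's argmax, 0 elsewhere."""
    grad = [0.0] * len(x)
    for s, e in windows:
        idx = max(range(s, e), key=x.__getitem__)
        grad[idx] += 1.0
    return grad


def numeric_grad(f, x, delta=1e-3):
    """Central finite differences, perturbing one element at a time."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + delta] + x[i + 1:]
        lo = x[:i] + [x[i] - delta] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * delta))
    return grad


def max_relative_diff(expected, actual, eps=1e-3):
    """Largest |expected - actual|, relative to |expected| (guarded by eps)."""
    return max(abs(a - b) / max(abs(a), eps) for a, b in zip(expected, actual))
```

In a check of this shape, finite-difference rounding produces tiny differences. A reported value such as 6.390467e+20 against an expected 6.172840e-03 is far outside that range, which is consistent with the suggestion to log intermediate values on CI.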

… hack5_38_kernel

megemini · 2023-12-20T10:10:45Z

Update 20231220

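Use UnchangedInferMeta in place of the original backward InferMeta function
Register the cpu operators with float16 support (the pooling operators are a bit messy; maybe consider refactoring them ... ... 🤣🤣🤣)
Add float16/bfloat16 unit tests
Add range tests for random_u
Increase the timeout of the coverage tests for test_fractional_max_pool2d_op/test_fractional_max_pool3d_op
The main reason is that the 3d operator test timed out in ci. The 3d operators work on fairly large data, so I increased the timeout. Is that acceptable?

The main CI checks now pass. I'm not sure why PR-CI-GpuPS and PR-CI-LLM failed; it doesn't seem to be caused by these two operators.

The windows ci failed earlier; was something wrong with windows ci during those days ... ...

Also, I have replied to the earlier review comments.

@Charles-hit Please review.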

Charles-hit · 2023-12-20T12:41:04Z

test/legacy_test/test_fractional_max_pool2d_api.py

+
+            # use_cuda and core.is_bfloat16_supported(cpu) can not be correctly detected for win32
+            if (
+                sys.platform != 'win32'


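Doesn't windows support bf16? Why disable it outright?

Is there something wrong with core.is_bfloat16_supported? bfloat16 should be a gpu thing, right? The windows inference ci (it seems to be only this one) verifies on cpu, so I turned it off here.

Then just use core.is_compiled_with_cuda() for the check. Here you disable the whole windows platform, but windows has three pipelines and some of them run on GPU.

Both are checked together here: core.is_compiled_with_cuda() (that is, use_cuda) and core.is_bfloat16_supported.

That doesn't work either; the ci still verifies bfloat16 on cpu.

There is no way to inspect the windows inference ci environment, so I could only turn it off ... ...

This is the code at the time; bfloat16 is only tested if use_cuda and core.is_bfloat16_supported(place):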

def test_dtypes(self):
    for use_cuda in (
        [False, True] if core.is_compiled_with_cuda() else [False]
    ):
        place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
        paddle.disable_static(place=place)

        dtypes = ['float32', 'float64']
        if core.is_float16_supported(place):
            dtypes += ['float16']
        if use_cuda and core.is_bfloat16_supported(place):
            dtypes += ['uint16']

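This is a screenshot of the ci at that time ... ...

Also, these operators have no platform-specific behaviour: ubuntu and mac both pass, windows passes too, and only windows inference fails. Could you help check whether something else is not set up correctly?

windows-inference uses CUDA 11.2

windows uses CUDA 12

Could it be related to the CUDA version?

core.is_bfloat16_supported(paddle.CPUPlace()) returns true; just use use_cuda to decide whether to test bf16.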
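The final suggestion, gating the bf16 case on use_cuda alone, could look roughly like the sketch below. `dtypes_to_test` is a hypothetical helper; the Paddle capability probes are replaced by plain booleans so the logic can be read and run without Paddle installed.

```python
def dtypes_to_test(use_cuda, float16_supported):
    """Pick the dtypes a test loop should cover for one place."""
    dtypes = ['float32', 'float64']
    if float16_supported:
        dtypes.append('float16')
    # The test code quoted in this thread adds 'uint16' for the bfloat16 case.
    # Gate it on use_cuda only, since the bfloat16 probe reportedly returns
    # true for CPUPlace as well.
    if use_cuda:
        dtypes.append('uint16')
    return dtypes
```

With this gating, no platform is excluded by name: a CPU-only pipeline skips the bf16 case, and any GPU pipeline, including the windows ones, still runs it.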

Charles-hit · 2023-12-20T12:43:11Z

Those two pipelines (PR-CI-GpuPS and PR-CI-LLM) need to be rebuilt.

… hack5_38_kernel

megemini · 2024-01-08T09:44:15Z

@Charles-hit

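The windows ci environment still has problems. These operators have nothing platform-specific, so they are most likely fine. Could you review the code first and see whether there is anything else to discuss? Thanks.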

luotao1 · 2024-01-09T03:35:30Z

libphi.lib(pooling.cc.obj) : error LNK2019: unresolved external symbol "public: __cdecl pir::InterfaceValue::~InterfaceValue(void)" (??1InterfaceValue@pir@@QEAA@XZ) referenced in function "protected: void __cdecl std::_Tree<class std::_Tset_traits<class pir::InterfaceValue,struct std::less<class pir::InterfaceValue>,class std::allocator<class pir::InterfaceValue>,0> >::_Erase(struct std::_Tree_node<class pir::InterfaceValue,void *> *)" (?_Erase@?$_Tree@V?$_Tset_traits@VInterfaceValue@pir@@U?$less@VInterfaceValue@pir@@@std@@V?$allocator@VInterfaceValue@pir@@@4@$0A@@std@@@std@@IEAAXPEAU?$_Tree_node@VInterfaceValue@pir@@PEAX@2@@Z)

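Building its pooling.cc.obj fails. As I understand it, phi and pir are independent; phi now cannot find pir's symbols, so that is probably what introduced the bug.
On windows, symbols are not exported by default, so a generated library's symbols are invisible to outside code. I understand the pir developers may have recently decoupled pir from phi, which made these symbols invisible on windows.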

comment from @xuxinyi389

megemini · 2024-01-09T04:11:37Z

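@luotao1 @xuxinyi389 Thank you both for helping locate the problem 👍👍👍

I just merged in the latest code. As it stands, PR-CI-Windows-OPENBLAS still fails, but my operators appear to pass.

The other two windows ci jobs presumably depend on this windows-openblas one? How should this be handled?

Thanks!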

xuxinyi389 · 2024-01-09T05:01:53Z

The other two look fine. You can resolve the openblas pipeline problem first.

megemini · 2024-01-09T05:33:23Z

Thanks a lot! :)

Charles-hit

Looks good to me now. Please keep an eye on the static_check pipeline: the name and parameters of a newly added api now have to match the yaml.

3. API's name and params should be consistent with op's name and params in yaml.
2024-01-09 11:03:51                 The API or Yaml file you changed may cause inconsistent.
2024-01-09 11:03:51  please request one of the RD (YuanRisheng, zyfncg, chenwhql, phlrain)

megemini · 2024-01-09T10:09:58Z

Oh? How should it be changed then? Should I change

def fractional_max_pool2d(
    x,
    output_size,
    kernel_size=None,
    random_u=None,
    return_mask=False,
    name=None,
):

to

def fractional_max_pool2d_with_index(
    x,
    output_size,
    kernel_size=None,
    random_u=None,
    return_mask=False,
    name=None,
):

or modify the yaml instead, changing

- op : fractional_max_pool2d_with_index
  args : (Tensor x, int[] output_size, int[] kernel_size = {0, 0}, float random_u = 0.0, bool return_mask = true)
  output : Tensor(out), Tensor(mask)
  infer_meta :
    func : FractionalMaxPoolWithIndexInferMeta
  kernel :
    func : fractional_max_pool2d_with_index
  backward : fractional_max_pool2d_with_index_grad

to

- op : fractional_max_pool2d
  args : (Tensor x, int[] output_size, int[] kernel_size = {0, 0}, float random_u = 0.0, bool return_mask = true)
  output : Tensor(out), Tensor(mask)
  infer_meta :
    func : FractionalMaxPoolWithIndexInferMeta
  kernel :
    func : fractional_max_pool2d_with_index
  backward : fractional_max_pool2d_with_index_grad

Also, name is now the only parameter that the operator does not have. How should that be handled?

Thanks! :)

Charles-hit · 2024-01-09T11:19:46Z

Just changing the op name in the yaml should be enough; you don't need to worry about name.
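The consistency rule from the static_check message (API name and params matching the yaml op, with name ignored) can be illustrated with a small sketch. This is not the actual static_check script; `yaml_arg_names`, `api_arg_names` and `is_consistent` are hypothetical helpers, and the stand-in function only reproduces the signature quoted in this thread.

```python
import inspect
import re

YAML_ARGS = ("(Tensor x, int[] output_size, int[] kernel_size = {0, 0}, "
             "float random_u = 0.0, bool return_mask = true)")


def yaml_arg_names(args):
    """Extract parameter names from a yaml `args` string."""
    inner = args.strip()[1:-1]
    # Collapse brace defaults like {0, 0} so their commas do not split a parameter.
    inner = re.sub(r"\{[^}]*\}", "{}", inner)
    names = []
    for part in inner.split(","):
        declaration = part.split("=")[0].split()  # e.g. ["int[]", "output_size"]
        names.append(declaration[-1])
    return names


def api_arg_names(func, ignored=("name",)):
    """Python parameter names, skipping the ones the op does not carry."""
    return [p for p in inspect.signature(func).parameters if p not in ignored]


def is_consistent(op_name, yaml_args, func):
    return (op_name == func.__name__
            and yaml_arg_names(yaml_args) == api_arg_names(func))


def fractional_max_pool2d(x, output_size, kernel_size=None, random_u=None,
                          return_mask=False, name=None):
    """Stand-in with the signature quoted in the thread; body omitted."""
```

Under this sketch, the op name fractional_max_pool2d is consistent with the Python API, while fractional_max_pool2d_with_index is not, which matches the advice to rename the op in the yaml.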

megemini · 2024-01-10T07:19:25Z

Update 20240110

Renamed the operators

@Charles-hit Please review.

zyfncg · 2024-01-10T11:29:48Z

paddle/phi/api/yaml/op_compat.yaml

+- op : fractional_max_pool2d
+  inputs :
+    {x : X}
+  outputs :
+    {out : Out, mask : Mask}
+
+- op : fractional_max_pool3d
+  inputs :
+    {x : X}
+  outputs :
+    {out : Out, mask : Mask}
+


A newly added operator does not need this mapping.

zyfncg · 2024-01-10T11:31:16Z

paddle/phi/api/yaml/ops.yaml

+  args : (Tensor x, int[] output_size, int[] kernel_size = {0, 0}, float random_u = 0.0, bool return_mask = true)
+  output : Tensor(out), Tensor(mask)
+  infer_meta :
+    func : FractionalMaxPoolWithIndexInferMeta


Why does the InferMeta function name have an extra WithIndex?

It used with index to stay consistent with the naming of the other pooling operators that return an index. I'll change it.

zyfncg · 2024-01-10T11:32:08Z

paddle/phi/infermeta/unary.h

@@ -350,6 +350,15 @@ void MaxPoolWithIndexInferMeta(const MetaTensor& x,
                               MetaTensor* mask,
                               MetaConfig config = MetaConfig());

+void FractionalMaxPoolWithIndexInferMeta(const MetaTensor& x,


InferMeta functions should be listed in alphabetical order.

zyfncg · 2024-01-10T11:34:47Z

paddle/phi/kernels/gpu/pool_kernel.cu

+PD_REGISTER_KERNEL(fractional_max_pool2d,
+                   GPU,
+                   ALL_LAYOUT,
+                   phi::FractionalMaxPool2dWithIndexKernel,


Why does the kernel name also have an extra WithIndex?

Charles-hit

LGTM. Please make the small changes requested above.

megemini · 2024-01-11T09:32:47Z

Update 20240111

Remove the changes to op_compat.yaml
Rename the operators and functions so they no longer use index/idx
Sort the functions alphabetically

@Charles-hit @zyfncg Please review.

sunzhongkai588

LGTM for docs

jeff41404

LGTM

jeff41404 · 2024-01-12T08:44:49Z

code is fine, but the design of API in rfc should be modified to be consistent with the code.

megemini · 2024-01-12T11:37:12Z

PaddlePaddle/community#798

[CINN] Add bind cuda tactic * [CINN] Add bucket contexts * fix group output args bug * Add CUDNNv8 max pooling (PaddlePaddle#59413) * Add CUDNNv8 version of pool2d * Minor fix * Fix build failure * Remove dygraph API * Fix CI failure * Fix CI failure * Fix timeout * Fix timeout * Add comments * Minor fix * update lbfgs to avoid the randomness caused by paddle.dot() temporarily (PaddlePaddle#60591) * update lbfgs to avoid the randomness caused by paddle.dot() temporarily * add note * set_pir_tests_properties for some tests (PaddlePaddle#60401) * fix * Update CMakeLists.txt * Update pir_op_test_white_list * Update pir_op_test_white_list * Update pir_op_test_white_list * Add tests to whitelist (PaddlePaddle#60522) * fix * add * fix double grad without convert inplace (PaddlePaddle#60614) * fix fleetutil get_online_pass_interval bug3 (PaddlePaddle#60615) * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop * [PIR][DynamicShape] Add an example for broadcast in dynamic shape infer (PaddlePaddle#60608) * Add an example for broadcast in dynamic shape infer * fix_convert_all_blocks (PaddlePaddle#60613) * fix_convert_all_blocks * [Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508) [Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508) * fix (PaddlePaddle#60625) * [PIR] Support Region Clone in Operation::Clone (PaddlePaddle#60590) * deg2rad test passed (PaddlePaddle#60619) * [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size (PaddlePaddle#60623) * [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size * fix padding_size * fix pooling_type * [SOT] move_gpu_pinned_to_gpu (PaddlePaddle#60395) * PIR API adaptor No.35、40】 Migrate paddle.nn.ChannelShuffle/ClipGradByNorm into pir (PaddlePaddle#60445) * fix some bugs * fix bugs * Update clip.py * Update test_channel_shuffle.py * Update test_clip_by_norm_op.py * Update 
test_clip_by_norm_op.py * add param name for dist_tensor parameter (PaddlePaddle#60574) * Fix (PaddlePaddle#60631) * [PIR] Reify InferSymbolicShapeInterface (PaddlePaddle#60438) * Reify InferSymbolicShapeInterface * [Dynamic Shape] Remove ShapeBroadcastOp redundant codes (PaddlePaddle#60609) * [Dy2St] fix `test_grad` in PIR mode (PaddlePaddle#60621) --------- Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com> * reconstruct llama ci cases (PaddlePaddle#60637) * 【AutoParallel】Unify the fp16 and bf16 in auto-parallel (PaddlePaddle#60514) * unify the fp16 and bf16 * change white_list in AMP * add dtype support * fix bug in dtype * [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass (PaddlePaddle#60624) * [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass * Fix compile error * Fix compile error * update pdsa-2023-019, test=document_fix (PaddlePaddle#60646) * [SOT] sot export test files (PaddlePaddle#60547) * Improve the performence of put_along_axis (PaddlePaddle#60618) * fix bug of put_along_axis * improve performence of put_along_axis * [AutoParallel] Fit vpp for gradient_merge pass (PaddlePaddle#60560) * add dist attr * add op namescope * add test_semi_auto_parallel_hybrid_strategy (PaddlePaddle#60537) * [PIR]Open uts for AdaptiveAvgPool3D (PaddlePaddle#60636) * test (PaddlePaddle#60654) * [CINN] Add OptimizeReductionTactic (PaddlePaddle#60661) * [Paddle-Trt]update set_value cmakelist (PaddlePaddle#60664) [Paddle-Trt]update set_value cmakelist * [auto parallel] fix reshape infer shape (PaddlePaddle#60632) * [CINN+PIR]Clean Old GroupScheduler logic and switch into new_group_scheduler (PaddlePaddle#60642) * [CINN]Fix HasDynamicShape Bug while Type is NULL (PaddlePaddle#60658) * [PIR] pir onednn support legact istruction and lrn (PaddlePaddle#60502) * pir onednn support legact istruction and lrn * c_softmax_with_cross_entropy support bf16 for xpu (PaddlePaddle#60472) * enable custom device to use silu_fuse_pass 
(PaddlePaddle#60595) move SetUseCustomDevice to all platform * [XPU] add empty_like op and test, update XHPC to 20240105 (PaddlePaddle#60617) * [XPU] update XHPC date and refine FA ut (PaddlePaddle#60598) * [XPU] update XHPC date * update comments for ut * correct adamw bf16 unit test and the way to get data type (PaddlePaddle#60565) * Fix some PADDLE_THROW error type and change test cases (PaddlePaddle#60487) * fix error type * fix TypeError fix type fix fix fix fix * fix typo * as_complex as_real check_grad (PaddlePaddle#60666) * [Fix Bug] Fix Bugs of Two Pass (PaddlePaddle#60626) * [Fix Bug] Fix Bugs of Two Pass * Fix GenerateShapeOp bug * Modify unit test * Fix MakeGetterDimExpr4SymbolName * 【Hackathon 5th No.34】为 Paddle 新增 bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ API (PaddlePaddle#58092) * This PR enable offset of generator for custom device. (PaddlePaddle#60616) * [SOT] Convert dtype to `DataType` in PIR mode (PaddlePaddle#60627) * [PIR] Change output to block_arg from copy to a shared for the execution of while (PaddlePaddle#60607) * test * fix * fix * fix * 【auto parallel】custom op spmd infer add args check (PaddlePaddle#60633) * add bound check * add bound check * [PIR] Open PIR flag for test_ifelse (PaddlePaddle#60685) * open pir flag for test_ifelse * Update test_ifelse.py * Update test_ifelse.py * [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass (PaddlePaddle#60669) * [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass * fix index error * refine pir_all_path UT * fix bug * fix uncontiguous tensor resize bug (PaddlePaddle#60684) * fix uncontiguous tensor resize bug * [PIR]Support inplace custom op in pir (PaddlePaddle#60529) * support inplace in pir * fix inference ut * fix win bugs * fix win bug * fix * polish code * polish code * print log * print log * debug * fix win bugs * fix windows * fix (PaddlePaddle#60634) * [Docs] Update latest release version in README (PaddlePaddle#60691) * [CINN] Refine cmake 
for pass in cinn (PaddlePaddle#60683) * refine cmake for pass in cinn * add dependency in cmake * add dependency in cmake * [PIR]Open uts for PReLU (PaddlePaddle#60645) * [PIR]Open uts for ReLU6 (PaddlePaddle#60650) * [PIR]Open uts for RReLU (PaddlePaddle#60660) * [NPU] fix storage_properties type mismatch with OneDNN and NPU (PaddlePaddle#60566) * fix ttfnet_darknet53_1x_coco in pir mode (PaddlePaddle#60663) * [auto parallel] shard tensor stop gradient support (PaddlePaddle#60699) * [PIR][DynamicShape] Polish some codes (PaddlePaddle#60651) att, polish some codes * [PIR] fix onednn double reg (PaddlePaddle#60720) * fix onednn double reg * 【pir】modify add_n in while use blockarg instead of input value (PaddlePaddle#60668) * test * fix * fix * fix * modify add_n block_arg * modify increment return value * merge * modfiy whiel_op.py --------- Co-authored-by: zhangbo9674 <zhangbo54@baidu.com> * [PIR] Open test_case ut (PaddlePaddle#60721) * fix * fix * [PIR] rename data_layout (PaddlePaddle#60678) * rename data_layout * [xpu]: check op is null (PaddlePaddle#60656) * 【Hackathon 5th No.1】为 Paddle 新增 copysign API (PaddlePaddle#57785) * add copysign op * fix codestyle * codestyle * fix test * fix std bug * merge init * merge init * merge init * add static cast * add std * static cast * static cast * copysignf * static cast to float input * float input * static cast to double input * fix * add inplace test * fix api * fix cast when grad * modify paddle.cast_ to cast_ * remove cast in python api * support fp16 && bf16 * set grad y to zero * fix en doc * support number input * add hostdevice * refactor kernel * fix nan when backward * add broadcast unit test * modify .cu * Update __init__.py * Update __init__.py * for ci test * static float * codestyle * static double * fix broadcast, try coverage * Delete paddle/phi/kernels/funcs/broadcast_function.h * remove unused * Update math.py * Update math.py * fix en doc * add test for output dtype, integer unsupported for now * 
update * update * fix * fix * add cast for input * fix * add pir test * fix doc * fix doc * fix doc * detail doc * adjust for MSVC * fix * Update python/paddle/tensor/math.py Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com> * Update python/paddle/tensor/math.py Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com> * fix doc output dtype, fix Equation * codestyle * codestyle * Update math.py --------- Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com> * rms_norm_infer_spmd (PaddlePaddle#60709) * [PIR]Open more tests for bernoulli and celu (PaddlePaddle#60706) * bernoulli && celu * celu test_error * [PIR]Open uts for scatter_nd_add (PaddlePaddle#60698) * [PIR]Open uts for scatter_nd_add * Fix ut * [PIR]Open uts for sinh (PaddlePaddle#60714) * [PIR]Open uts for Softshrink and Softsign (PaddlePaddle#60716) * [PIR] polish the ir_mapping implimentation. (PaddlePaddle#60675) * [PIR] fix onednn layout transform yaml format (PaddlePaddle#60680) * fix onednn layout transform yaml format * 【CINN】Complete error handler mechanism of dynamic schedule (PaddlePaddle#60718) * complete error handler mechanism of dynamic schedule * fix some output info * fix windows C++17 bug (PaddlePaddle#60736) * [XPU] fc pass and delete pass nodes check (PaddlePaddle#60314) * fix_local_windows_compile (PaddlePaddle#60682) * [PIR] fix onednn dialect name (PaddlePaddle#60665) * fix onednn dialect name * 【pir】add tesnor to array kernel etc (PaddlePaddle#60703) * merge * modfiy kernel * modify net * modify print * Fix defition definition (PaddlePaddle#60679) * cholesky and cholesky_solve tests (PaddlePaddle#60726) * [PIR]Open uts for searchsorted (PaddlePaddle#60700) * [PIR]Open uts for selu (PaddlePaddle#60702) * [PIR]Open uts for selu * Fix ut * [PIR]Open uts for sequence_mask (PaddlePaddle#60704) * [PIR] adjust pir pass log printing (PaddlePaddle#60723) * adjust pir pass log printing * update * update * update * fix 
compile * Fix Throughtput Throughput (PaddlePaddle#60741) * please last md (PaddlePaddle#60749) * [CINN+PIR]Fix Fetch XShape Variable logic (PaddlePaddle#60722) * [PIR][DynamicShape] Remove redundant code for shapeAnalysis and shapedTypeInterface (PaddlePaddle#60744) att, remove redundant code for shapeAnalysis and shapedTypeInterface * 【PIR Dist Op Reg No.1】 reg push_sparse_v2 (PaddlePaddle#60473) * code reg push_sparse_v2 * [Dynamic Shape] Provide operator<< For BroadcastTree (PaddlePaddle#60730) * [PIR] change IR clone to const and support clone operation successors (PaddlePaddle#60752) * support ir clone const and support clone operation successors * refine ir_mapping * refine region clone * [CINN] Refine fully_insert_broadcast_pass (PaddlePaddle#60676) * refine fully_insert_broadcast_pass * fix complie bug * fix complie * fix conflict * [PIR] einsum's inner_cache and xshape set to optional (PaddlePaddle#60748) * einsum's inner_cache and xshape set to intermediate * Update paddle/fluid/pir/dialect/operator/ir/ops.yaml --------- Co-authored-by: kangguangli <kangguangli@hotmail.com> * reduce runtime of unit-tests in windows-trt (PaddlePaddle#60731) * modify trt test to deal with Timeout * windows * [Paddle-TRT] upgrade EnqueueV2 to EnqueueV3 (PaddlePaddle#59950) * 【Hackathon 5th No.110】为 Paddle 增强 sparse.matmul API (PaddlePaddle#59890) * Fix rank_relatvie rank_relative (PaddlePaddle#60770) * add graph_key to specific graph's varmap (PaddlePaddle#60567) * add graph_key to specific graph's varmap * fix inpalce case * fix inpalce case * 【Hackathon 5th No.38】为 Paddle 新增 FractionalMaxPool2d / FractionalMaxPool3d API -kernel (PaddlePaddle#59847) * [Init] add fractional max pool kernel and api * [Fix] pooling.cu seed offset * [Change] remove adaptive from fractional max pool * [Change] fractional max 2d gpu pooling.cu grad * [Change] fractional max 2d gpu pooling.cu grad with dim3 * [Change] use UnchangedInferMeta * [Change] test api with uint16 * [Change] wrap test 
disable_static * [Change] regiester float16/bfloat16 * [Change] remove bfloat16 from cpu kernrl * [Change] test dtypes in cpu and gpu * [Change] test_fractional_max_pool3d_2d/3d timeout to 30s * [Fix] resolve conflict * [Change] win32 cannot detect bfloat16 correctly * [Change] force set_device * [Add] test random_u is None * [Change] use kernel_size for overlapping mode * [Change] clean headers * [CodeStyle] pooling * [Change] rename op * [Change] rename func without index * [Prim][PIR] Recover pir bn (PaddlePaddle#60689) * reopen bn prim pir * fix atol * decomp support batch_norm_ * fix test case * fix bug * fix code * [PIR]fc_with_special_op_fuse_pass bug fix (PaddlePaddle#60751) * bug fix update * update * delete all debug message * add code deleted wrong at last commit * delete createAutoMixedPrecisionPass in analysis_predictor.cc --------- Co-authored-by: HongyuJia <jiahongyu@baidu.com> Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com> Co-authored-by: SigureMo <sigure.qaq@gmail.com> Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com> Co-authored-by: xingmingyyj <135400902+xingmingyyj@users.noreply.github.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: Yuang Liu <liuyuang@baidu.com> Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: kevin <chengyf112@gmail.com> Co-authored-by: wanghuancoder <wanghuan29@baidu.com> Co-authored-by: kangguangli <kangguangli@hotmail.com> Co-authored-by: zhangyuqin1998 <75946871+zhangyuqin1998@users.noreply.github.com> Co-authored-by: co63oc <co63oc@users.noreply.github.com> Co-authored-by: NeroLoh <745827440@qq.com> Co-authored-by: 傅剑寒 <Xs1580802568@gmail.com> Co-authored-by: lzydev <lizhiyu02@baidu.com> Co-authored-by: tianshuo78520a <707759223@qq.com> Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com> Co-authored-by: 
LiYuRio <63526175+LiYuRio@users.noreply.github.com> Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com> Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com> Co-authored-by: winter-wang <1030748926@qq.com> Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com> Co-authored-by: cyber-pioneer <116002591+cyber-pioneer@users.noreply.github.com> Co-authored-by: Vigi Zhang <VigiZhang@users.noreply.github.com> Co-authored-by: zbt78 <1095497213@qq.com> Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com> Co-authored-by: Aurelius84 <zhangliujie@baidu.com> Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com> Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com> Co-authored-by: LoneRanger <836253168@qq.com> Co-authored-by: freeliuzc <lzc842650834@gmail.com> Co-authored-by: YibLiu <68105073+YibinLiu666@users.noreply.github.com> Co-authored-by: engineer1109 <jialiang.wang@xdxct.com> Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com> Co-authored-by: xuxinyi389 <104957571+xuxinyi389@users.noreply.github.com> Co-authored-by: MayYouBeProsperous <ljmhz@outlook.com> Co-authored-by: Huihuang Zheng <zhhsplendid@163.com> Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com> Co-authored-by: 6clc <chaoliu.lc@foxmail.com> Co-authored-by: Terry <38135104+TR666@users.noreply.github.com> Co-authored-by: winter-wang <78149749+winter-wang@users.noreply.github.com> Co-authored-by: Wang Xin <xinwang614@gmail.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: Frank Lin <eee4017@gmail.com> Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com> Co-authored-by: lanxianghit <47554610+lanxianghit@users.noreply.github.com> Co-authored-by: Tian Zheng <tizheng@nvidia.com> Co-authored-by: lijialin03 <124568209+lijialin03@users.noreply.github.com> Co-authored-by: Wangzheee <634486483@qq.com> Co-authored-by: zhink 
<33270771+zhink@users.noreply.github.com> Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com> Co-authored-by: Chen Zhiyang <1792266893@qq.com> Co-authored-by: feifei-111 <2364819892@qq.com> Co-authored-by: fsczz <57291768+fsczz@users.noreply.github.com> Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com> Co-authored-by: Sonder <55493212+AndSonder@users.noreply.github.com> Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com> Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com> Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com> Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com> Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com> Co-authored-by: Jianbang Yang <yangjianbang112@gmail.com> Co-authored-by: enzodechine <enzo9533@hotmail.com> Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com> Co-authored-by: coco <69197635+cocoshe@users.noreply.github.com> Co-authored-by: zhaohaixu <49297029+zhaohaixu@users.noreply.github.com> Co-authored-by: chen2016013 <111894720+chen2016013@users.noreply.github.com> Co-authored-by: zyfncg <zhangyunfei07@baidu.com> Co-authored-by: Qi Li <qili93@qq.com> Co-authored-by: zhangbo9674 <zhangbo54@baidu.com> Co-authored-by: Liuyinfeng <30849840+gitliuyf@users.noreply.github.com> Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com> Co-authored-by: wendaxiao <113992173+wenxiaohahaha@users.noreply.github.com> Co-authored-by: cyberslack_lee <luhputu0815@gmail.com> Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com> Co-authored-by: GGBond8488 <33050871+GGBond8488@users.noreply.github.com> Co-authored-by: megemini <megemini@outlook.com>

[Init] add fractional max pool kernel and api

ada4fa5

megemini changed the title ~~【Hackathon 5th No.38】Add FractionalMaxPool2d / FractionalMaxPool3d API to Paddle~~ 【Hackathon 5th No.38】Add FractionalMaxPool2d / FractionalMaxPool3d API to Paddle -kernel Dec 8, 2023

paddle-bot bot added the contributor External developers label Dec 8, 2023

megemini added 3 commits December 9, 2023 12:48

[Fix] pooling.cu seed offset

6411892

[Change] remove adaptive from fractional max pool

ac2151c

[Change] fractional max 2d gpu pooling.cu grad

6d6dcf8

luotao1 mentioned this pull request Dec 11, 2023

【PaddlePaddle Hackathon 5th】Open Source Contribution Individual Challenge #57262

Open

luotao1 added the PaddlePaddle Hackathon label Dec 11, 2023

luotao1 assigned luotao1 and Charles-hit Dec 11, 2023

[Change] fractional max 2d gpu pooling.cu grad with dim3

7b3ef68

Charles-hit reviewed Dec 18, 2023

megemini added 11 commits December 18, 2023 18:53

[Change] use UnchangedInferMeta

80caf0f

[Change] test api with uint16

f512f96

[Change] wrap test disable_static

f092060

[Change] regiester float16/bfloat16

a23d0f2

[Change] remove bfloat16 from cpu kernrl

a6f18d4

[Change] test dtypes in cpu and gpu

8fd2c3f

[Change] test_fractional_max_pool3d_2d/3d timeout to 30s

940499d

[Fix] resolve conflict

36aee10

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hack5_38_kernel

7d65773

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hack5_38_kernel

f6182d4

[Change] win32 cannot detect bfloat16 correctly

0e8df70

megemini requested a review from Charles-hit December 20, 2023 10:11

Charles-hit reviewed Dec 20, 2023

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hack5_38_kernel

98d4c9b

megemini added 2 commits January 9, 2024 10:15

[Change] clean headers

34b4d28

[CodeStyle] pooling

c1305fe

Charles-hit reviewed Jan 9, 2024

megemini requested a review from Charles-hit January 9, 2024 10:11

[Change] rename op

6cc617d

zyfncg reviewed Jan 10, 2024

Charles-hit requested review from zyfncg and sunzhongkai588 January 10, 2024 12:31

Charles-hit previously approved these changes Jan 10, 2024

[Change] rename func without index

e24739a

megemini dismissed Charles-hit’s stale review via e24739a January 11, 2024 05:35

megemini requested a review from Charles-hit January 11, 2024 09:30

zyfncg approved these changes Jan 11, 2024

Charles-hit approved these changes Jan 11, 2024

sunzhongkai588 approved these changes Jan 12, 2024

jeff41404 approved these changes Jan 12, 2024

luotao1 merged commit fcb2137 into PaddlePaddle:develop Jan 12, 2024
29 checks passed



		# ----------------fractional_max_pool2d_with_index----------------
		def fractional_max_pool2d_with_index_wapper(
