
【Hackathon 5th No.34】Add bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ APIs to Paddle -part #58092

Merged
merged 31 commits into PaddlePaddle:develop from cocoshe:bitwise_shift_coco_dev on Jan 10, 2024

Conversation

@cocoshe
Contributor Author

cocoshe commented Nov 17, 2023

@xuxinyi389 Could you take a look when you have time?

Why do the tests pass locally but fail in the CI-PR3 environment? I can't seem to reproduce this bug and don't know where to start.
Local:
[screenshot]
CI: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9540790/job/24456843
[screenshot]
Thanks!

@xuxinyi389
Contributor

The tests you wrote don't seem to have been run in any of the pipelines that currently pass, so it's probably not an environment issue. Since your local tests pass, you could submit a local test report.

@cocoshe
Contributor Author

cocoshe commented Nov 17, 2023


I see that it indeed wasn't run in PR-CI-Windows-Inference, but in PR-CI-Py3 it should have run and failed. In the previous commit I added some prints to dump the two inputs, the output, and a reference, and I don't understand why the output in that CI looks so wildly wrong. For example, for this case, on my local machine (installed after building):

[screenshot]
it looks normal, but below is the output out computed in CI:
[screenshot]
What could be causing this?

@xuxinyi389
Contributor

I've previously run into static-graph variable leakage: because the static-graph context wasn't used properly, variables with the same name existed and the wrong one got fetched. But I see your dygraph tests don't pass either. Your tests haven't passed on any CI, so it shouldn't be an environment issue; please dig into the error yourself.

@cocoshe
Contributor Author

cocoshe commented Nov 18, 2023

After some experiments: with CPUPlace, the result goes wrong whenever broadcasting is required (previously I was always running on GPU locally, where broadcasting works fine; after commenting out the broadcast cases, the CI passes).
Looking at the related implementation, it should be in funcs::ElementwiseCompute.

Here is the description of ElementwiseCompute:

// It is a common CPU implementation to compute binary calculation with the
// support of broadcast. Note:
// 1. CPU implementation cannot support the case when x needs broadcast, thus
// this function need to be called with XxxFunctor and XxxInverseFunctor,
// like AddFunctor and InverseAddFunctor.
// 2. The corresponding GPU implementation supports all the broadcast cases,
// thus there is no need to define and call with XxxInverseFunctor.
// TODO(liuyiqun): optimize the CPU implementation to support all broadcast
// cases and avoid the need of XxxInverseFunctor.

So broadcasting for the bitwise ops apparently has different implementations on CPU and GPU? In theory all bitwise operators should broadcast automatically; since the broadcast cases are fine on GPU but wrong on CPU, I suspect this is where my problem is.
For this and similar problems in the future, may I ask how one usually debugs a case where Python and C++ calls are mixed? (I use pdb for pure Python and gdb for pure C++, but when Python calls into _c_ops I don't know how to step in and see what the code is doing. Is there a recommended way to debug this?)

@xuxinyi389
Contributor

When the call logic is fairly complex, for C++ debugging you can add some log output inside the source and then set the GLOG_v level; that makes it much more direct to pinpoint the offending lines.
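A minimal sketch of that workflow, purely illustrative (the helper name, log text, and verbosity level are assumptions, not taken from the PR): drop a VLOG into the suspect kernel, rebuild, and raise glog's verbosity via the GLOG_v environment variable when running the failing test, e.g. GLOG_v=4 python test_bitwise_shift_op.py.

#include "glog/logging.h"

// Hypothetical helper showing the kind of logging meant above; in practice the
// VLOG / LOG(INFO) line would sit inside the CPU bitwise kernel itself.
void LogShiftInputs(int x_rank, int y_rank) {
  VLOG(4) << "bitwise shift kernel entered, x rank = " << x_rank
          << ", y rank = " << y_rank;             // printed only when GLOG_v >= 4
  LOG(INFO) << "reached the CPU bitwise kernel";  // always printed
}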

@luotao1
Contributor

luotao1 commented Nov 21, 2023

Please get the coverage test to pass.

@cocoshe
Contributor Author

cocoshe commented Nov 28, 2023

@xuxinyi389 I've figured out the earlier problem: on CPU, when x and y are broadcast, whichever of x and y has the larger len(shape) is treated as x and the smaller one as y before broadcasting, so when len(x.shape) is smaller than len(y.shape) the original x >> y turns into y >> x. An InverseXXX functor has to be added to fix this (it looks like a legacy TODO that should eventually be handled).
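To illustrate the fix just described, here is a minimal C++ sketch of the paired functors (names follow the conventions used elsewhere in this thread but are illustrative, not the final code): when ElementwiseCompute swaps the operands because y has more dims, the Inverse variant shifts in the opposite direction so the result is still the original x >> y.

#ifndef HOSTDEVICE
#define HOSTDEVICE  // host-only fallback so this sketch compiles outside Paddle
#endif

template <typename T>
struct BitwiseRightShiftFunctor {
  HOSTDEVICE T operator()(const T a, const T b) const { return a >> b; }
};

// Used when the operands arrive swapped: shift b by a to preserve x >> y.
template <typename T>
struct InverseBitwiseRightShiftFunctor {
  HOSTDEVICE T operator()(const T a, const T b) const { return b >> a; }
};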

Now I've hit another problem. The supported dtypes are uint8, int8, int16, int32, int64. JAX, which I surveyed earlier, implements the logical shift by combining a cast with an arithmetic shift, but its underlying types come from numpy, which supports uint16, uint32, uint64; Paddle's cast API currently doesn't support converting between uint16/uint32/uint64 and int16/int32/int64. I also considered using paddle.where to force the positions holding signed negative values to unsigned, i.e. zeroing the sign bit first (mainly to handle the logical right shift of signed numbers) and putting the sign bit back manually after the shift, but paddle.where doesn't support int16, so implementing this at the Python layer probably needs stronger dtype support.

I also tried, at the C++ layer, adding a bool to the low-level bitwise computation to choose between arithmetic and logical shifts, but funcs::ElementwiseCompute is written rather specifically and it isn't easy to extend it with an extra bool argument.

Do you have any suggestions? Emmm, I'm feeling a bit stuck (and wondering how necessary the arithmetic shift really is, emmm).

@luotao1
Contributor

luotao1 commented Nov 29, 2023

Could you add the missing dtype support to the cast API and paddle.where?

@cocoshe
Contributor Author

cocoshe commented Nov 29, 2023

OK, I'll give it a try.

@cocoshe
Contributor Author

cocoshe commented Dec 1, 2023

I took a look, and it seems Paddle doesn't support the uint16, uint32, uint64 dtypes at all? The dtype extensions I've seen were mostly about adding support for complex and bfloat-like types, and the unsigned support added by other contributors also covers only the one supported unsigned type (uint8). Is there an example I could follow for adding other unsigned types?

>>> paddle.to_tensor([1,2,3], dtype=paddle.int16)
Tensor(shape=[3], dtype=int16, place=Place(cpu), stop_gradient=True,
       [1, 2, 3])
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint16'. Did you mean: 'int16'?
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint32'. Did you mean: 'int32'?
>>> paddle.to_tensor([1,2,3], dtype=paddle.uint64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'paddle' has no attribute 'uint64'. Did you mean: 'int64'?

@xuxinyi389
Contributor

If Paddle's cast supported converting uint to int, precision would be lost whenever the value exceeds the int range, so such a conversion is not reasonable.

@cocoshe
Contributor Author

cocoshe commented Dec 1, 2023

Thanks for the review! That makes sense; in that case it seems none of the other frameworks offers a reference implementation we can borrow. If we want to reuse most of the existing bitwise code, maybe we could simply split arithmetic and logical shifts apart (the extra bool argument doesn't seem to fit neatly into the existing bitwise machinery), e.g. split the original bitwise_right_shift(x, y, is_arithmetic) into bitwise_right_arithmetic_shift(x, y) and bitwise_right_logic_shift(x, y), and then, finally, at the location below

https://github.com/PaddlePaddle/Paddle/pull/58092/files#diff-9852da3163fbef8a980df35d58e40d115076ed3eb511b020a76963c03d68a2c6R60-R63

template <typename T>
struct BitwiseRightShiftFunctor {
  HOSTDEVICE T operator()(const T a, const T b) const { return a >> b; }
};

handle uint8 and the other int types separately. Mainly, for signed values that are negative and undergoing an arithmetic right shift, save the sign bit, zero it out, do the shift, and then put the sign bit back (roughly that idea).

Or is there some other, better way to implement this? I'd appreciate any pointers!
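For illustration, here is a minimal C++ sketch of one way to get a logical right shift on signed values, casting to the unsigned type of the same width instead of saving and restoring the sign bit. This is a hypothetical alternative (the function name is made up), not necessarily what the PR ends up doing:

#include <cstdint>
#include <type_traits>

template <typename T>
T LogicalRightShift(T a, T b) {
  // Reinterpret a as unsigned so the sign bit is shifted like an ordinary data
  // bit, then cast the result back to T.
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(static_cast<U>(a) >> static_cast<U>(b));
}

// Example: int8_t(-20) is 1110'1100 in two's complement; LogicalRightShift(-20, 2)
// gives 0011'1011 == 59, whereas the arithmetic shift (-20 >> 2) gives -5.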

@xuxinyi389
Contributor

Yes, they can be implemented separately. You could try adding an abstraction at the Python layer that exposes an interface such as bitwise_right_shift(x, y, is_arithmetic) to users, while splitting the implementation underneath.

@cocoshe
Contributor Author

cocoshe commented Dec 4, 2023

OK, understood.

@cocoshe
Contributor Author

cocoshe commented Dec 17, 2023

@xuxinyi389 I noticed that in the coverage CI, the four DEFINE_BITWISE_KERNEL_WITH_INVERSE macros for the CPU kernels are reported as not covered.

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9736567/job/24767260

So I tried adding a log line inside this macro:

#define DEFINE_BITWISE_KERNEL_WITH_INVERSE(op_type)                         \
  template <typename T, typename Context>                                   \
  void Bitwise##op_type##Kernel(const Context& dev_ctx,                     \
                                const DenseTensor& x,                       \
                                const DenseTensor& y,                       \
                                DenseTensor* out) {                         \
    LOG(INFO) << "I am in the .cc kernel!!!!!!";                            \    ######### this is the added log line ###########
    funcs::Bitwise##op_type##Functor<T> func;                               \
    funcs::InverseBitwise##op_type##Functor<T> inv_func;                    \
    auto x_dims = x.dims();                                                 \
    auto y_dims = y.dims();                                                 \
    if (x_dims.size() >= y_dims.size()) {                                   \
      funcs::ElementwiseCompute<funcs::Bitwise##op_type##Functor<T>, T>(    \
          dev_ctx, x, y, func, out);                                        \
    } else {                                                                \
      funcs::ElementwiseCompute<funcs::InverseBitwise##op_type##Functor<T>, \
                                T>(dev_ctx, x, y, inv_func, out);           \
    }                                                                       \
  }

Then I hard-coded the place in the base unittest class's setUp to CPU:

class TestBitwiseLeftShiftAPI(unittest.TestCase):
    def setUp(self):
        self.init_input()
        self.place = (
            # paddle.CUDAPlace(0)
            # if paddle.is_compiled_with_cuda()
            # else paddle.CPUPlace()
            paddle.CPUPlace()
        )

In the end the test passes, and it goes through the CPU version of the kernel; here is the log:

λ xxxy /home/pd/Paddle/build ctest -R test_bitwise_shift -V
UpdateCTestConfiguration  from :/home/pd/Paddle/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/pd/Paddle/build/DartConfiguration.tcl
Test project /home/pd/Paddle/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 742
    Start 742: test_bitwise_shift_op

742: Test command: /home/cmake-3.18.0-Linux-x86_64/bin/cmake "-E" "env" "PYTHONPATH=/home/pd/Paddle/build/python" "/usr/bin/python" "/home/pd/Paddle/tools/test_runner.py" "test_bitwise_shift_op"
742: Test timeout computed to be: 10000000
742: grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
742: W1217 04:08:01.149598 94391 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.3, Runtime API Version: 12.0
742: W1217 04:08:01.150851 94391 gpu_resources.cc:164] device: 0, cuDNN Version: 8.8.
742: I1217 04:08:01.243474 94391 program_interpreter.cc:214] New Executor is Running.
742: I1217 04:08:01.243816 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.250216 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.258257 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.265053 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.272917 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.278879 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.286705 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.292797 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.301602 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.307641 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.315480 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.321446 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.329121 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.335136 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.343986 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.350009 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.357797 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.363785 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.371469 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.380086 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.390872 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.396921 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.404939 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.410903 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.418599 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.424595 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.433780 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.439796 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.447381 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.453291 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.460729 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.466603 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.475275 94391 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
742: I1217 04:08:01.481164 94391 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
1/1 Test #742: test_bitwise_shift_op ............   Passed    5.06 sec

The following tests passed:
        test_bitwise_shift_op

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   5.24 sec

In addition, I also installed the whl package to verify:

λ xxxy /home/pd/Paddle/build python
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> x = paddle.to_tensor([], dtype=paddle.int64)
>>> x = paddle.to_tensor([10,20,30], dtype=paddle.int64, place=paddle.CPUPlace())
>>> y = paddle.to_tensor([1,2,3], dtype=paddle.int64, place=paddle.CPUPlace())
>>> paddle.bitwise_left_shift(x, y, is_arithmetic=True)
I1217 04:15:22.898773 94440 bitwise_kernel.cc:65] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
       [20 , 80 , 240])
>>> paddle.bitwise_left_shift(x, y, is_arithmetic=False)
I1217 04:15:30.973914 94440 bitwise_kernel.cc:66] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
       [20 , 80 , 240])
>>> paddle.bitwise_right_shift(x, y, is_arithmetic=True)
I1217 04:16:38.313091 94440 bitwise_kernel.cc:67] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
       [5, 5, 3])
>>> paddle.bitwise_right_shift(x, y, is_arithmetic=False)
I1217 04:16:45.778152 94440 bitwise_kernel.cc:68] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int64, place=Place(cpu), stop_gradient=True,
       [5, 5, 3])

The glog messages show up here as well, so I don't understand why the coverage CI claims these four macros are not covered...

Also attached is an example of a logical right shift on a signed value. Taking int8 as an example: -20 in two's complement is 1110 1100; a logical right shift by two bits gives 0011 1011, which is 59.

>>> x = paddle.to_tensor([10,-20,30], dtype=paddle.int8, place=paddle.CPUPlace())
>>> y = paddle.to_tensor([1,2,3], dtype=paddle.int8, place=paddle.CPUPlace())
>>> paddle.bitwise_right_shift(x, y, is_arithmetic=False)
I1217 04:29:36.087157 94440 bitwise_kernel.cc:68] I am in the .cc kernel!!!!!!
Tensor(shape=[3], dtype=int8, place=Place(cpu), stop_gradient=True,
       [5 , 59, 3 ])

@cocoshe
Contributor Author

cocoshe commented Dec 18, 2023

@xuxinyi389 As for the other coverage issue, where the functors under funcs/bitwise_functors.h are reported as not covered, I also experimented by adding LOG statements to the series of functors, as follows:

template <typename T>
struct BitwiseLeftShiftArithmeticFunctor {
  HOSTDEVICE T operator()(const T a, const T b) const {
    LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor";  // line: 53
    return a << b;
  }
};

template <typename T>
struct InverseBitwiseLeftShiftArithmeticFunctor {
  inline HOSTDEVICE T operator()(const T a, const T b) const {
    LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor";
    return b << a;
  }
};

template <typename T>
struct BitwiseLeftShiftLogicFunctor {
  HOSTDEVICE T operator()(const T a, const T b) const {
    LOG(INFO) << "This is BitwiseLeftShiftArithmeticFunctor";
    return a << b;
  }
};

Then install the whl package and test:

>>> x
Tensor(shape=[2, 3], dtype=uint8, place=Place(cpu), stop_gradient=True,
       [[8 , 12, 8 ],
        [2 , 12, 3 ]])
>>> y
Tensor(shape=[2, 3], dtype=uint8, place=Place(cpu), stop_gradient=True,
       [[1, 1, 4],
        [3, 3, 1]])
>>> paddle.bitwise_right_shift(x,y)
Tensor(shape=[2, 3], dtype=uint8, place=Place(cpu), stop_gradient=True,
       [[4, 6, 0],
        [0, 1, 1]])
>>> paddle.bitwise_left_shift(x,y)
I1218 01:42:49.461642 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
I1218 01:42:49.461676 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
I1218 01:42:49.461685 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
I1218 01:42:49.461694 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
I1218 01:42:49.461701 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
I1218 01:42:49.461710 181565 bitwise_functors.h:53] This is BitwiseLeftShiftArithmeticFunctor
Tensor(shape=[2, 3], dtype=uint8, place=Place(cpu), stop_gradient=True,
       [[16 , 24 , 128],
        [16 , 96 , 6  ]])
>>> 

Line 53, i.e. the body of this functor, is definitely reached, so I don't understand why coverage fails to detect it.

Could you please help review? The hackathon deadline is before the 22nd, so I'd like to push and get this merged if possible.

@cocoshe
Contributor Author

cocoshe commented Dec 18, 2023

(Hmm, it looks like the coverage CI suddenly flipped from ✗ to ✓. Please review when you get a chance.)

def init_input(self):
    self.x = np.random.randint(1, 20, [2, 3]).astype('uint8')
Contributor

low=1, high=20: picking the values this way doesn't seem reasonable; it would be better to use the lower and upper bounds that the dtype can represent.


class TestBitwiseLeftShiftAPI_UINT8(TestBitwiseLeftShiftAPI):
    def init_input(self):
        self.x = np.random.randint(1, 20, [2, 3]).astype('uint8')
Contributor

Same as above.

class TestDygraphInplaceBitwiseRightShift_logic(TestDygraphInplaceLogicAnd):
    def init_data(self):
        self.input_var_numpy = paddle.randint(
            low=0, high=10, shape=[3, 4, 5], dtype="int32"
Contributor

Same as above.

with self.assertRaises(ValueError):
    self.inplace_api_processing(broadcast_input)
Contributor

Wherever the final design details differ from the RFC, please update the RFC to keep it in sync.

@xuxinyi389
Contributor

It's probably a limitation of the current coverage tool in handling code that goes through the preprocessing stage; an exemption has been granted.

@xuxinyi389
Contributor

Remember to update all of the low/high values accordingly, not just the few places I commented on.

@cocoshe
Contributor Author

cocoshe commented Dec 20, 2023

The CI passes now. Previously I found that with a plain shift, the CPU and GPU results could occasionally disagree: locally on GPU the results matched the numpy interface, but after switching to CPU, a left shift by a large amount would sometimes effectively have its count reduced modulo the width (e.g. shifting a uint8 left by 20 bits actually shifts by 20 % 8 = 4 bits), whereas on GPU the result is always 0, i.e. it simply overflows.

Also, when the shift amount y is negative, this is "undefined behavior" in C++; in practice it may effectively be reinterpreted as an unsigned type, e.g. a << (unsigned int)-5 meaning a << 0xFFFFFFFB (again this is a problem on CPU and cannot be aligned; on GPU, when y is negative, a left shift yields zero and a right shift depends on the sign bit of x, so it can be aligned).

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

So, to align with the numpy interface and keep CPU and GPU results consistent, the code added later imposes some extra constraints on these two aspects inside the functors. Thanks for reviewing~
@xuxinyi389
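For illustration, a minimal C++ sketch of the kind of constraint described above. This is a hypothetical clamping rule that makes the out-of-range cases deterministic instead of leaving them to undefined or implementation-defined C++ behavior; it is not a copy of the final functor code.

#include <cstdint>

template <typename T>
T LeftShiftClamped(T a, T b) {
  const int bits = static_cast<int>(sizeof(T) * 8);
  // Negative or too-large shift amounts: define the result as 0 (matching the
  // GPU behavior noted above) rather than letting the CPU reduce the shift
  // count modulo the width.
  if (static_cast<int64_t>(b) < 0 || static_cast<int64_t>(b) >= bits) {
    return static_cast<T>(0);
  }
  return static_cast<T>(a << b);
}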

@xuxinyi389
Contributor

LGTM

Comment on lines 346 to 364
- op : bitwise_left_shift_arithmetic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_left_shift_arithmetic
    backend : x
  inplace: (x -> out)

- op : bitwise_left_shift_logic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_left_shift_logic
    backend : x
  inplace: (x -> out)
Contributor

@jeff41404 jeff41404 Jan 8, 2024

According to the C++ operator development specifications, the naming and parameters of an operator need to be consistent with the Python API so that the operator can reuse the Python API's documentation, and so that writing C++ code directly against the operator is relatively simple for users. So the operator should be bitwise_left_shift with is_arithmetic in its args, the kernels in paddle/phi/kernels/bitwise_kernel.h should also take is_arithmetic in their args, and it would be best if the functors in paddle/phi/kernels/funcs/bitwise_functors.h could be unified, but it's okay if they can't.
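A sketch of what the suggested unified declarations in paddle/phi/kernels/bitwise_kernel.h might look like (hypothetical; it follows the PR's naming conventions, and DenseTensor/Context come from the phi headers that file already includes):

template <typename T, typename Context>
void BitwiseLeftShiftKernel(const Context& dev_ctx,
                            const DenseTensor& x,
                            const DenseTensor& y,
                            bool is_arithmetic,
                            DenseTensor* out);

template <typename T, typename Context>
void BitwiseRightShiftKernel(const Context& dev_ctx,
                             const DenseTensor& x,
                             const DenseTensor& y,
                             bool is_arithmetic,
                             DenseTensor* out);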

Comment on lines 387 to 406
- op : bitwise_right_shift_arithmetic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_right_shift_arithmetic
    backend : x
  inplace: (x -> out)

- op : bitwise_right_shift_logic
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : ElementwiseInferMeta
  kernel :
    func : bitwise_right_shift_logic
    backend : x
  inplace: (x -> out)

Contributor

According to the C++ operator development specifications, the naming and parameters of an operator need to be consistent with the Python API so that the operator can reuse the Python API's documentation, and so that writing C++ code directly against the operator is relatively simple for users. So the operator should be bitwise_right_shift with is_arithmetic in its args, the kernels in paddle/phi/kernels/bitwise_kernel.h should also take is_arithmetic in their args, and it would be best if the functors in paddle/phi/kernels/funcs/bitwise_functors.h could be unified, but it's okay if they can't.

Comment on lines 126 to 164
PD_REGISTER_KERNEL(bitwise_left_shift_arithmetic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseLeftShiftArithmeticKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_left_shift_logic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseLeftShiftLogicKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_right_shift_arithmetic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseRightShiftArithmeticKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_right_shift_logic,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseRightShiftLogicKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}
Contributor

After unification, registration can be reduced to bitwise_right_shift and bitwise_left_shift
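For example, the CPU registration above could then be reduced to something like the following sketch (it mirrors the existing PD_REGISTER_KERNEL style; the unified kernel names are assumptions, not the final code):

PD_REGISTER_KERNEL(bitwise_left_shift,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseLeftShiftKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}

PD_REGISTER_KERNEL(bitwise_right_shift,
                   CPU,
                   ALL_LAYOUT,
                   phi::BitwiseRightShiftKernel,
                   uint8_t,
                   int8_t,
                   int16_t,
                   int,
                   int64_t) {}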

Comment on lines 7168 to 7171
if is_arithmetic:
    return _C_ops.bitwise_left_shift_arithmetic(x, y)
else:
    return _C_ops.bitwise_left_shift_logic(x, y)
Contributor

Once the bitwise_left_shift operator (C++ API) has the is_arithmetic parameter, this if is no longer needed and the code becomes simpler and clearer.

        return _C_ops.bitwise_left_shift_arithmetic(x, y)
    else:
        return _C_ops.bitwise_left_shift_logic(x, y)
    if is_arithmetic:
Contributor

same issue as above

>>> paddle.bitwise_left_shift(x, y)
Tensor(shape=[2, 4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [[2  , 8  , 32 , 128],
        [64 , 136, 128, 130]])
Contributor

The example code needs to provide the case when is_arithmetic = False to facilitate user understanding

>>> paddle.bitwise_right_shift(x, y)
Tensor(shape=[2, 4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [[5 , 5 , 5 , 5 ],
        [4 , 2 , 8 , 32]])
Contributor

The example code needs to provide the case when is_arithmetic = False to facilitate user understanding

@cocoshe
Contributor Author

cocoshe commented Jan 8, 2024

@jeff41404 Thanks for your review and guidance. I refactored the code and moved the if judgement deeper into the C++ layer instead of keeping it in the Python API.

Because of the limitation of funcs::ElementwiseCompute, we can't add more args (for example, is_arithmetic here) to the functor if we want to reuse that code, so I think the functors in paddle/phi/kernels/funcs/bitwise_functors.h can't be unified.
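For reference, a hypothetical sketch of what that deeper C++ dispatch could look like, reusing the Arithmetic/Logic functors named earlier in the thread (the broadcast/inverse handling discussed above is omitted for brevity, and this is not the PR's exact code):

template <typename T, typename Context>
void BitwiseLeftShiftKernel(const Context& dev_ctx,
                            const DenseTensor& x,
                            const DenseTensor& y,
                            bool is_arithmetic,
                            DenseTensor* out) {
  // Single user-facing op; the arithmetic/logical choice is made here in C++.
  if (is_arithmetic) {
    funcs::ElementwiseCompute<funcs::BitwiseLeftShiftArithmeticFunctor<T>, T>(
        dev_ctx, x, y, funcs::BitwiseLeftShiftArithmeticFunctor<T>(), out);
  } else {
    funcs::ElementwiseCompute<funcs::BitwiseLeftShiftLogicFunctor<T>, T>(
        dev_ctx, x, y, funcs::BitwiseLeftShiftLogicFunctor<T>(), out);
  }
}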

@jeff41404
Contributor

OK, thanks

jeff41404
jeff41404 previously approved these changes Jan 9, 2024
Contributor

@jeff41404 jeff41404 left a comment

LGTM

python/paddle/tensor/math.py: 6 outdated review comments (all resolved)
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
cocoshe and others added 5 commits January 9, 2024 19:24
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

LGTM for docs

@luotao1 luotao1 merged commit b622e96 into PaddlePaddle:develop Jan 10, 2024
29 checks passed
@luotao1 luotao1 changed the title 【Hackathon 5th No.34】Add bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ APIs to Paddle 【Hackathon 5th No.34】Add bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ APIs to Paddle -part Jan 10, 2024
@cocoshe cocoshe deleted the bitwise_shift_coco_dev branch January 10, 2024 03:54
Xinyu302 added a commit to Xinyu302/Paddle that referenced this pull request Jan 15, 2024
* [DimExpr] DimExpr support hash (PaddlePaddle#60471)

* open warning with `paddle.utils.deprecated` (PaddlePaddle#60458)

* open_warning

* update unittest

* update

* fix typos

* fix warning in test runner

* uncomment

* cleanup todo

* using VisibleDeprecationWarning

* update comment

* fix typo

* fix indentation

* fix

* fix

* fix indent level and test

* update

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [AutoParallel] Auto Trans PP to VPP (PaddlePaddle#60467)

* [AutoParallel] Auto Trans PP to VPP

* add comment

* 【PIR OpTest Fix No.23】 fix test_distribute_fpn_proposals_op (PaddlePaddle#60335)

* fix

* fix

* fix  test_lookup_table_v2_bf16_op (PaddlePaddle#60332)

* Fix shape error in combined-indexing setitem (PaddlePaddle#60447)

* add ut

* fix shape error in combine-indexing

* fix ut

* [auto parallel] Add pp lazy init, bug fix for xavier (PaddlePaddle#60441)

* [PIR] add slice_array_dense api (PaddlePaddle#60433)

* fix

* fix

* Set value with scalar (PaddlePaddle#60452)

* set_value with scalar

* fix ut

* [PIR]Support custom op in PIR (PaddlePaddle#59790)

* support custom op in pir

* fix compile bugs

* fix bugs

* delete code

* fix windows bugs

* fix windows bugs

* add symbol to paddle lib

* fix windows bugs

* revert code

* fix bugs

* fix bugs

* perfect code according comment

* fix py3

* revert third party

* fix bugs

* fix bug

* fix compile bugs

* fix windows

* [Prim][PIR] support roll, gather, scatter, scatter_nd_add op backward in pir prim (PaddlePaddle#60481)

* prim gather op backward

* prim scatter op backward

* prim roll op backward

* prim scatter_nd op backward

* [PIR] delete dense_tensor mem_desc_ (PaddlePaddle#60024)

* delete dense_tensor mem_desc_

* [PIR] Complement op defs (PaddlePaddle#60475)

* complement translation of legacy matmul
* Complement op mappings in translation for deformable_conv_v1.

* [pir]Supporting constant_folding_pass for train (PaddlePaddle#60355)

* [pir]Supporting constant_folding_pass for train

* fix

* Update constant_folding_pass.cc

* [Dynamic Shape] Fuse shape ops into generate shape op pass (PaddlePaddle#60490)

* add shape.generate_shape op

* rename shape.generate_shape to cinn_op.generate_shape

* refactor GenerateShapeOp::SymbolBinding

* move GenerateShapeOp related helper functions into generate_shape_util.cc

* minor fix

* minor fix

* backup

* refine signature of ConvertDimExprToAttribute

* minor fix for signature of ConvertDimExprToAttributes

* remove SubstituteDimExpr from generate_shape_util.h

* Fix compile error

* Fix unittest compile error

* Code format

* Code format

* Fix _hiden_size to _hidden_size (PaddlePaddle#60485)

* [DimExpr] Add substitute DimExpr util (PaddlePaddle#60493)

* add SubstituteDimExpr

* Fix compile error

* Code format

* Polish DimExprUtilTest

* Change namesapce

* Fix unittest

* Polish DimExprUtilTest

* [xpu]add sine_pos fuse pass and sine_pos xpu kernel (PaddlePaddle#60025)

* add split with variable in factors and rewrite vectorize,unroll,bind error handling mechanism (PaddlePaddle#60449)

* [CodeStyle] Fix regression of Ruff in sot (PaddlePaddle#60483)

* support cast op from FP32 to low precision (PaddlePaddle#60385)

* test=document_fix (PaddlePaddle#60399)

* [XPU] refine flash attention ut (PaddlePaddle#60474)

* [XPU] refine flash attention ut

* refine tolerance

* [Inference] support collect shape in sub block (PaddlePaddle#60451)

* support collect shape in sub block

* udpate

* udpate

* fix process mesh incorrect set in converter (PaddlePaddle#60504)

* 【CMake opt No.13】Remove CINN DEPS in test/cpp/pir/shape_dialect/CMakeLists.txt	 (PaddlePaddle#60517)

* Update CMakeLists.txt

* Apply suggestions from code review

* Apply suggestions from code review

* Update CMakeLists.txt

* Update CMakeLists.txt

* 【pir】 add tensorarray op createarrylike, add_n (PaddlePaddle#60460)

* optimize backward

* [PIR] add vjp interface for while op

* [PIR] fix ci error.

* modify while stopgradient

* merge

* modify while grad bug

* modify while grad op

* modify

* increment vp

* [PIR] add get_used_external_value interface for block.

* while case

* delete print

* delete print

* Update python/paddle/autograd/ir_backward.py

* [PIR] add unit_test for get_used_external_value

* modify while_loop

* code_style

* modofy ci bug

* modify while api

* modify ci

* modify array

* Update python/paddle/autograd/ir_backward.py

* Update test/legacy_test/test_cond.py

* update

* modify array_write grad info

* merge

* add_n and createarraylike

* conflict

* modify exe bug

* modify kernel choose

---------

Co-authored-by: winter-wang <1030748926@qq.com>

* Add align iter space tactic (PaddlePaddle#60498)

Add align iter space tactic

* [Dynamic Shape] Add helper function MakeGenerateShapeOpAttribute (PaddlePaddle#60512)

* add helper function MakeGenerateShapeOpAttribute

* fix complier complaint

* Code format

* [Prim][PIR] Set prim gflag for pure cpp (PaddlePaddle#60505)

* inference support decomp

* polish code

* add decomp base define

* add decomp base define2

* change decomp infer

* fix symbol overload

* fix test case

* debug

* debug

* decomp add debug info

* add cpp flag

* revert

* remove unused flag

* [PIR] Refine and fix pir exe (PaddlePaddle#60443)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* update 2023 security advisory, test=document_fix (PaddlePaddle#60527)

* [Inference] refine common/*.h for inference lib (PaddlePaddle#60513)

* 【complex op】No.19 add complex support for triangular_solve (PaddlePaddle#59529)

* fix reshard dist_attr (PaddlePaddle#60535)

* 【auto parallel】剔除切分推导相关的头文件对proto 的依赖 (PaddlePaddle#60543)

* decouple proto

* format

* format

* strcuct pre def

* [PIR] Support Operation::Clone Interface (PaddlePaddle#60536)

* [PIR] Support Operation::Clone Interface

* modify into shared_ptr

* [Dynamic Shape] Add FullyInsertBroadcastPass and Broadcast Op (PaddlePaddle#60511)

* add ShapeBroadcastOp

* add pass FullyInsertBroadcastPass

* InferSymbolicShape of BroadcastShape Op

* Delete unit test

* Fix return error

* Code format

* Fix error message

* Update paddle/cinn/hlir/dialect/operator/transforms/fully_insert_broadcast_pass.cc

Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>

---------

Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>

* Fix OpTranslatorTest name (PaddlePaddle#60518)

* fix name

* fix name

* fix name

* fix name

* [PIR] migrate DataFeeder into pir (PaddlePaddle#60434)

* 【PIR API adaptor No.90,92】Migrate some ops into pir (PaddlePaddle#59801)

* [DimExpr] Convert Broadcast to BroadcastTree (PaddlePaddle#60440)

* backup BroadcastTree

* add SubstituteDimExpr

* add helper function ConstructBroadcastTree

* Fix compile error

* Code format

* Polish DimExprUtilTest

* Add cmake file

* Change namesapce

* Fix compile error

* Fix unittest

* reconstruct BroadcastTree

* Polish DimExprUtilTest

* Reconstruct BroadcastTree

* Finish BroadcastBranch

* Finish BroadcastBranch

* Finish BroadcastBranch

* Add Unittest

* Remove unnecessary dim_expr_util

* Add header file

* [Dynamic Shape] Erase expand (PaddlePaddle#60525)

* EraseExpandOp

* minor fix

* minor fix

* Code format

* [inference] Support wint4 groupwise with cutlass gemm (PaddlePaddle#60422)

* support gemv-groupwise func && weightQuanter-groupwise && weightDeQuanter-groupwise

* fix build bug

* add unit_test && fix bug

* delete useless code

* fix ci build bug

* fix ci && optimize

* fix merge conflict

* add op change info

* fix weight_only_linear_pass

* fix format

* solve ci unit_test

* init

* support cutlass gemm with groupwise

* add unit test

* fix strange bug

* delete random bug

* fix sm70 build bug

* try to fix ci build bug

* fix bug

* fix volta build bug

* skip sm70 in groupwise mode

* change cutlass branch

* simplify extent of loop after fuse and add corresponding test case (PaddlePaddle#60538)

* fix bug of put_along_axis (PaddlePaddle#60551)

* remove clearPass to allow custom device use fusion under fp16 (PaddlePaddle#60541)

* fix fleetutil get_online_pass_interval bug2; test=develop (PaddlePaddle#60544)

* fix vs2017 limit (PaddlePaddle#60528)

* 【Hackathon 5th No.20】为 Paddle 新增 Exponential 和 Gamma API (PaddlePaddle#57899)

* add exponential

* add gamma distribution

* refine docs

* add kl_divergence and test

* resolve conflicts

* resolve conflicts

* fix bug

* refine test

* fix test timeout

* refine code

* add standard_gamma kernel

* fix comments

* fix tests

* fix tests

* fix comments

* fix tests

* fix gamma grad

* fix yaml

* fix bugs

* fix tests

* fix standard_gamma_grad

* fix test

* fix test

* add cdf & icdf

* add cdf & icdf

* refine comments

* fix

* fix

* fix head file

* fix

* fix cuda op

* fix

* fix

* refine test

* fix test

* refine comments

* fix comments

* fix

* fix

* fix type check

* fix docs

* delete useless comments

* [CINN] Add IntrinsicOps into ir_codes_collector (PaddlePaddle#60556)

This PR fixed a bug of running Resnet PaddleClas.

The bug is due to vectorize introduce an intrinsic GetAddr and we didn't collect the tensor of GetAddr in ir_node_collector, this would caused tensor alias won't create in cuda code.

TODO: we may modify IntrinsicOp in the near future

* 【auto parallel】custom op  spmd rule register  (PaddlePaddle#60509)

* custom op spmd rule register

* custom op spmd rule register

* custom op spmd rule register

* custom op spmd rule register

* polish

* 【AutoParallel】Add master grad in AMP-O2 of AutoParallel (PaddlePaddle#59987)

* add master_grad in auto-parallel

* reset third_party

* fix coverage

* support bf16 master_grad

* fix bug in master_grad

* change code according to review

* change the way to find optimizer op

* [Dy2St] Fix `NameloadJstTransformer` missing transform call kwargs (PaddlePaddle#60515)

---------

Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>

* cinn(backends): generate infer shape kernel to infer shape of output tensor (PaddlePaddle#60519)

通过二维指针来返回后端infer shape的结果。生成的cinn ir如下。tensor_shape_args是一个二维指针。 infer_shape_set_value(0, 0, S1, tensor_shape_args) 表示将第0个output tensor的第0维设置为S1。

* fix tensor math method inplace converter (PaddlePaddle#60546)

* [xpu]Add vis_decoder_attention_xpu_pass && modify qkv_attention_xpu_kernel (PaddlePaddle#60361)

* [Prim][PIR] support abs, instance_norm op backward in prim pir (PaddlePaddle#60444)

* abs op backward

* add test case

* update code

* update code

* update code

* update code

* update code

* instance_norm op backward

* add instance_norm_v2 test cast

* custom op

* [PIR] remove log simply name mechnism from phi to common. (PaddlePaddle#60507)

* [InferSymbolicShape] Delete redundent value_id_to_shapeordata_ (PaddlePaddle#60554)

* 【Hackathon 5th No.25】add gammaln api (PaddlePaddle#60553)

* fix (PaddlePaddle#60570)

* [CINN] Add tile tactic and bind cuda tactic (PaddlePaddle#60534)

* [CINN] Add tile tactic

* [CINN] Add bind cuda tactic

* 【PIR OpTest Fix No.8】 fix test_shuffle_batch_op (PaddlePaddle#59631)

* fix test_shuffle_batch_op

* fix

* 【PIR OpTest Fix No.14】 fix test_nce (PaddlePaddle#60255)

* fix test_nce

* fix test_nce

* Update ops.yaml

* fix

* Update utils.cc

* Update ops.yaml

* 【PIR OpTest Fix No.19】 fix test_ftrl_op (PaddlePaddle#60329)

* fix test_ftrl_op

* fix

* [auto parallel] Lazy init for MP. Add reshard infer shape. (PaddlePaddle#60563)

* [PIR] Add unittest for Operation::Clone and Group::Clone (PaddlePaddle#60577)

* [PIR] dce pass disable custom op (PaddlePaddle#60578)

* [Inference] Fix bug of RunWithExternalStream API in new executor (PaddlePaddle#60122)

* fix bug of RunWithExternalStream API in new executor

* add test

* fix bug of RunWithExternalStream API in new executor

* reset flage in RunWithExternalStream

* fix bug

* add param swith_stream

* fix bug

* modify python api

* fix bug

* Resubmit PR-58859 (PaddlePaddle#60310)

* allow multiple rng state in generator

* Fix 60142; Fix some comments from sneaxiy

* Overwrite copy constructors

* add api

* pre-commit

* tensor_array slice in PIR (PaddlePaddle#60503)

* use slice_array, now will meet error of destory opresult still in use

* disable the pir test until the bug fixed

* Set DistModel state_dict keys to structure_names (PaddlePaddle#60478)

* exclude xpu

* check structure name mapping

* test pp

* polish

* support dynamic save static load

* support dygraph save static load

* polish

* polish

* use structured_name as key in DistModel state_dict

* polish

* polish

* fix checkpoint path conflict

* test get_rank_to_files

* static save dynamic load test

* fix sm75 build bug (PaddlePaddle#60583)

* replace LOG(INFO) with VLOG(6)

* Add CanProveDivisible for symbolic calculation (PaddlePaddle#60572)

* add CanProveDivisible for symbolic calculation

* delete extra cout for debug

* fix according to some comments

* [PIR][DynamicShape] make shape pass default and fix some bugs (PaddlePaddle#60548)

att, make shape pass default and fix some bugs

* Fix words (PaddlePaddle#60603)

* 【auto parallel】custom op use spmd rule (PaddlePaddle#60571)

* custom op use smpd rule

* custom op use smpd rule

* [auto parallel] add lazy init ut to llama (PaddlePaddle#60585)

* 【pir】 modify array_write and array_read vjp , add a simple while with array_write (PaddlePaddle#60575)

* optimize backward

* [PIR] add vjp interface for while op

* [PIR] fix ci error.

* modify while stopgradient

* merge

* modify while grad bug

* modify while grad op

* modify

* increment vp

* [PIR] add get_used_external_value interface for block.

* while case

* delete print

* delete print

* Update python/paddle/autograd/ir_backward.py

* [PIR] add unit_test for get_used_external_value

* modify while_loop

* code_style

* modofy ci bug

* modify while api

* modify ci

* modify array

* Update python/paddle/autograd/ir_backward.py

* Update test/legacy_test/test_cond.py

* update

* modify array_write grad info

* merge

* add_n and createarraylike

* conflict

* modify array_write vjp

* modify array_write vjp

* Update paddle/fluid/pybind/manual_static_op_function.h

* modify array_write vjp

* modify ci bug

* modify

* modify

* Update test/legacy_test/test_while_loop_op.py

* modify inplace array_read

* Update test/legacy_test/test_while_op.py

* Update test/ir/pir/test_while_api.py

---------

Co-authored-by: winter-wang <1030748926@qq.com>

* [Prim][PIR] add leaky_relu, sigmoid, instance_norm op forward prim  (PaddlePaddle#60564)

* hardswish op prim sink

* hardswish op prim

* add composite

* add leaky_relu, sigmoid op forward prim

* remove hardswish op forward

* add instance_norm op forward prim

* [CINN]Add bucket context (PaddlePaddle#60549)

* [CINN] Add tile tactic

* [CINN] Add bind cuda tactic

* [CINN] Add bucket contexts

* fix group output args bug

* Add CUDNNv8 max pooling (PaddlePaddle#59413)

* Add CUDNNv8 version of pool2d

* Minor fix

* Fix build failure

* Remove dygraph API

* Fix CI failure

* Fix CI failure

* Fix timeout

* Fix timeout

* Add comments

* Minor fix

* update lbfgs to avoid the randomness caused by paddle.dot() temporarily (PaddlePaddle#60591)

* update lbfgs to avoid the randomness caused by paddle.dot() temporarily

* add note

* set_pir_tests_properties for some tests (PaddlePaddle#60401)

* fix

* Update CMakeLists.txt

* Update pir_op_test_white_list

* Update pir_op_test_white_list

* Update pir_op_test_white_list

* Add tests to whitelist (PaddlePaddle#60522)

* fix

* add

* fix double grad without convert inplace (PaddlePaddle#60614)

* fix fleetutil get_online_pass_interval bug3 (PaddlePaddle#60615)

* fix fleetutil get_online_pass_interval bug3; test=develop

* fix fleetutil get_online_pass_interval bug3; test=develop

* fix fleetutil get_online_pass_interval bug3; test=develop

* [PIR][DynamicShape] Add an example for broadcast in dynamic shape infer (PaddlePaddle#60608)

* Add an example for broadcast in dynamic shape infer

* fix_convert_all_blocks (PaddlePaddle#60613)

* fix_convert_all_blocks

* [Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508)

[Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508)

* fix (PaddlePaddle#60625)

* [PIR] Support Region Clone in Operation::Clone (PaddlePaddle#60590)

* deg2rad test passed (PaddlePaddle#60619)

* [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size (PaddlePaddle#60623)

* [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size

* fix padding_size

* fix pooling_type

* [SOT] move_gpu_pinned_to_gpu (PaddlePaddle#60395)

* PIR API adaptor No.35、40】 Migrate paddle.nn.ChannelShuffle/ClipGradByNorm into pir (PaddlePaddle#60445)

* fix some bugs

* fix bugs

* Update clip.py

* Update test_channel_shuffle.py

* Update test_clip_by_norm_op.py

* Update test_clip_by_norm_op.py

* add param name for dist_tensor parameter (PaddlePaddle#60574)

* Fix (PaddlePaddle#60631)

* [PIR] Reify InferSymbolicShapeInterface (PaddlePaddle#60438)

* Reify InferSymbolicShapeInterface

* [Dynamic Shape] Remove ShapeBroadcastOp redundant codes (PaddlePaddle#60609)

* [Dy2St] fix `test_grad` in PIR mode (PaddlePaddle#60621)


---------

Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>

* reconstruct llama ci cases (PaddlePaddle#60637)

* 【AutoParallel】Unify the fp16 and bf16 in auto-parallel (PaddlePaddle#60514)

* unify the fp16 and bf16

* change white_list in AMP

* add dtype support

* fix bug in dtype

* [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass (PaddlePaddle#60624)

* [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass

* Fix compile error

* Fix compile error

* update pdsa-2023-019, test=document_fix (PaddlePaddle#60646)

* [SOT] sot export test files (PaddlePaddle#60547)

* Improve the performence of put_along_axis (PaddlePaddle#60618)

* fix bug of put_along_axis

* improve performence of put_along_axis

* [AutoParallel] Fit vpp for gradient_merge pass (PaddlePaddle#60560)

* add dist attr

* add op namescope

* add test_semi_auto_parallel_hybrid_strategy (PaddlePaddle#60537)

* [PIR]Open uts for AdaptiveAvgPool3D (PaddlePaddle#60636)

* test (PaddlePaddle#60654)

* [CINN] Add OptimizeReductionTactic (PaddlePaddle#60661)

* [Paddle-Trt]update set_value cmakelist (PaddlePaddle#60664)

[Paddle-Trt]update set_value cmakelist

* [auto parallel] fix reshape infer shape (PaddlePaddle#60632)

* [CINN+PIR]Clean Old GroupScheduler logic and switch into new_group_scheduler (PaddlePaddle#60642)

* [CINN]Fix HasDynamicShape Bug while Type is NULL (PaddlePaddle#60658)

* [PIR] pir onednn support legact istruction and lrn (PaddlePaddle#60502)

* pir onednn support legact istruction and lrn

* c_softmax_with_cross_entropy support bf16 for xpu (PaddlePaddle#60472)

* enable custom device to use silu_fuse_pass (PaddlePaddle#60595)

move SetUseCustomDevice to all platform

* [XPU] add empty_like op and test, update XHPC to 20240105 (PaddlePaddle#60617)

* [XPU] update XHPC date and refine FA ut (PaddlePaddle#60598)

* [XPU] update XHPC date

* update comments for ut

* correct adamw bf16 unit test and the way to get data type (PaddlePaddle#60565)

* Fix some PADDLE_THROW error type and change test cases (PaddlePaddle#60487)

* fix error type

* fix TypeError

fix type

fix

fix

fix

fix

* fix typo

* as_complex as_real check_grad (PaddlePaddle#60666)

* [Fix Bug] Fix Bugs of Two Pass (PaddlePaddle#60626)

* [Fix Bug] Fix Bugs of Two Pass

* Fix GenerateShapeOp bug

* Modify unit test

* Fix MakeGetterDimExpr4SymbolName

* 【Hackathon 5th No.34】为 Paddle 新增 bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ API (PaddlePaddle#58092)

* This PR enable offset of generator for custom device. (PaddlePaddle#60616)

* [SOT] Convert dtype to `DataType` in PIR mode (PaddlePaddle#60627)

* [PIR] Change output to block_arg from copy to a shared for the execution of while (PaddlePaddle#60607)

* test

* fix

* fix

* fix

* 【auto parallel】custom op spmd infer add args check (PaddlePaddle#60633)

* add bound check

* add bound check

* [PIR] Open PIR flag for test_ifelse (PaddlePaddle#60685)

* open pir flag for test_ifelse

* Update test_ifelse.py

* Update test_ifelse.py

* [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass (PaddlePaddle#60669)

* [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass

* fix index error

* refine pir_all_path UT

* fix bug

* fix uncontiguous tensor resize bug (PaddlePaddle#60684)

* fix uncontiguous tensor resize bug

* [PIR]Support inplace  custom op in pir (PaddlePaddle#60529)

* support inplace in pir

* fix inference ut

* fix win bugs

* fix win bug

* fix

* polish code

* polish code

* print log

* print log

* debug

* fix win bugs

* fix windows

* fix (PaddlePaddle#60634)

* [Docs] Update latest release version in README (PaddlePaddle#60691)

* [CINN] Refine cmake for pass in cinn (PaddlePaddle#60683)

* refine cmake for pass in cinn

* add dependency in cmake

* add dependency in cmake

* [PIR]Open uts for PReLU (PaddlePaddle#60645)

* [PIR]Open uts for ReLU6 (PaddlePaddle#60650)

* [PIR]Open uts for RReLU (PaddlePaddle#60660)

* [NPU] fix storage_properties type mismatch with OneDNN and NPU (PaddlePaddle#60566)

* fix ttfnet_darknet53_1x_coco in pir mode (PaddlePaddle#60663)

* [auto parallel] shard tensor stop gradient support (PaddlePaddle#60699)

* [PIR][DynamicShape] Polish some codes (PaddlePaddle#60651)

att, polish some codes

* [PIR] fix onednn double reg (PaddlePaddle#60720)

* fix onednn double reg

* 【pir】modify add_n in while use blockarg instead of input value (PaddlePaddle#60668)

* test

* fix

* fix

* fix

* modify add_n block_arg

* modify increment return value

* merge

* modfiy whiel_op.py

---------

Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* [PIR] Open test_case ut (PaddlePaddle#60721)

* fix

* fix

* [PIR] rename data_layout (PaddlePaddle#60678)

* rename data_layout

* [xpu]: check op is null (PaddlePaddle#60656)

* 【Hackathon 5th No.1】 为 Paddle 新增 copysign API (PaddlePaddle#57785)

* add copysign op

* fix codestyle

* codestyle

* fix test

* fix std bug

* merge init

* merge init

* merge init

* add static cast

* add std

* static cast

* static cast

* copysignf

* static cast to float input

* float input

* static cast to double input

* fix

* add inplace test

* fix api

* fix cast when grad

* modify paddle.cast_ to cast_

* remove cast in python api

* support fp16 && bf16

* set grad y to zero

* fix en doc

* support number input

* add hostdevice

* refactor kernel

* fix nan when backward

* add broadcast unit test

* modify .cu

* Update __init__.py

* Update __init__.py

* for ci test

* static float

* codestyle

* static double

* fix broadcast, try coverage

* Delete paddle/phi/kernels/funcs/broadcast_function.h

* remove unused

* Update math.py

* Update math.py

* fix en doc

* add test for output dtype, integer unsupported for now

* update

* update

* fix

* fix

* add cast for input

* fix

* add pir test

* fix doc

* fix doc

* fix doc

* detail doc

* adjust for MSVC

* fix

* Update python/paddle/tensor/math.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* Update python/paddle/tensor/math.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* fix doc output dtype, fix Equation

* codestyle

* codestyle

* Update math.py

---------

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* rms_norm_infer_spmd (PaddlePaddle#60709)

* [PIR]Open more tests for bernoulli and celu (PaddlePaddle#60706)

* bernoulli && celu

* celu test_error

* [PIR]Open uts for scatter_nd_add (PaddlePaddle#60698)

* [PIR]Open uts for scatter_nd_add

* Fix ut

* [PIR]Open uts for sinh (PaddlePaddle#60714)

* [PIR]Open uts for Softshrink and Softsign (PaddlePaddle#60716)

* [PIR] polish the ir_mapping implimentation. (PaddlePaddle#60675)

* [PIR] fix onednn layout transform yaml format (PaddlePaddle#60680)

* fix onednn layout transform yaml format

* 【CINN】Complete error handler mechanism of dynamic schedule (PaddlePaddle#60718)

* complete error handler mechanism of dynamic schedule

* fix some output info

* fix windows C++17 bug (PaddlePaddle#60736)

* [XPU] fc pass and delete pass nodes check (PaddlePaddle#60314)

* fix_local_windows_compile (PaddlePaddle#60682)

* [PIR] fix onednn dialect name (PaddlePaddle#60665)

* fix onednn dialect name

* 【pir】add tesnor to array kernel etc (PaddlePaddle#60703)

* merge

* modfiy kernel

* modify net

* modify print

* Fix defition definition (PaddlePaddle#60679)

* cholesky and cholesky_solve tests (PaddlePaddle#60726)

* [PIR]Open uts for searchsorted (PaddlePaddle#60700)

* [PIR]Open uts for selu (PaddlePaddle#60702)

* [PIR]Open uts for selu

* Fix ut

* [PIR]Open uts for sequence_mask (PaddlePaddle#60704)

* [PIR] adjust pir pass log printing (PaddlePaddle#60723)

* adjust pir pass log printing

* update

* update

* update

* fix compile

* Fix Throughtput Throughput (PaddlePaddle#60741)

* please last md (PaddlePaddle#60749)

* [CINN+PIR]Fix Fetch XShape Variable logic (PaddlePaddle#60722)

* [PIR][DynamicShape] Remove redundant code for shapeAnalysis and shapedTypeInterface (PaddlePaddle#60744)

att, remove redundant code for shapeAnalysis and shapedTypeInterface

* 【PIR Dist Op Reg No.1】 reg push_sparse_v2 (PaddlePaddle#60473)

* code reg push_sparse_v2

* [Dynamic Shape] Provide operator<< For BroadcastTree (PaddlePaddle#60730)

* [PIR] change IR clone to const and support clone operation successors (PaddlePaddle#60752)

* support ir clone const and support clone operation successors

* refine ir_mapping

* refine region clone

* [CINN] Refine fully_insert_broadcast_pass (PaddlePaddle#60676)

* refine fully_insert_broadcast_pass

* fix compile bug

* fix compile

* fix conflict

* [PIR] einsum's inner_cache and xshape set to optional (PaddlePaddle#60748)

* einsum's inner_cache and xshape set to intermediate

* Update paddle/fluid/pir/dialect/operator/ir/ops.yaml

---------

Co-authored-by: kangguangli <kangguangli@hotmail.com>

* reduce runtime of unit-tests in windows-trt (PaddlePaddle#60731)

* modify trt test to deal with Timeout

* windows

* [Paddle-TRT] upgrade EnqueueV2 to EnqueueV3 (PaddlePaddle#59950)

* 【Hackathon 5th No.110】Enhance the sparse.matmul API for Paddle (PaddlePaddle#59890)
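
A minimal sketch of the general paddle.sparse.matmul call pattern that this entry enhances; exactly which formats or shapes PaddlePaddle#59890 adds is not described here, and sparse kernels may require a GPU build depending on the Paddle version, so this only shows the basic COO-times-dense usage:

```python
import paddle

# 3x3 sparse COO matrix with three non-zero entries
indices = [[0, 1, 2], [1, 2, 0]]
values = [1.0, 2.0, 3.0]
sp_x = paddle.sparse.sparse_coo_tensor(indices, values, shape=[3, 3])

dense_y = paddle.rand([3, 2])
out = paddle.sparse.matmul(sp_x, dense_y)  # dense [3, 2] result
```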

* Fix rank_relatvie rank_relative (PaddlePaddle#60770)

* add graph_key to specific graph's varmap (PaddlePaddle#60567)

* add graph_key to specific graph's varmap

* fix inplace case

* fix inplace case

* 【Hackathon 5th No.38】Add FractionalMaxPool2d / FractionalMaxPool3d APIs to Paddle -kernel (PaddlePaddle#59847) (a usage sketch follows this entry's commit list)

* [Init] add fractional max pool kernel and api

* [Fix] pooling.cu seed offset

* [Change] remove adaptive from fractional max pool

* [Change] fractional max 2d gpu pooling.cu grad

* [Change] fractional max 2d gpu pooling.cu grad with dim3

* [Change] use UnchangedInferMeta

* [Change] test api with uint16

* [Change] wrap test disable_static

* [Change] register float16/bfloat16

* [Change] remove bfloat16 from cpu kernel

* [Change] test dtypes in cpu and gpu

* [Change] test_fractional_max_pool3d_2d/3d timeout to 30s

* [Fix] resolve conflict

* [Change] win32 cannot detect bfloat16 correctly

* [Change] force set_device

* [Add] test random_u is None

* [Change] use kernel_size for overlapping mode

* [Change] clean headers

* [CodeStyle] pooling

* [Change] rename op

* [Change] rename func without index
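
A minimal usage sketch for the fractional max pooling added by the commits above. The exact entry points are assumptions; a functional form like paddle.nn.functional.fractional_max_pool2d with output_size, kernel_size, and random_u parameters is inferred from the commit list ("use kernel_size for overlapping mode", "test random_u is None"), so treat this as a sketch rather than the definitive signature:

```python
import paddle
import paddle.nn.functional as F

x = paddle.rand([1, 3, 32, 32])

# disjoint (non-overlapping) fractional max pooling down to a 16x16 spatial output
out = F.fractional_max_pool2d(x, output_size=16)

# overlapping mode: an explicit kernel_size is used, and random_u fixes the
# pseudo-random pooling sequence for reproducibility
out_overlap = F.fractional_max_pool2d(x, output_size=16, kernel_size=3, random_u=0.3)
```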

* [Prim][PIR] Recover pir bn (PaddlePaddle#60689)

* reopen bn prim pir

* fix atol

* decomp support batch_norm_

* fix test case

* fix bug

* fix code

* [PIR]fc_with_special_op_fuse_pass bug fix (PaddlePaddle#60751)

* bug fix

update

* update

* delete all debug message

* add code deleted wrong at last commit

* delete createAutoMixedPrecisionPass in analysis_predictor.cc

---------

Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: xingmingyyj <135400902+xingmingyyj@users.noreply.github.com>
Co-authored-by: JYChen <zoooo0820@qq.com>
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
Co-authored-by: kangguangli <kangguangli@hotmail.com>
Co-authored-by: zhangyuqin1998 <75946871+zhangyuqin1998@users.noreply.github.com>
Co-authored-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: NeroLoh <745827440@qq.com>
Co-authored-by: 傅剑寒 <Xs1580802568@gmail.com>
Co-authored-by: lzydev <lizhiyu02@baidu.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: winter-wang <1030748926@qq.com>
Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com>
Co-authored-by: cyber-pioneer <116002591+cyber-pioneer@users.noreply.github.com>
Co-authored-by: Vigi Zhang <VigiZhang@users.noreply.github.com>
Co-authored-by: zbt78 <1095497213@qq.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com>
Co-authored-by: LoneRanger <836253168@qq.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: YibLiu <68105073+YibinLiu666@users.noreply.github.com>
Co-authored-by: engineer1109 <jialiang.wang@xdxct.com>
Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com>
Co-authored-by: xuxinyi389 <104957571+xuxinyi389@users.noreply.github.com>
Co-authored-by: MayYouBeProsperous <ljmhz@outlook.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@163.com>
Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
Co-authored-by: 6clc <chaoliu.lc@foxmail.com>
Co-authored-by: Terry <38135104+TR666@users.noreply.github.com>
Co-authored-by: winter-wang <78149749+winter-wang@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: Frank Lin <eee4017@gmail.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>
Co-authored-by: lanxianghit <47554610+lanxianghit@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
Co-authored-by: lijialin03 <124568209+lijialin03@users.noreply.github.com>
Co-authored-by: Wangzheee <634486483@qq.com>
Co-authored-by: zhink <33270771+zhink@users.noreply.github.com>
Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com>
Co-authored-by: Chen Zhiyang <1792266893@qq.com>
Co-authored-by: feifei-111 <2364819892@qq.com>
Co-authored-by: fsczz <57291768+fsczz@users.noreply.github.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: Sonder <55493212+AndSonder@users.noreply.github.com>
Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com>
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com>
Co-authored-by: Jianbang Yang <yangjianbang112@gmail.com>
Co-authored-by: enzodechine <enzo9533@hotmail.com>
Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com>
Co-authored-by: coco <69197635+cocoshe@users.noreply.github.com>
Co-authored-by: zhaohaixu <49297029+zhaohaixu@users.noreply.github.com>
Co-authored-by: chen2016013 <111894720+chen2016013@users.noreply.github.com>
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>
Co-authored-by: Liuyinfeng <30849840+gitliuyf@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: wendaxiao <113992173+wenxiaohahaha@users.noreply.github.com>
Co-authored-by: cyberslack_lee <luhputu0815@gmail.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: GGBond8488 <33050871+GGBond8488@users.noreply.github.com>
Co-authored-by: megemini <megemini@outlook.com>